Convert Fairseq wav2vec2 to HF

This code found in this repository was adapted from this original HuggingFace repository. This repository contains two scripts that convert a fairseq wav2vec2 checkpoint to HuggingFace 🤗 Transformers.

Procedure

Create a HF repo :

huggingface-cli repo create <name_of_model> --organization <org_of_model>
git clone https://huggingface.co/<org_of_model>/<name_of_model>

Convert the model

./run_convert.sh \
    --hf-path </path/to/local/hf/repo> \
    --fairseq-path </path/to/fairseq/checkpoint> \
    --size {base, large} \
    [--dict </path/to/dict>] \
    [--copy-fairseq-model]

Verify that models are equal

./run_forward.py \
    --hf-path </path/to/local/hf/repo> \
    --fairseq-path </path/to/fairseq/checkpoint> \
    [--finetuned]

Push to hub

huggingface-cli upload <your-org>/wav2vec2-MFE-0.5K-base </path/to/local/hf/repo>

Changelog

convert_wav2vec2_original_pytorch_checkpoint_to_pytorch.py (originally from official huggingface /transformers) was modifier.

It correctly remaps :

wav2vec2.encoder.pos_conv_embed.conv.weight_g to wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original0
wav2vec2.encoder.pos_conv_embed.conv.weight_v to wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original1

The current version of script should (not tested) also be able to correctly handle old weight_g/weight_v. Beware, conversion of finetuned model was not tested with the current version of the script.

sampling_rate and do_normalize are both extracted from the fairseq's original configuration (e.g. cfg['task']['sample_rate']) instead of being guessed.
Create preprocessor_config.json which the original didn't do for pre-trained (i.e. non-finetuned models)

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
.gitattributes		.gitattributes
.gitignore		.gitignore
README.md		README.md
convert_wav2vec2_original_pytorch_checkpoint_to_pytorch.py		convert_wav2vec2_original_pytorch_checkpoint_to_pytorch.py
run_convert.sh		run_convert.sh
run_forward.py		run_forward.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Convert Fairseq wav2vec2 to HF

Procedure

Changelog

About

Languages

LLL-Orleans/convert_wav2vec2_to_hf

Folders and files

Latest commit

History

Repository files navigation

Convert Fairseq wav2vec2 to HF

Procedure

Changelog

About

Resources

Stars

Watchers

Forks

Languages