The code in this repository was adapted from the original HuggingFace repository. It contains two scripts that convert a fairseq wav2vec2 checkpoint to the HuggingFace 🤗 Transformers format.
- Create a HF repo:
  ```
  huggingface-cli repo create <name_of_model> --organization <org_of_model>
  git clone https://huggingface.co/<org_of_model>/<name_of_model>
  ```
- Convert the model:
  ```
  ./run_convert.sh \
    --hf-path </path/to/local/hf/repo> \
    --fairseq-path </path/to/fairseq/checkpoint> \
    --size {base,large} \
    [--dict </path/to/dict>] \
    [--copy-fairseq-model]
  ```
- Verify that the models are equal:
  ```
  ./run_forward.py \
    --hf-path </path/to/local/hf/repo> \
    --fairseq-path </path/to/fairseq/checkpoint> \
    [--finetuned]
  ```
- Push to the hub:
  ```
  huggingface-cli upload <your-org>/wav2vec2-MFE-0.5K-base </path/to/local/hf/repo>
  ```
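The verification step above boils down to running the same audio through both models and comparing the outputs numerically. A minimal sketch of that comparison (model loading is elided; the flattened output lists and the helper names are illustrative):

```python
# Sketch of the numerical check behind the verification step: run
# identical input through the fairseq and HF models, then compare the
# flattened outputs elementwise. The lists below stand in for the two
# forward-pass outputs (illustrative values only).
def max_abs_diff(a, b):
    """Largest elementwise absolute difference between two sequences."""
    return max(abs(x - y) for x, y in zip(a, b))

def models_match(hf_out, fs_out, atol=1e-3):
    """True if every element differs by at most `atol`."""
    return max_abs_diff(hf_out, fs_out) <= atol

hf_out = [0.1234, -0.5678, 0.9012]
fs_out = [0.1235, -0.5677, 0.9011]
print(models_match(hf_out, fs_out))  # True
```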
`convert_wav2vec2_original_pytorch_checkpoint_to_pytorch.py` (originally from the official huggingface/transformers repository) was modified:
- It correctly remaps:
  - `wav2vec2.encoder.pos_conv_embed.conv.weight_g` to `wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original0`
  - `wav2vec2.encoder.pos_conv_embed.conv.weight_v` to `wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original1`

  The current version of the script should (not tested) also be able to correctly handle the old `weight_g`/`weight_v` names. Beware: conversion of a finetuned model has not been tested with the current version of the script.
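The remapping above amounts to a plain key rewrite over the checkpoint's state dict. A sketch of that rewrite, mirroring the two renames listed (the helper name is illustrative):

```python
# Sketch of the weight-name remapping described above. The suffix table
# mirrors the two renames listed; keys that don't match any suffix pass
# through unchanged.
RENAMES = {
    "pos_conv_embed.conv.weight_g":
        "pos_conv_embed.conv.parametrizations.weight.original0",
    "pos_conv_embed.conv.weight_v":
        "pos_conv_embed.conv.parametrizations.weight.original1",
}

def remap_key(key: str) -> str:
    """Rewrite a state-dict key if it ends with a known old-style suffix."""
    for old, new in RENAMES.items():
        if key.endswith(old):
            return key[: -len(old)] + new
    return key

print(remap_key("wav2vec2.encoder.pos_conv_embed.conv.weight_g"))
# wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original0
```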
- `sampling_rate` and `do_normalize` are both extracted from fairseq's original configuration (e.g. `cfg['task']['sample_rate']`) instead of being guessed.
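A sketch of that extraction, assuming the checkpoint's config is already loaded as a nested dict; the key `task.sample_rate` comes from the text above, while `task.normalize` and the defaults are assumptions based on typical fairseq wav2vec2 configs:

```python
# Hedged sketch: read the audio settings from the fairseq config dict
# instead of guessing them. Key names and defaults are assumptions.
def extract_audio_settings(cfg: dict) -> dict:
    task_cfg = cfg.get("task") or {}
    return {
        "sampling_rate": int(task_cfg.get("sample_rate", 16000)),
        "do_normalize": bool(task_cfg.get("normalize", False)),
    }

cfg = {"task": {"sample_rate": 16000, "normalize": True}}
print(extract_audio_settings(cfg))
# {'sampling_rate': 16000, 'do_normalize': True}
```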
- Creates `preprocessor_config.json`, which the original script didn't do for pre-trained (i.e. non-finetuned) models.