
Utility scripts to convert a wav2vec2 Fairseq model to a HuggingFace Transformers model (adapted from https://huggingface.co/HfSpeechUtils/convert_wav2vec2_to_hf)


LLL-Orleans/convert_wav2vec2_to_hf


Convert Fairseq wav2vec2 to HF

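The code in this repository was adapted from the original HuggingFace repository (https://huggingface.co/HfSpeechUtils/convert_wav2vec2_to_hf). It contains two scripts: run_convert.sh, which converts a fairseq wav2vec2 checkpoint to HuggingFace 🤗 Transformers, and run_forward.py, which checks that the converted model matches the original.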

Procedure

  1. Create a HF repo:
huggingface-cli repo create <name_of_model> --organization <org_of_model>
git clone https://huggingface.co/<org_of_model>/<name_of_model>
  2. Convert the model:
./run_convert.sh \
    --hf-path </path/to/local/hf/repo> \
    --fairseq-path </path/to/fairseq/checkpoint> \
    --size {base, large} \
    [--dict </path/to/dict>] \
    [--copy-fairseq-model]
  3. Verify that the two models are equal:
./run_forward.py \
    --hf-path </path/to/local/hf/repo> \
    --fairseq-path </path/to/fairseq/checkpoint> \
    [--finetuned]
  4. Push to the hub:
huggingface-cli upload <org_of_model>/<name_of_model> </path/to/local/hf/repo>
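The "Verify that models are equal" step compares the converted model against the fairseq original. Because both implementations compute in floating point, their outputs rarely match bit for bit, so a practical check compares them within a tolerance. The Python sketch below illustrates that idea on outputs already flattened to lists of floats. It is not taken from run_forward.py, whose comparison logic is not described here; the names max_abs_diff and outputs_match and the 1e-3 tolerance are assumptions.

```python
from typing import Sequence


def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest element-wise absolute difference between two flattened outputs."""
    if len(a) != len(b):
        raise ValueError(f"size mismatch: {len(a)} vs {len(b)}")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


def outputs_match(a: Sequence[float], b: Sequence[float], atol: float = 1e-3) -> bool:
    """True if no element differs by more than atol (an assumed tolerance)."""
    return max_abs_diff(a, b) <= atol


if __name__ == "__main__":
    # Hypothetical values standing in for the HF and fairseq hidden states.
    hf_out = [0.12, -0.53, 0.98]
    fairseq_out = [0.1201, -0.5302, 0.9799]
    print(max_abs_diff(hf_out, fairseq_out), outputs_match(hf_out, fairseq_out))
```

With PyTorch tensors, hf_out.flatten().tolist() produces the flat list expected here; torch.allclose(hf_out, fairseq_out, atol=1e-3) performs a similar check directly (it also applies a small relative tolerance).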

Changelog

convert_wav2vec2_original_pytorch_checkpoint_to_pytorch.py (originally from the official huggingface/transformers repository) was modified as follows:

  1. It correctly remaps the weight-norm parameters of the positional convolution:
  • wav2vec2.encoder.pos_conv_embed.conv.weight_g to wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original0
  • wav2vec2.encoder.pos_conv_embed.conv.weight_v to wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original1

The current version of the script should also handle the old weight_g/weight_v names correctly, although this has not been tested. Beware: conversion of fine-tuned models has not been tested with the current version of the script.

  2. sampling_rate and do_normalize are both read from the original fairseq configuration (e.g. cfg['task']['sample_rate']) instead of being guessed.

  3. It creates preprocessor_config.json, which the original script did not do for pre-trained (i.e. non-fine-tuned) models.
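The changelog items above can be sketched in Python. This is a minimal, hypothetical illustration, not code from convert_wav2vec2_original_pytorch_checkpoint_to_pytorch.py: the helper names remap_pos_conv_keys and write_preprocessor_config are invented here, and reading do_normalize from cfg['task']['normalize'] is an assumption (only cfg['task']['sample_rate'] is named above). The remap renames weight_g/weight_v only when the target model expects the parametrization names, which is one way to keep the old naming working.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Iterable

# Hypothetical constants/helpers; the real script may be organized differently.
PREFIX = "wav2vec2.encoder.pos_conv_embed.conv."
WEIGHT_NORM_RENAMES = {
    PREFIX + "weight_g": PREFIX + "parametrizations.weight.original0",
    PREFIX + "weight_v": PREFIX + "parametrizations.weight.original1",
}


def remap_pos_conv_keys(state_dict: Dict[str, Any], target_keys: Iterable[str]) -> Dict[str, Any]:
    """Rename weight_g/weight_v to the parametrization names, but only if the
    target HF model expects them; otherwise keep the old names unchanged."""
    expected = set(target_keys)
    remapped = {}
    for key, tensor in state_dict.items():
        new_key = WEIGHT_NORM_RENAMES.get(key, key)
        remapped[new_key if new_key in expected else key] = tensor
    return remapped


def write_preprocessor_config(cfg: Dict[str, Any], out_dir: str) -> Path:
    """Write a minimal preprocessor_config.json from the fairseq config.
    Assumes cfg["task"] contains "sample_rate" and "normalize"."""
    task = cfg["task"]
    config = {
        "feature_extractor_type": "Wav2Vec2FeatureExtractor",
        "feature_size": 1,
        "padding_value": 0.0,
        "sampling_rate": int(task["sample_rate"]),
        "do_normalize": bool(task["normalize"]),
    }
    path = Path(out_dir) / "preprocessor_config.json"
    path.write_text(json.dumps(config, indent=2))
    return path


if __name__ == "__main__":
    # Toy state dict with placeholder values instead of tensors.
    sd = {PREFIX + "weight_g": "g", PREFIX + "weight_v": "v", PREFIX + "bias": "b"}
    new_style = list(WEIGHT_NORM_RENAMES.values()) + [PREFIX + "bias"]
    print(sorted(remap_pos_conv_keys(sd, new_style)))
    out = write_preprocessor_config({"task": {"sample_rate": 16000, "normalize": False}}, tempfile.mkdtemp())
    print(out.read_text())
```

In practice, passing the HF model's own state_dict().keys() as target_keys lets the same code serve both naming schemes. With transformers installed, Wav2Vec2FeatureExtractor(sampling_rate=..., do_normalize=...).save_pretrained(out_dir) writes a complete preprocessor_config.json instead of this minimal subset.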
