MooseNet PLDA

MooseNet: A Trainable Metric for Synthesized Speech with a PLDA Module, on arXiv.
Accepted to Speech Synthesis Workshop 12, 2023, Grenoble
Presentation slides

 


MooseNet is a trainable metric for synthesized speech. We experimented with self-supervised learning (SSL) neural network (NN) models and a probabilistic linear discriminant analysis (PLDA) module. See the MooseNet-PLDA paper.

Installation

# Optional: remove an existing environment before reinstalling
conda deactivate; rm -rf env
# Create a new conda environment and install the moosenet package in editable mode
conda env create --prefix ./env -f environment.yml \
  && conda activate ./env \
  && pip install -e ".[dev]"

Reproducing the Experiments

  • The commands for fine-tuning SSL models (XLS-R and Wav2Vec 2.0) into the MooseNet NN on the English data from the main track are in ./main.sh.
  • The commands for fine-tuning the MooseNet NN on the main-track data and the Chinese set from the OOD track are in ./ood.sh.
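
Because the two scripts collect commands and do not describe a single entry point, it may help to review what they contain before running anything. The following is a minimal sketch, assuming both scripts are plain shell files in the repository root; `list_commands` is a hypothetical helper and is not part of the repository.

```shell
# Hypothetical helper (not part of the repository): print the non-empty,
# non-comment lines of a script so the commands can be reviewed first.
list_commands() {
  grep -vE '^[[:space:]]*(#|$)' "$1"
}

# Show the commands for each track, skipping any script that is not present.
for script in ./main.sh ./ood.sh; do
  if [ -f "$script" ]; then
    echo "== $script =="
    list_commands "$script"
  fi
done
```

Individual commands can then be copied out and run inside the activated `./env` environment, which is likely safer than launching a whole script when only one experiment is of interest.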

Acknowledgements

This work was co-funded by Charles University projects GAUK 40222, SVV 260575 and the European Union (ERC, NG-NLG, 101039303).