HiFiVC

This repository contains an implementation of the HiFi-VC paper. The model structure is based on an analysis of the graph and code of the TorchScript checkpoint provided by the paper's authors. Most of the missing details were recovered. In addition, the repository contains pre-trained versions of the speaker encoder: the VAE part of the original solution and ECAPA-TDNN, taken from an available implementation.

Differences from the article

Currently, this implementation does not support F0 training. However, the authors report that results differ little with or without F0.

To stabilize training, an Extra-Adam implementation was added, based on this repo.
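Extra-Adam builds on the extragradient idea: take a look-ahead step first, then update the original parameters using gradients evaluated at the look-ahead point. The sketch below is not the code used in this repository. It is a minimal illustration with plain gradient steps (no Adam moments) on a toy game min over x, max over y of x * y, and all function names in it are hypothetical.

```python
import math


def gda_step(x, y, lr):
    """One step of simultaneous gradient descent-ascent on f(x, y) = x * y."""
    # df/dx = y (x descends), df/dy = x (y ascends)
    return x - lr * y, y + lr * x


def extragradient_step(x, y, lr):
    """One extragradient step on f(x, y) = x * y."""
    # 1) Extrapolation: look ahead with an ordinary gradient step.
    x_half, y_half = x - lr * y, y + lr * x
    # 2) Update: start from the ORIGINAL point, but use the gradients
    #    evaluated at the look-ahead point.
    return x - lr * y_half, y + lr * x_half


def run(step, n_steps=1000, lr=0.1):
    """Return the distance to the equilibrium (0, 0) after n_steps."""
    x, y = 1.0, 1.0
    for _ in range(n_steps):
        x, y = step(x, y, lr)
    return math.hypot(x, y)


gda_dist = run(gda_step)
eg_dist = run(extragradient_step)
print(f"simultaneous GDA distance: {gda_dist:.4f}")
print(f"extragradient distance:    {eg_dist:.4f}")
```

On this toy problem the plain simultaneous update spirals away from the equilibrium, while the extragradient update contracts towards it, which is the kind of stabilizing effect that motivates using it for adversarial training.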

Installation

Install all packages using pip install -r requirements.txt.

If you want to use the pre-trained VAE, run the following commands:

pip install gdown
gdown 1oFwMeuQtwaBEyOFkyG7c7LfBQiRe3RdW -O "model.pt"

Training

To run the experiment, run the following command:

python3 train.py -cn CONFIG_NAME +dataset.data_path=PATH_TO_WAV48_DIR

Here CONFIG_NAME is the name of a file (without the .yaml extension) from the src/configs folder, and PATH_TO_WAV48_DIR is the path to the wav48 directory of the VCTK dataset. For example, on Kaggle the path may look like this: /kaggle/input/vctk-corpus/VCTK-Corpus/VCTK-Corpus/wav48.

Note: add HYDRA_FULL_ERROR=1 before python3 to see the full error stack trace.
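If you launch training from a notebook or another Python script, the command and the environment variable above can be assembled programmatically. This is a small sketch, not part of the repository: the helper name build_train_command and the example config name my_config are hypothetical.

```python
import os


def build_train_command(config_name, data_path, full_error=True):
    """Build the train.py command line and the environment for it.

    build_train_command is a hypothetical helper, not part of this repository.
    """
    cmd = [
        "python3",
        "train.py",
        "-cn",
        config_name,  # file name from src/configs, without .yaml
        f"+dataset.data_path={data_path}",  # path to the VCTK wav48 directory
    ]
    env = dict(os.environ)  # copy, so the current process is not modified
    if full_error:
        env["HYDRA_FULL_ERROR"] = "1"  # ask Hydra for the full stack trace
    return cmd, env


# "my_config" is a placeholder; use the name of a real file from src/configs.
cmd, env = build_train_command(
    "my_config",
    "/kaggle/input/vctk-corpus/VCTK-Corpus/VCTK-Corpus/wav48",
)
print(" ".join(cmd))
```

To actually start training, pass the result to subprocess.run(cmd, env=env, check=True) from the repository root.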

Credits

Official repository (inference only). The Extra-Adam implementation was taken from this repository and ECAPA-TDNN from this one.