Is this AlignTTS training healthy? #580

zubairahmed-ai · 2021-06-18T04:45:40Z

zubairahmed-ai
Jun 18, 2021

I am training AlignTTS on a custom dataset that follows the same format as the first speaker of the VCTK dataset, p225 (mono, 48 kHz).
I am training with the default learning rate of 1e-4 from aligntts_transformers.json.

I just wanted to check whether the training below, at epoch 2375 out of 10K, is going well. Batch size is 8 on 4x GPUs.
Is there anything I should change?

Eval stats

Training set stats

erogol · 2021-06-18T08:59:07Z

erogol
Jun 18, 2021
Maintainer

You also need to post the figures for the alignment and the outputs.

zubairahmed-ai · 2021-06-18T09:23:24Z

zubairahmed-ai
Jun 18, 2021
Author

You're right, here's the current status at epoch 2925

Eval

Training

Alignments

Here's the test audio from the output folder
audio

1 reply

zubairahmed-ai Jun 20, 2021
Author

Appreciate any help here

zubairahmed-ai · 2021-06-21T03:53:12Z

zubairahmed-ai
Jun 21, 2021
Author

Here's the final output after training stopped
Eval

Training

Eval Figures

Test Figures

Here's the final audio sample
At one point during training the audio started to sound a tiny bit like my voice, but the end result is a robotic voice.

Another question: I was reading the AlignTTS paper, and they were able to train on 2x Tesla V100 GPUs with batch size 16, whereas I am unable to train on 4x of the same GPUs with anything more than batch size 8. Any ideas?

0 replies

erogol · 2021-06-21T08:50:59Z

erogol
Jun 21, 2021
Maintainer

I think your spectrogram parameters are broken. Even the ground truth specs look unusual. Can you post the audio parameters from your config?

3 replies

zubairahmed-ai Jun 21, 2021
Author

aligntts_transformers.txt

Strange, here's the complete JSON config

zubairahmed-ai Jun 21, 2021
Author

@erogol Thanks for responding, tagging you

zubairahmed-ai Jun 21, 2021
Author

Plot Dataset Statistics

erogol · 2021-06-21T15:17:34Z

erogol
Jun 21, 2021
Maintainer

sample_rate is too large. You'd better go for 22050 at most.

Also try this for the audio parameters:

    // AUDIO PARAMETERS
    "audio":{
        "fft_size": 1024,        // number of stft frequency levels. Size of the linear spectrogram frame.
        "win_length": 1024,      // stft window length in samples.
        "hop_length": 256,       // stft window hop-length in samples.
        "frame_length_ms": null, // stft window length in ms. If null, 'win_length' is used.
        "frame_shift_ms": null,  // stft window hop-length in ms. If null, 'hop_length' is used.

        // Audio processing parameters
        "sample_rate": 22050,   // DATASET-RELATED: wav sample-rate. If different than the original data, it is resampled.
        "preemphasis": 0.0,     // pre-emphasis to reduce spec noise and make it more structured. If 0.0, no pre-emphasis.
        "ref_level_db": 20,     // reference level db, theoretically 20db is the sound of air.
        "log_func": "np.log",

        // Silence trimming
        "do_trim_silence": false, // enable trimming of silence of audio as you load it. LJspeech (false), TWEB (false), Nancy (true)
        "trim_db": 60,          // threshold for trimming silence. Set this according to your dataset.

        // MelSpectrogram parameters
        "num_mels": 80,         // size of the mel spec frame.
        "mel_fmin": 0.0,        // minimum freq level for mel-spec. ~50 for male and ~95 for female voices. Tune for dataset!!
        "mel_fmax": 8000.0,     // maximum freq level for mel-spec. Tune for dataset!!
        "spec_gain": 1.0,       // scaler value applied after log transform of spectrogram.

        // Normalization parameters
        "signal_norm": false,   // normalize spec values. Mean-Var normalization if 'stats_path' is defined otherwise range normalization defined by the other params.
        "min_level_db": -100,   // lower bound for normalization
        "symmetric_norm": true, // move normalization to range [-1, 1]
        "max_norm": 4.0,        // scale normalization to range [-max_norm, max_norm] or [0, max_norm]
        "clip_norm": true,      // clip normalized values into the range.
        "stats_path": null      // DO NOT USE WITH MULTI_SPEAKER MODEL. scaler stats file computed by 'compute_statistics.py'. If it is defined, mean-std based normalization is used and other normalization params are ignored
    },
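For anyone reading along, here is a small sanity check of what these values mean in time and frequency. It is only a sketch: it treats win_length and hop_length as sample counts (which is how values like 1024 and 256 are normally read at 22050 Hz), and the helper name samples_to_ms is made up for illustration.

```python
# Values copied from the "audio" config above.
SAMPLE_RATE = 22050   # Hz
WIN_LENGTH = 1024     # samples (assumed unit)
HOP_LENGTH = 256      # samples (assumed unit)
MEL_FMAX = 8000.0     # Hz


def samples_to_ms(num_samples, sample_rate):
    """Convert a length in samples to milliseconds (hypothetical helper)."""
    return 1000.0 * num_samples / sample_rate


win_ms = samples_to_ms(WIN_LENGTH, SAMPLE_RATE)
hop_ms = samples_to_ms(HOP_LENGTH, SAMPLE_RATE)
frames_per_second = SAMPLE_RATE / HOP_LENGTH
nyquist_hz = SAMPLE_RATE / 2  # highest frequency the audio can represent

print(f"window: {win_ms:.1f} ms, hop: {hop_ms:.1f} ms")
print(f"frames per second of audio: {frames_per_second:.1f}")
print(f"Nyquist: {nyquist_hz:.0f} Hz, mel_fmax below Nyquist: {MEL_FMAX <= nyquist_hz}")
```

So each spectrogram frame covers about 46 ms of audio and a new frame starts about every 12 ms, and mel_fmax of 8000 Hz stays below the 11025 Hz Nyquist limit.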

8 replies

zubairahmed-ai Jun 21, 2021
Author

I had to add the "power" and "griffin_lim_iters" keys back to the audio config before training would start.
Finally I can set the training batch size to 32 and the eval batch size to 16 and still have some GPU memory left. I am tempted to set an even higher batch size.
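A rough way to see why the lower sample rate frees up GPU memory is to count spectrogram frames per clip. This is only a sketch under assumptions the thread does not confirm: that the hop length was 256 in both runs, that the STFT is centered (1 + floor(samples / hop) frames), and a made-up 5-second clip.

```python
def num_stft_frames(duration_s, sample_rate, hop_length):
    """Frame count for a centered STFT: 1 + floor(num_samples / hop_length)."""
    num_samples = int(duration_s * sample_rate)
    return 1 + num_samples // hop_length


CLIP_SECONDS = 5.0  # hypothetical utterance length, not from the thread
HOP_LENGTH = 256    # assumed to be the same in both runs

frames_48k = num_stft_frames(CLIP_SECONDS, 48000, HOP_LENGTH)
frames_22k = num_stft_frames(CLIP_SECONDS, 22050, HOP_LENGTH)
ratio = frames_48k / frames_22k

print(f"48000 Hz: {frames_48k} frames, 22050 Hz: {frames_22k} frames")
print(f"48 kHz produces {ratio:.2f}x as many frames for the same clip")
```

Under these assumptions the same clip is a bit more than twice as long in frames at 48 kHz, so each batch item needs correspondingly more activation memory, which would fit with the larger batch size becoming possible after the change.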

zubairahmed-ai Jun 22, 2021
Author

After doing the above changes and letting it run for a while, it looks like this

Eval

Train

Eval Figures

Test Figures

And my voice is finally beginning to sound like me

erogol Jun 22, 2021
Maintainer

let it train for a while

zubairahmed-ai Jun 22, 2021
Author

Thank you, yes, that's exactly the plan. I will keep an eye out for your instructions on when to stop training.

zubairahmed-ai Jun 23, 2021
Author

@erogol At epoch 5600 this is what my training looks like. I think it's overfit; the audio output isn't improving.

If your final model does not work well at this stage, you can retrain the model with a higher weight decay, a larger dataset, or a bigger dropout rate.

I am tempted to try this (quoted from here). Please let me know what you think.

Audio output: my voice still sounds robotic

Eval

Train

Eval Figures

Test Figures

erogol · 2021-06-23T09:23:34Z

erogol
Jun 23, 2021
Maintainer

How large is your dataset?

9 replies

erogol Jun 23, 2021
Maintainer

Then you should try fine-tuning an English model we released, but your dataset is too small. In general, you need at least 5 hours for something reasonable.

zubairahmed-ai Jun 23, 2021
Author

Oh, mine is just 9 minutes and 16 seconds.
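For reference, one way to measure the total dataset length is to sum the durations of the wav files. This is a minimal sketch using only the Python standard library; it assumes uncompressed PCM wav files, and the function names are made up for illustration.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path):
    """Duration of one PCM wav file: frame count divided by sample rate."""
    with wave.open(str(path), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def dataset_duration_seconds(root):
    """Sum the durations of every .wav file found under root."""
    return sum(wav_duration_seconds(p) for p in sorted(Path(root).rglob("*.wav")))


def format_duration(seconds):
    """Render a number of seconds as 'Hh MMm SSs'."""
    total = int(round(seconds))
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


# The figure quoted in this thread, compared with a 5-hour target.
mine = 9 * 60 + 16
print(format_duration(mine), f"= {100 * mine / (5 * 3600):.1f}% of 5 hours")
```

Calling dataset_duration_seconds on the folder that holds the recordings and passing the result to format_duration gives the total. At 9 minutes 16 seconds, the dataset is about 3% of a 5-hour target.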

erogol Jun 23, 2021
Maintainer

You are not fine-tuning, you are training a model. Unfortunately, this is all I can do for you. Good luck 🤞

zubairahmed-ai Jun 23, 2021
Author

You're right. Sorry, I overlooked that step even after identifying that model. I will fine-tune and let you know. Thanks for your help, much appreciated @erogol

zubairahmed-ai Jun 23, 2021
Author

I have started fine-tuning from the default https://github.com/coqui-ai/TTS/releases/download/v0.0.12/tts_models--en--vctk--sc-glowtts-transformer.zip
Somewhere I read your comment that there is no default vocoder for AlignTTS. What does that mean, and how do I choose one?
