TTS models since Tacotron-2 #583

swarajdalmia · 2021-06-19T08:26:48Z


Tacotron-2 is one of the most popular models in use. However, the paper was published in 2018. Though it performs well, it occasionally misses or repeats words and sometimes produces gibberish. Lots of models have come out since then. Which ones do you think are the best and most stable when it comes to natural-sounding TTS and prosody in a conversational setting? Let's assume we are working with high-quality conversational voice recordings. Or do you think TC2 is still the best one out there?

erogol · 2021-06-19T10:14:48Z

Maintainer

Have you tried any of the other variations of Tacotron2, like Double Decoder Consistency, Dynamic Convolution Attention, or others? We have many methods for Tacotron implemented in 🐸TTS.

In my experience, the best model in terms of naturalness with the least tuning is Tacotron. Other models can outperform it, but they need more tuning.

2 replies

swarajdalmia Jun 19, 2021
Author

I have tried FastSpeech 2, which I felt was worse than TC2. But since TTS models are computationally expensive to train, I was looking for suggestions of 2-3 models with comparable or better results than TC2 to get started with.

erogol Jun 19, 2021
Maintainer

Then I'd suggest Glow-TTS or Tacotron2 with DDC.

astricks · 2021-06-19T11:26:26Z


I’d have to agree with @erogol: Tacotron2 has been by far the easiest to work with and train correctly, and this is even without DDC. If your dataset is large enough (20-30 hours to start with) and error-free, Tacotron2 synthesizes accurately once it has learned to attend.


1 reply

swarajdalmia Jun 19, 2021
Author

Any other models you'd suggest apart from TC2?
