TTS models since Tacotron-2 #583
Unanswered
swarajdalmia asked this question in General Q&A
Replies: 2 comments 3 replies
-
Have you tried any of the other Tacotron2 variants, such as Double Decoder Consistency (DDC) or Dynamic Convolution Attention? We have many methods for Tacotron implemented in 🐸TTS. In my experience, the model that gives the best naturalness with the least tuning is Tacotron. Other models can outperform it, but they need more tuning.
-
I’d have to agree with @erogol: Tacotron2 has been by far the easiest to
work with and train correctly, and that is even without DDC.
If your dataset is large enough (20-30 hours to start with) and error-free,
Tacotron2 synthesizes accurately once it has learned to attend.
(In reply to Eren Gölge's comment above, sent Sat, Jun 19, 2021 at 6:15 AM.)
-
Tacotron-2 is one of the most widely used models out there. However, the paper was published in 2018. Though it performs well, it occasionally misses or repeats words and sometimes produces gibberish. Many models have come out since then. Which ones do you think are the best and most stable in terms of natural-sounding speech and prosody in a conversational setting? Let's assume we are working with high-quality conversational voice recordings. Or do you think Tacotron-2 is still the best one out there?