>>> geneing
[September 14, 2019, 8:16pm]
I implemented the method of predicting style tokens from text alone, as
described in this paper. The method works, and the effect, while subtle,
is more expressive speech. Here's an example after fewer than 100K
training steps: Sound samples.
Compare, for example, TestSentence_1.wav with TestSentence_GST_1.wav.
Each pair of test sentences was generated by the same Tacotron network.
For the GST wav file, the style tokens were generated by a separate
network that takes the Tacotron encoder output and produces style tokens.
The non-GST file was generated with the style tokens set to zero.
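
To make the setup above concrete, here is a minimal sketch of one way a text-side style predictor could look. It is not the code used for the samples. It assumes the separate network mean-pools the encoder output over time, projects it to one weight per style token, and mixes a token bank with those weights. The names `predict_style_embedding`, `projection`, `style_tokens`, and all sizes are hypothetical, and the random values stand in for trained parameters.

```python
import math
import random


def mean_pool(encoder_outputs):
    """Average encoder frames over time: [T][D] -> [D]."""
    steps = len(encoder_outputs)
    dim = len(encoder_outputs[0])
    return [sum(frame[d] for frame in encoder_outputs) / steps for d in range(dim)]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_style_embedding(encoder_outputs, projection, style_tokens):
    """Predict token weights from text-side features, then mix the token bank.

    projection:   [num_tokens][encoder_dim] weights (hypothetical, would be trained)
    style_tokens: [num_tokens][style_dim] token embeddings (hypothetical)
    """
    summary = mean_pool(encoder_outputs)
    logits = [sum(w * x for w, x in zip(row, summary)) for row in projection]
    weights = softmax(logits)
    style_dim = len(style_tokens[0])
    embedding = [
        sum(weights[k] * style_tokens[k][d] for k in range(len(style_tokens)))
        for d in range(style_dim)
    ]
    return weights, embedding


def zero_style_embedding(style_dim):
    """Baseline used for the non-GST sample: style input set to zero."""
    return [0.0] * style_dim


# Toy sizes and random stand-ins for trained parameters (assumptions).
rng = random.Random(0)
T, ENC_DIM, NUM_TOKENS, STYLE_DIM = 12, 8, 4, 6
encoder_outputs = [[rng.gauss(0, 1) for _ in range(ENC_DIM)] for _ in range(T)]
projection = [[rng.gauss(0, 1) for _ in range(ENC_DIM)] for _ in range(NUM_TOKENS)]
style_tokens = [[rng.gauss(0, 1) for _ in range(STYLE_DIM)] for _ in range(NUM_TOKENS)]

weights, gst_embedding = predict_style_embedding(encoder_outputs, projection, style_tokens)
baseline_embedding = zero_style_embedding(STYLE_DIM)
```

In this sketch, `gst_embedding` would condition the decoder for the GST sample and `baseline_embedding` for the non-GST sample, with the same Tacotron weights in both cases.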
[This is an archived TTS discussion thread from discourse.mozilla.org/t/my-progress-on-expressive-speech-synthesis]