Replies: 8 comments
>>> [8 replies from georroussos, baconator, and nmstoker — comment bodies not preserved in the archive]
>>> baconator
[August 16, 2020, 10:03pm]
I have a 14,000-sentence dataset (clean audio, single speaker, correctly
transcribed) that I'm training a model on. It aligned by 10k steps and is
now just past 90k. For the most part it's sounding good.
Words with 'ah' or ending with a long 'a' tend to have a weird rolled-r
sound after them (it's like pirate speak, but unwanted). I've listened to
sentences in the dataset and added a few to the test sentences,
including words spoken correctly in the source, and they come out as
'ar' when generated. 'Athena sprang from the head of Zeus' would end up
sounding like 'Arthenar sprang...'
I should also add that I've trained with the same config parameters
(other than the dataset) on LJSpeech and not had this issue.
Should I start over? Adjust the files we're using in the dataset? Add even
more sentences with the correct pronunciations? Everything else seems
good, even extremely long sentences.
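In case it helps anyone debugging something similar: one quick sanity check is to count how often word-final 'a' tokens actually occur in the transcripts, to see whether the problem sound is under-represented. Below is a sketch assuming LJSpeech-style pipe-delimited metadata (last field is the normalized text); `count_final_a_words` is an illustrative name, not part of any TTS toolkit.

```python
import csv
import io
import re
from collections import Counter

def count_final_a_words(metadata_text: str) -> Counter:
    """Count words ending in 'a' in pipe-delimited transcript metadata.

    Assumes each line looks like: file_id|...|transcript text
    (LJSpeech-style); the last field is treated as the transcript.
    """
    counts = Counter()
    for row in csv.reader(io.StringIO(metadata_text), delimiter="|"):
        if len(row) < 2:
            continue
        transcript = row[-1].lower()
        # Tokenize on letters/apostrophes, dropping punctuation.
        for word in re.findall(r"[a-z']+", transcript):
            if word.endswith("a"):
                counts[word] += 1
    return counts

sample = (
    "LJ001|Athena sprang from the head of Zeus.\n"
    "LJ002|The pizza was good.\n"
)
print(count_final_a_words(sample))  # e.g. Counter({'athena': 1, 'pizza': 1})
```

If the counts for the problem words are tiny relative to the dataset, adding more sentences with those pronunciations (as suggested above) seems like the cheaper fix compared to starting over.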
[This is an archived TTS discussion thread from discourse.mozilla.org/t/one-phonemes-pronunciation-not-matching-dataset]