Additional ideas around dataset and tts output testing #253

JRMeyer (Maintainer) · 2021-03-07T08:51:08Z

>>> nmstoker
[August 10, 2020, 1:13pm]

There are already some handy tools in the repo for looking at dataset
issues (e.g. here and here).

However, for a while I've had an idea in the back of my mind: count the
syllables present in the audio, compare that with the syllables in the
transcript text to highlight discrepancies, and see whether the check
could be semi-automated to save time.
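Once both counts exist, the comparison itself is simple. A minimal sketch, assuming a relative-difference measure and a hypothetical 25% tolerance (neither is taken from the Gist):

```python
def syllable_discrepancy(text_syllables: int, audio_syllables: int) -> float:
    """Relative difference between the audio estimate and the transcript count."""
    if text_syllables <= 0:
        # No expected syllables: any detected audio syllable is a full mismatch.
        return float("inf") if audio_syllables > 0 else 0.0
    return abs(audio_syllables - text_syllables) / text_syllables


def is_suspect(text_syllables: int, audio_syllables: int, tolerance: float = 0.25) -> bool:
    """Flag an audio/transcript pair when the counts differ by more than `tolerance`.

    The 0.25 default is a hypothetical starting point, to be tuned per dataset.
    """
    return syllable_discrepancy(text_syllables, audio_syllables) > tolerance


if __name__ == "__main__":
    print(is_suspect(10, 12))  # 20% off: within tolerance
    print(is_suspect(10, 15))  # 50% off: flagged for review
```

A relative measure keeps long and short clips comparable; an absolute difference of two syllables means much more in a three-word clip than in a long sentence.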

There are various ways you could do this, including using speech
recognition on the audio side, but I've found an approach for the audio
that works reasonably well, though it isn't perfect.
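The Praat script from the Gist isn't reproduced here. Assuming it follows the common syllable-nucleus idea (treat prominent, dip-separated peaks in the intensity contour as syllable nuclei), a minimal pure-Python sketch of the peak counting might look like the following. It takes an intensity contour in dB obtained elsewhere (for instance from parselmouth's `Sound.to_intensity()`); the thresholds are illustrative assumptions, and the voicing check that such scripts typically add is omitted.

```python
from statistics import median
from typing import List, Optional, Sequence


def count_intensity_peaks(intensity_db: Sequence[float],
                          above_median_db: float = 2.0,
                          min_dip_db: float = 2.0) -> int:
    """Estimate syllable nuclei from an intensity contour (one value per frame, in dB).

    A frame is a candidate if it is a local maximum at least `above_median_db`
    above the contour's median. A candidate is counted only if the contour dips
    by at least `min_dip_db` between it and the previously counted peak.
    """
    if len(intensity_db) < 3:
        return 0
    threshold = median(intensity_db) + above_median_db
    candidates: List[int] = []
    for i in range(1, len(intensity_db) - 1):
        value = intensity_db[i]
        # Local maximum (first frame of a plateau) that is loud enough.
        if value >= threshold and intensity_db[i - 1] < value >= intensity_db[i + 1]:
            candidates.append(i)

    count = 0
    last: Optional[int] = None
    for i in candidates:
        if last is None:
            count, last = 1, i
            continue
        dip = min(intensity_db[last:i + 1])
        if min(intensity_db[last], intensity_db[i]) - dip >= min_dip_db:
            count, last = count + 1, i
        elif intensity_db[i] > intensity_db[last]:
            # No real dip: same nucleus, keep the louder frame as the reference.
            last = i
    return count


if __name__ == "__main__":
    contour = [40, 50, 60, 50, 40, 52, 62, 52, 40]  # two clear bumps
    print(count_intensity_peaks(contour))
```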

It's presented in a Gist here: https://gist.github.com/nmstoker/f1590847a16b66ab22c16722aac1cc51

If people think it would be a useful addition to the repo, I'm happy to
do a PR.

For the audio it uses a library called parselmouth, which in turn calls
a Praat script. For the text syllables there is a handy little library
called syllapy.
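To show what the text side has to do, here is a crude vowel-group heuristic. It is a stand-in for illustration only, not syllapy's algorithm, and it will miscount some words (e.g. "rhythm"):

```python
import re

_VOWEL_GROUPS = re.compile(r"[aeiouy]+")
_WORDS = re.compile(r"[a-z']+")


def count_word_syllables(word: str) -> int:
    """Rough English syllable estimate: count vowel groups, with a silent-'e' tweak."""
    word = word.lower().strip("'")
    if not word:
        return 0
    groups = len(_VOWEL_GROUPS.findall(word))
    # A trailing 'e' is usually silent ("make", "whale"), except after "ee"
    # or in a consonant + "le" ending ("table"), which keeps its syllable.
    silent_e = word.endswith("e") and not word.endswith("ee")
    consonant_le = word.endswith("le") and len(word) > 2 and word[-3] not in "aeiouy"
    if silent_e and not consonant_le and groups > 1:
        groups -= 1
    return max(groups, 1)


def count_text_syllables(text: str) -> int:
    """Sum the per-word estimates over a whole transcript line."""
    return sum(count_word_syllables(w) for w in _WORDS.findall(text.lower()))


if __name__ == "__main__":
    print(count_text_syllables("The cat sat on the mat."))
```

In practice a dictionary-backed library such as syllapy should do better than this on irregular words; the sketch only makes the idea concrete.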

I ran it on LJSpeech 1.1, as that's what people here often use, at
least for experimentation. It is a well-produced dataset, but the check
did identify one particular case with a clear problem.
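For anyone repeating the LJSpeech run, the transcripts come from `metadata.csv`, which is pipe-separated (ID, raw transcription, normalized transcription). A minimal reader sketch, assuming that layout; the sample lines in the demo are made up:

```python
import csv
import io
from typing import Iterator, TextIO, Tuple


def read_ljspeech_metadata(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (clip_id, text) pairs from LJSpeech-style metadata.csv content.

    Each line is expected to be: ID|raw transcription|normalized transcription.
    The normalized column (numbers spelled out) is preferred when present,
    since digits and their spoken form have very different syllable counts.
    """
    # QUOTE_NONE: transcripts contain quotation marks that must not be parsed as CSV quoting.
    reader = csv.reader(handle, delimiter="|", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        text = row[2] if len(row) > 2 and row[2] else row[1]
        yield row[0], text


if __name__ == "__main__":
    sample = 'LJ001-0001|Printed in 1869.|Printed in eighteen sixty-nine.\nLJ001-0002|He said "yes".\n'
    for clip_id, text in read_ljspeech_metadata(io.StringIO(sample)):
        # In LJSpeech the matching audio lives at wavs/<clip_id>.wav
        print(clip_id, text)
```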

For new, private or self-produced datasets this could be a very useful
way to avoid manually inspecting every audio/transcript pair. At the
very least it lets you target such efforts at the likeliest problems
first.
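Targeting the manual review could be as simple as sorting pairs by how badly the counts disagree. A sketch, assuming per-clip counts have already been computed; `rank_for_review` is a hypothetical helper, not part of the Gist:

```python
from typing import Dict, List, Tuple


def rank_for_review(counts: Dict[str, Tuple[int, int]], top_n: int = 20) -> List[Tuple[str, float]]:
    """Return the clip IDs whose audio and text syllable counts disagree most.

    `counts` maps clip_id -> (text_syllables, audio_syllables).
    """
    scored = []
    for clip_id, (text_syl, audio_syl) in counts.items():
        denom = max(text_syl, 1)  # avoid division by zero on empty transcripts
        scored.append((clip_id, abs(audio_syl - text_syl) / denom))
    # Worst mismatches first, so listening time goes where it is most needed.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    print(rank_for_review({"a": (10, 10), "b": (10, 15), "c": (10, 12)}, top_n=2))
```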

There is also scope for running it on audio output from TTS to check
that there are no cases of repeated words (i.e. as often happens when
there are stopnet issues). You could create a large-ish batch of new
transcript sentences, send them to TTS using requests to create the
audio files, and then run the comparison between the audio and
transcript to focus on problem cases.
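A sketch of that loop, using the standard library's `urllib` in place of requests. The endpoint (a local demo server answering `GET /api/tts?text=...` with WAV bytes) and the 1.3 ratio are assumptions to adjust for your own setup:

```python
import urllib.parse
import urllib.request
from pathlib import Path
from typing import Iterable, List

# Assumption: a local TTS demo server exposing GET /api/tts?text=... that returns WAV bytes.
TTS_URL = "http://localhost:5002/api/tts"


def build_tts_url(sentence: str, base_url: str = TTS_URL) -> str:
    """URL-encode one sentence into a synthesis request."""
    return base_url + "?" + urllib.parse.urlencode({"text": sentence})


def synthesize_batch(sentences: Iterable[str], out_dir: Path, base_url: str = TTS_URL) -> List[Path]:
    """Request audio for each sentence and save it as out_dir/NNNN.wav."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, sentence in enumerate(sentences):
        path = out_dir / f"{i:04d}.wav"
        with urllib.request.urlopen(build_tts_url(sentence, base_url), timeout=60) as resp:
            path.write_bytes(resp.read())
        paths.append(path)
    return paths


def looks_repeated(text_syllables: int, audio_syllables: int, ratio: float = 1.3) -> bool:
    """Heuristic: many more syllables in the audio than in the text suggests repeated words."""
    return text_syllables > 0 and audio_syllables >= ratio * text_syllables
```

Only the surplus direction is checked here because repetition adds syllables; truncated output (too few syllables) would need the symmetric check.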

With Praat, there are potentially options to go a bit further than
purely syllables (e.g. using its 'voice report'), so if people have
feedback or suggestions before I add this, do fire away.

[This is an archived TTS discussion thread from discourse.mozilla.org/t/additional-ideas-around-dataset-and-tts-output-testing]

