Hi,
I am currently working on a Polish version of TTS, but my final goal is to obtain a Polish-speaking lector for films.
Of course, I can use a simple program like Sony Vegas Studio to merge the film with my .wav file, but my question is: how do I generate a .wav file that fits exactly into given time intervals?
Example:
A person is speaking from (mm:ss:msms) 00:00:02 to 00:08:01, and the next person is speaking from 00:11:01 to 00:19:22.
My lector is a single voice, with no division into male/female voices etc.
Do you have any advice on how to do it?
Regards
[This is an archived TTS discussion thread from discourse.mozilla.org/t/tts-for-film-subtitles]
I think this is slightly outside the scope of the project, but I was curious about your goal, so I took a look for potential solutions.
I wasn't familiar with the term "lector" in this context, but am I right in understanding that you want TTS to read film subtitles aloud, with each line spoken within its associated time period so that it lines up with the video?
There may be other options if you search further, but I found the repo below, which allows adjusting audio duration without affecting the pitch (handy if you don't want it to sound like chipmunks!). There's a demo in one of the notebooks under the examples folder that looks like it might do the trick.
Real-time Audio time-scale and pitch modification in Python - pierre-rouanet/aupyom
I'm guessing it would get more complicated in practice, but roughly: take your sentence, run it through TTS, measure the duration of the generated audio, stretch or compress it by the ratio between that duration and the time slot you have from the video, and then append the result to your output soundtrack (with silence filling the gaps between slots).
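To make that rough recipe a bit more concrete, here is a minimal Python sketch of the bookkeeping only. It rests on several assumptions that are not from the thread: the last timestamp field is read as hundredths of a second, audio is a plain list of float samples, and `synthesize` and `time_stretch` are hypothetical callables that you would supply yourself (your TTS model, and a pitch-preserving stretcher such as the repo above). It does not call any real TTS or aupyom API.

```python
from typing import Callable, List, Sequence, Tuple

Samples = List[float]
Cue = Tuple[str, str, str]  # (start, end, subtitle text)


def parse_timestamp(ts: str) -> float:
    """Convert 'mm:ss:cc' to seconds.

    ASSUMPTION: the last field is hundredths of a second.
    """
    minutes, seconds, hundredths = (int(part) for part in ts.split(":"))
    return minutes * 60 + seconds + hundredths / 100.0


def stretch_rate(tts_duration: float, slot_duration: float) -> float:
    """Speed factor needed to fit the clip: > 1 faster, < 1 slower."""
    if tts_duration <= 0 or slot_duration <= 0:
        raise ValueError("durations must be positive")
    return tts_duration / slot_duration


def build_track(
    cues: Sequence[Cue],
    synthesize: Callable[[str], Samples],             # hypothetical: text -> samples
    time_stretch: Callable[[Samples, float], Samples],  # hypothetical: (samples, rate) -> samples
    sample_rate: int,
) -> Samples:
    """Place each stretched clip at its slot in an otherwise silent track."""
    total = int(round(max(parse_timestamp(end) for _, end, _ in cues) * sample_rate))
    track = [0.0] * total  # silence everywhere by default
    for start, end, text in cues:
        start_idx = int(round(parse_timestamp(start) * sample_rate))
        slot_len = int(round(parse_timestamp(end) * sample_rate)) - start_idx
        clip = synthesize(text)
        rate = stretch_rate(len(clip) / sample_rate, slot_len / sample_rate)
        # Trim any rounding overshoot so the clip never spills past its slot.
        fitted = time_stretch(clip, rate)[:slot_len]
        track[start_idx:start_idx + len(fitted)] = fitted
    return track
```

With the two intervals from the question, `build_track` would be called with cues like `("00:00:02", "00:08:01", "...")` and `("00:11:01", "00:19:22", "...")`. The resulting sample list could then be written out as a .wav file and merged with the film in a video editor. Very large stretch factors will probably sound unnatural, so in practice it may be worth capping the rate and letting short clips simply end early.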
-
>>> shad94
[November 17, 2019, 2:55pm]