If an audio clip starts with a laugh followed by normal speech, how should I handle the leading laughter? For example, if I manually mark it as "[laugh] Ha ha ha, that's funny!", how should the [laugh] marker be handled by a normal G2P pipeline?
I have seen that in the past, VITS and other multilingual models that need to speak Chinese and English in the same utterance commonly tag the text like this: [ZH] Chinese text [ZH] [EN] hello world [EN]. During G2P, the tags act as markers: when the frontend encounters [ZH] it applies the Chinese processing method, and when it encounters [EN] it applies the English one. Would it be possible to do the same for [laugh]?
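To make the idea concrete, here is a minimal sketch of that tag-routing scheme. It assumes paired language tags ([ZH]…[ZH], [EN]…[EN]) and treats a standalone [laugh] as an event token with no enclosed text; the function and tag names are illustrative, not from any particular TTS codebase.

```python
import re

# Matches the tags used in the convention described above.
TAG_PATTERN = re.compile(r"\[(ZH|EN|laugh)\]")

def segment(text):
    """Split tagged text into (tag, content) pairs.

    Paired tags like [ZH] ... [ZH] enclose a span of text;
    a standalone [laugh] is emitted as ("laugh", "") so a
    downstream G2P/frontend can map it to a special token.
    """
    segments = []
    open_tag = None   # language tag currently open, if any
    start = 0         # content start index for the open span
    for m in TAG_PATTERN.finditer(text):
        tag = m.group(1)
        if tag == "laugh":
            # Only treat [laugh] as an event when not inside a language span.
            if open_tag is None:
                segments.append(("laugh", ""))
            continue
        if open_tag is None:
            open_tag, start = tag, m.end()      # opening tag
        elif tag == open_tag:
            segments.append((tag, text[start:m.start()].strip()))
            open_tag = None                      # closing tag
    return segments

def to_frontend_units(text):
    """Route each segment to a per-tag handler (handlers are placeholders)."""
    units = []
    for tag, content in segment(text):
        if tag == "laugh":
            units.append("<LAUGH>")  # hypothetical special token, bypasses G2P
        else:
            # Replace with the real Chinese/English G2P calls in practice.
            units.append((tag, content))
    return units
```

For example, `segment("[laugh] [ZH]哈哈[ZH] [EN]hello world[EN]")` yields `[("laugh", ""), ("ZH", "哈哈"), ("EN", "hello world")]`, after which each piece can be sent to the matching G2P frontend, with "laugh" mapped to a dedicated symbol in the model's vocabulary rather than phonemized.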
Is there a relatively simple way to process laughter or wheezing sounds directly through a [laugh] tag, similar to how multilingual TTS models handle language tags?