
How to handle non-verbal audio (e.g. laughter, wheezing, crying) in text preprocessing #838

Open
EobardThawne721 opened this issue Jan 6, 2025 · 3 comments

Comments

@EobardThawne721

If there is an audio clip that starts with a laugh and is followed by normal speech, how should I handle that leading laughter in the transcript? For example, if I manually mark it as "[laugh] Ha ha ha, that's funny!", how should the [laugh] marker be handled by a normal G2P pipeline?
I have seen that in the past, VITS and other multilingual models that need to speak both Chinese and English in the same utterance commonly tag the text like this: [ZH] Chinese text [ZH] [EN] hello world [EN]. During G2P the tags act as routing markers: text between [ZH] tags goes through the Chinese front end, and text between [EN] tags goes through the English one. Is it possible to do the same thing for [laugh]?
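A rough sketch of what that tag-based routing could look like. The two G2P functions and the [laugh]/[breath]/[cry] marker names are placeholders I made up for illustration, not anything this repo defines; a real pipeline would plug in its actual Chinese/English front ends (e.g. pypinyin / g2p_en).

```python
import re

# Placeholder G2P front ends: in a real pipeline these would be the Chinese
# and English converters the model already uses.
def chinese_g2p(text: str) -> list[str]:
    return list(text)             # placeholder: one symbol per character

def english_g2p(text: str) -> list[str]:
    return text.lower().split()   # placeholder: word-level "phonemes"

# Hypothetical non-verbal markers; they are kept as single symbols and are
# never sent through G2P, mirroring how [ZH]/[EN] act as routing tags.
SPECIAL_TOKENS = {"[laugh]", "[breath]", "[cry]"}

# Split the input into [ZH]...[ZH] spans, [EN]...[EN] spans, and lone tags.
TAG_PATTERN = re.compile(r"\[ZH\](.*?)\[ZH\]|\[EN\](.*?)\[EN\]|(\[\w+\])")

def text_to_symbols(text: str) -> list[str]:
    symbols = []
    for zh, en, special in TAG_PATTERN.findall(text):
        if zh:
            symbols.extend(chinese_g2p(zh))
        elif en:
            symbols.extend(english_g2p(en))
        elif special in SPECIAL_TOKENS:
            symbols.append(special)   # pass the marker through unchanged
    return symbols

print(text_to_symbols("[laugh][ZH]哈哈，太好笑了[ZH][EN] that's funny [EN]"))
```

With this scheme the non-verbal markers never go through G2P at all; they survive as single symbols that can later be mapped to their own IDs.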

@EobardThawne721
Author

Is there a relatively simple way to handle laughter or wheezing directly with a [laugh]-style marker, similar to how the multilingual TTS models handle their language tags?
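A minimal sketch of that simpler route, assuming the front end maps symbols to integer IDs with a plain lookup table (the phoneme set and marker names below are placeholders, not anything the repo defines): once [laugh] is just another entry in the table, it needs no special handling in the text pipeline, though the model only learns what it means if the training audio actually contains labeled laughter.

```python
# Placeholder phoneme inventory; the real one is whatever symbol set the
# model's front end already defines.
BASE_SYMBOLS = ["_", "AA", "AE", "AH"]

# Hypothetical non-verbal markers appended as ordinary vocabulary entries.
NONVERBAL_SYMBOLS = ["[laugh]", "[breath]", "[cry]"]

SYMBOL_TO_ID = {s: i for i, s in enumerate(BASE_SYMBOLS + NONVERBAL_SYMBOLS)}

def symbols_to_ids(symbols: list[str]) -> list[int]:
    return [SYMBOL_TO_ID[s] for s in symbols]

# "[laugh]" gets an ID like any phoneme: here [laugh] -> 4, AH -> 3.
print(symbols_to_ids(["[laugh]", "AH", "AH"]))   # [4, 3, 3]
```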

@aluminumbox
Collaborator

Our instruct data are all human labeled.

@EobardThawne721
Author

How should I handle it? I haven't dealt with this type of label before. Can you provide a reference for how to handle it? Thank you.
