Can CosyVoice2 run inference with a random voice, without prompt_speech? #822

rulerman · 2025-01-02T06:29:25Z

All three official inference interfaces require prompt_speech:

`import torchaudio
from cosyvoice.cli.cosyvoice import CosyVoice2
from cosyvoice.utils.file_utils import load_wav

cosyvoice = CosyVoice2('pretrained_models/CosyVoice2-0.5B', load_jit=False, load_trt=False, fp16=False)

# NOTE if you want to reproduce the results on https://funaudiollm.github.io/cosyvoice2, please add text_frontend=False during inference
# zero_shot usage
prompt_speech_16k = load_wav('zero_shot_prompt.wav', 16000)
for i, j in enumerate(cosyvoice.inference_zero_shot('收到好友从远方寄来的生日礼物，那份意外的惊喜与深深的祝福让我心中充满了甜蜜的快乐，笑容如花儿般绽放。', '希望你以后能够做的比我还好呦。', prompt_speech_16k, stream=False)):
    torchaudio.save('zero_shot_{}.wav'.format(i), j['tts_speech'], cosyvoice.sample_rate)

# fine grained control, for supported control, check cosyvoice/tokenizer/tokenizer.py#L248
for i, j in enumerate(cosyvoice.inference_cross_lingual('在他讲述那个荒诞故事的过程中，他突然[laughter]停下来，因为他自己也被逗笑了[laughter]。', prompt_speech_16k, stream=False)):
    torchaudio.save('fine_grained_control_{}.wav'.format(i), j['tts_speech'], cosyvoice.sample_rate)

# instruct usage (the instruction text means "say this sentence in Sichuan dialect")
for i, j in enumerate(cosyvoice.inference_instruct2('收到好友从远方寄来的生日礼物，那份意外的惊喜与深深的祝福让我心中充满了甜蜜的快乐，笑容如花儿般绽放。', '用四川话说这句话', prompt_speech_16k, stream=False)):
    torchaudio.save('instruct_{}.wav'.format(i), j['tts_speech'], cosyvoice.sample_rate)`

aluminumbox · 2025-01-02T07:42:13Z

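A random voice is possible, but we have not computed the mean and std of the speaker embeddings. If you have enough data, you can compute those statistics yourself and sample a random embedding from them.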
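The statistics step suggested above can be sketched roughly as follows. This is only an illustration, not the maintainers' method: it assumes you have already extracted speaker embeddings for many utterances into an (N, D) array, treats each dimension as an independent Gaussian, and uses placeholder random data with D = 192 as a stand-in. The helper names `fit_embedding_stats` and `sample_random_embedding` are hypothetical, and passing the sampled embedding into CosyVoice2 is not shown, since the public interfaces in this thread all take `prompt_speech`.

```python
import numpy as np


def fit_embedding_stats(embeddings):
    """Return per-dimension mean and std of an (N, D) array of speaker embeddings."""
    embeddings = np.asarray(embeddings, dtype=np.float32)
    if embeddings.ndim != 2 or embeddings.shape[0] < 2:
        raise ValueError("expected an (N, D) array with N >= 2")
    return embeddings.mean(axis=0), embeddings.std(axis=0)


def sample_random_embedding(mean, std, rng=None):
    """Draw one embedding from an independent Gaussian per dimension."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.standard_normal(mean.shape)
    return (mean + std * noise).astype(np.float32)


# Placeholder data standing in for real speaker embeddings (assumption:
# 1000 utterances, 192 dimensions). Replace with embeddings from your own data.
rng = np.random.default_rng(0)
collected = rng.normal(loc=0.5, scale=2.0, size=(1000, 192)).astype(np.float32)

mean, std = fit_embedding_stats(collected)
random_embedding = sample_random_embedding(mean, std, rng)
print(random_embedding.shape)
```

A per-dimension Gaussian ignores correlations between dimensions, so a sampled vector may not correspond to a natural-sounding voice. With enough data, a full covariance estimate or picking a random real embedding may behave better.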

fushh · 2025-01-02T09:25:07Z

Could you provide an inference example? As far as I can see, prompt_speech is a required argument in every interface.

shirubei · 2025-01-10T13:19:50Z

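> Could you provide an inference example? As far as I can see, prompt_speech is a required argument in every interface.

Same here, please give an example.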

tianzhangwu · 2025-01-22T12:19:00Z

I need this too.
