Problems with finetuned flow used in CosyVoice2 while in SFT scenario #826

OswaldoBornemann · 2025-01-02T13:00:03Z

We tried to finetune the flow module used in CosyVoice2 in the SFT scenario. At inference, we found that the generated speech quality is better when we also feed the prompt speech into the flow model as input, whereas the finetuned flow module produces poor speech quality with an all-zeros prompt speech. Is that normal? @aluminumbox

aluminumbox · 2025-01-02T14:17:56Z

We haven't updated the CosyVoice2 finetune script yet. You can finetune it, but you need to add the dynamic chunk mask yourself.
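For readers unfamiliar with the term, here is a minimal, framework-free sketch of what a dynamic chunk mask usually means in streaming models: each frame may attend only to frames up to the end of its own chunk, and the chunk size is resampled per training batch so the model sees both streaming and full-context attention. The function names, the `max_chunk_size` default, and the `full_context_prob` value are assumptions for illustration; this is not the CosyVoice2 implementation and does not use its API.

```python
import random


def chunk_attention_mask(num_frames, chunk_size, num_left_chunks=-1):
    """Build a num_frames x num_frames boolean mask.

    mask[i][j] is True when frame i is allowed to attend to frame j.
    A frame sees everything up to the end of its own chunk; if
    num_left_chunks >= 0, history is limited to that many earlier chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    mask = []
    for i in range(num_frames):
        chunk_idx = i // chunk_size
        end = min((chunk_idx + 1) * chunk_size, num_frames)
        if num_left_chunks < 0:
            start = 0  # unlimited left context
        else:
            start = max((chunk_idx - num_left_chunks) * chunk_size, 0)
        mask.append([start <= j < end for j in range(num_frames)])
    return mask


def dynamic_chunk_mask(num_frames, max_chunk_size=25,
                       full_context_prob=0.5, rng=random):
    """Sample a chunk size for this call and return (mask, chunk_size).

    With probability full_context_prob the whole sequence is one chunk,
    which is equivalent to ordinary full attention. Both defaults are
    hypothetical values, not taken from CosyVoice2.
    """
    if rng.random() < full_context_prob:
        chunk_size = max(num_frames, 1)
    else:
        chunk_size = rng.randint(1, max_chunk_size)
    return chunk_attention_mask(num_frames, chunk_size), chunk_size


if __name__ == "__main__":
    for row in chunk_attention_mask(5, 2):
        print("".join("1" if allowed else "0" for allowed in row))
```

In a real finetune script the mask would be converted to a tensor and passed to the attention layers of the flow encoder and decoder; where exactly it plugs in depends on the model code.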

OswaldoBornemann · 2025-01-03T02:14:27Z

@aluminumbox Yes, I see your point. My point is that when I finetune it without the dynamic chunk mask, the quality of the generated speech is better with prompt speech than without it, and I am in the SFT scenario.
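The two inference setups being compared can be pictured roughly as follows. This is a hypothetical sketch, not CosyVoice2 code: it assumes the flow model receives a conditioning sequence as long as the output, with the prompt's mel frames at the front and zeros elsewhere. `build_condition` and its parameters are illustrative names.

```python
def build_condition(target_len, feat_dim, prompt_feat=None):
    """Return target_len conditioning frames of width feat_dim.

    Frames covered by prompt_feat are copied from it; every other frame
    is zero. With prompt_feat=None the result is the all-zeros prompt.
    """
    cond = [[0.0] * feat_dim for _ in range(target_len)]
    if prompt_feat:
        n = min(len(prompt_feat), target_len)
        for t in range(n):
            if len(prompt_feat[t]) != feat_dim:
                raise ValueError("prompt frame width must equal feat_dim")
            cond[t] = list(prompt_feat[t])
    return cond


# Setup A: all-zeros prompt (no prompt speech fed to the flow model).
zero_cond = build_condition(target_len=4, feat_dim=2)

# Setup B: prompt speech features fill the first frames of the condition.
prompt_cond = build_condition(
    target_len=4, feat_dim=2, prompt_feat=[[0.5, -0.1], [0.2, 0.3]]
)
```

If finetuning rarely or never shows the model the all-zeros case, a quality gap between the two setups would not be surprising, but that is a guess rather than something confirmed in this thread.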

ukemamaster · 2025-01-03T12:12:32Z

> We haven't updated the CosyVoice2 finetune script yet. You can finetune it, but you need to add the dynamic chunk mask yourself.

When do you plan to update the finetune script for the CosyVoice2 model?
