Problems with finetuned flow used in CosyVoice2 while in SFT scenario #826

Open
OswaldoBornemann opened this issue Jan 2, 2025 · 3 comments

Comments

@OswaldoBornemann

We tried to fine-tune the flow module used in CosyVoice2 in the SFT scenario. At inference time, the generated speech quality is better if we also feed the prompt speech into the flow model as input, whereas the fine-tuned flow module generates poor-quality speech when the prompt speech is all zeros. Is that normal? @aluminumbox
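
For readers following along, the two inference settings compared above can be illustrated with a minimal sketch. This is not CosyVoice code: `build_condition`, its arguments, and the list-of-lists representation are hypothetical and chosen only for illustration. It assumes the flow model receives a conditioning matrix with one row per output frame, where rows covered by the prompt hold the prompt's mel features and all other rows are zero.

```python
def build_condition(num_frames, num_mels, prompt_mel=None):
    """Return a num_frames x num_mels conditioning matrix (hypothetical helper).

    Frames covered by prompt_mel carry the prompt's mel features; all
    remaining frames are zero. With prompt_mel=None the result is all zeros.
    """
    cond = [[0.0] * num_mels for _ in range(num_frames)]
    if prompt_mel is not None:
        # Copy the prompt frames into the leading rows, truncating if needed.
        for t, frame in enumerate(prompt_mel[:num_frames]):
            cond[t] = list(frame)
    return cond


# Setting 1: "zeros prompt speech" (no prompt information reaches the model).
zero_cond = build_condition(num_frames=4, num_mels=2)

# Setting 2: prompt speech fed in as the first frames of the condition.
prompt_cond = build_condition(4, 2, prompt_mel=[[0.5, -1.0], [0.25, 0.75]])
```

If fine-tuning mostly or always used one of these settings, a quality gap between them at inference would not be surprising, but that is a guess and is not confirmed in this thread.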

@aluminumbox
Collaborator

We haven't updated the CosyVoice2 fine-tuning script yet. You can fine-tune it, but you need to add the dynamic chunk mask yourself.
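
As a rough illustration of what a dynamic chunk mask usually means, here is a minimal sketch in plain Python. It is not the CosyVoice implementation: the function names, `max_chunk_size`, and `full_context_prob` are hypothetical, and a real training script would build the mask as a tensor. The idea is that each frame may attend to its own chunk and to all earlier chunks, and that the chunk size is re-sampled for each training batch so the model sees both streaming-style and full-context attention.

```python
import random


def chunk_attention_mask(num_frames, chunk_size):
    """Return a num_frames x num_frames boolean mask.

    mask[i][j] is True when frame i may attend to frame j, i.e. when j lies
    in the same chunk as i or in an earlier chunk.
    """
    mask = []
    for i in range(num_frames):
        # Exclusive end of the chunk that contains frame i.
        visible_end = min((i // chunk_size + 1) * chunk_size, num_frames)
        mask.append([j < visible_end for j in range(num_frames)])
    return mask


def sample_dynamic_chunk_mask(num_frames, max_chunk_size=25,
                              full_context_prob=0.5, rng=random):
    """Pick a chunk size at random for one batch and build its mask.

    Both default values are illustrative assumptions, not project settings.
    """
    if rng.random() < full_context_prob:
        chunk_size = max(num_frames, 1)  # one chunk: full attention
    else:
        chunk_size = rng.randint(1, max_chunk_size)
    return chunk_attention_mask(num_frames, chunk_size)


mask = chunk_attention_mask(num_frames=4, chunk_size=2)
```

With `num_frames=4` and `chunk_size=2`, frames 0 and 1 see only frames 0 and 1, while frames 2 and 3 see all four frames.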

@OswaldoBornemann
Author

@aluminumbox Yes, I see your point. My point is that when I fine-tune it without the dynamic chunk mask, the generated speech quality is still better with prompt speech than without it, and this is in the SFT scenario.

@ukemamaster

> We haven't updated the CosyVoice2 fine-tuning script yet. You can fine-tune it, but you need to add the dynamic chunk mask yourself.

When do you plan to update the fine-tuning script for the CosyVoice2 model?


3 participants