We tried to fine-tune the flow module used in CosyVoice2 in the SFT scenario. At inference, we found that the generated speech quality is better when we also feed the prompt speech into the flow model as input, whereas the fine-tuned flow module produces poor speech quality when the prompt speech is all zeros. Is that normal? @aluminumbox
@aluminumbox Yes, I see your point. What I mean is that when I fine-tune without the dynamic mask, the speech it produces is better with prompt speech than without it, even though I am in the SFT scenario.