GPT accuracy for CosyVoice 2.0 speech tokenizer #837

Open

yuzuda283 opened this issue Jan 6, 2025 · 2 comments

Comments

@yuzuda283

I tried to train a GPT model on the tokens extracted by speech tokenizer v2, but I found that the top-10 accuracy during training is almost half of what I get with speech tokenizer v1. Is that normal, or could I be doing something wrong?

@aluminumbox
Collaborator

Sorry, we haven't tried using GPT as the LLM. You can try running inference with it: if it can generate correct sound, then your accuracy is fine. It is expected that the v1 and v2 tokenizer accuracies differ.

@yuzuda283
Author

Sorry, I misused the words. What I mean is that I trained an LLM (something with a LLaMA-style structure, so it should be similar to the Qwen model). What I found is that with the v1 speech tokenizer the top-10 accuracy is quite high (over 70%), but when I switch to v2 tokens the accuracy drops to about 50%, which is quite low, given that acoustic tokens such as EnCodec can reach 60%+. Is that normal, simply because n_tokens becomes larger? By the way, under certain synthesis parameters, the v2 model can synthesize correct sounds.
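For reference, here is a minimal sketch of how top-k accuracy is typically computed for a token-prediction LM in PyTorch. The function name, tensor shapes, and `ignore_index` convention are assumptions for illustration, not CosyVoice's actual training code. Note that the chance baseline for top-k accuracy is k / V for a vocabulary of size V, so a larger v2 codebook alone would be expected to pull the number down somewhat:

```python
import torch

def topk_accuracy(logits: torch.Tensor, targets: torch.Tensor, k: int = 10,
                  ignore_index: int = -100) -> float:
    """Fraction of positions whose target token is among the top-k logits.

    logits:  (batch, seq_len, vocab_size) raw scores from the LM head
    targets: (batch, seq_len) ground-truth speech-token ids
    """
    # Indices of the k highest-scoring tokens at each position.
    topk = logits.topk(k, dim=-1).indices            # (batch, seq_len, k)
    hits = (topk == targets.unsqueeze(-1)).any(-1)   # (batch, seq_len)
    mask = targets != ignore_index                   # skip padded positions
    return (hits & mask).sum().item() / mask.sum().clamp(min=1).item()

# Chance baseline: guessing k tokens uniformly at random from a vocab of
# size V is correct with probability k / V, so doubling the codebook size
# alone roughly halves the random-guess top-k accuracy.
```

So two trained models can have different top-k accuracies on differently sized codebooks without either being broken, which is consistent with the v2 model still synthesizing correct audio.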
