GPT accuracy for CosyVoice 2.0 speech tokenizer #837

Open

yuzuda283 opened this issue Jan 6, 2025 · 2 comments

Comments

@yuzuda283

I tried to train a GPT model on the tokens extracted by speech tokenizer v2, but I found that the top-10 accuracy during training is almost half of what I get with speech tokenizer v1. Is that normal, or could I be doing something wrong?

@aluminumbox
Collaborator

Sorry, we haven't tried using GPT as the LLM. You can try running inference with it: if it can generate correct sound, then your accuracy is fine. It is expected that the v1 and v2 tokenizer accuracies differ.

@yuzuda283
Author

Sorry, I misused the words. What I mean is that I trained an LLM (something with a LLaMA-style structure, so it should be similar to the Qwen model). What I found is that with the v1 speech tokenizer the top-10 accuracy is quite high (over 70%), but when I switch to v2 tokens the accuracy drops to about 50%, which is quite low, given that acoustic tokens such as EnCodec can reach 60%+. Is that normal, simply because n_tokens becomes larger? By the way, under certain synthesis parameters, the v2 model can synthesize correct sounds.
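For reference, here is a minimal sketch of how top-k accuracy is typically computed for a token-prediction LM in PyTorch. The function name, tensor shapes, and `ignore_index` convention are assumptions for illustration, not CosyVoice's actual training code. Note that the chance baseline for top-k accuracy is k / V for a vocabulary of size V, so a larger v2 codebook alone would be expected to pull the number down somewhat:

```python
import torch

def topk_accuracy(logits: torch.Tensor, targets: torch.Tensor, k: int = 10,
                  ignore_index: int = -100) -> float:
    """Fraction of positions whose target token is among the top-k logits.

    logits:  (batch, seq_len, vocab_size) raw scores from the LM head
    targets: (batch, seq_len) ground-truth speech-token ids
    """
    # Indices of the k highest-scoring tokens at each position.
    topk = logits.topk(k, dim=-1).indices            # (batch, seq_len, k)
    hits = (topk == targets.unsqueeze(-1)).any(-1)   # (batch, seq_len)
    mask = targets != ignore_index                   # skip padded positions
    return (hits & mask).sum().item() / mask.sum().clamp(min=1).item()

# Chance baseline: guessing k tokens uniformly at random from a vocab of
# size V is correct with probability k / V, so doubling the codebook size
# alone roughly halves the random-guess top-k accuracy.
```

So two trained models can have different top-k accuracies on differently sized codebooks without either being broken, which is consistent with the v2 model still synthesizing correct audio.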
