I am trying to train a GPT model using the tokens extracted by speech tokenizer v2, but I found that the top-10 accuracy during training is almost half of what I get with speech tokenizer v1. Is that normal, or have I done something wrong?
Sorry, we haven't tried using GPT as the LLM. You can try running inference with it: if it can generate correct sound, then your accuracy is fine. It is expected that accuracy differs between the v1 and v2 tokenizers.
Sorry, I misused the words. What I mean is that I train an LLM (something with a LLaMA structure, which should be similar to the Qwen model). What I found is that with the v1 speech tokenizer the top-10 accuracy is quite high (over 70%), but when I switch to v2 tokens the accuracy drops to about 50% (which is quite low, since acoustic tokens such as EnCodec can reach 60%+). Is that normal, simply because n_tokens becomes larger? By the way, under certain synthesis parameters, the v2 model can synthesize correct sounds.
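For reference, here is a minimal sketch of how per-position top-10 accuracy for next-speech-token prediction is typically computed. The function name, tensor shapes, `ignore_index` value, and the codebook sizes in the comment are assumptions for illustration, not the actual training code from this repo.

```python
import torch

def topk_accuracy(logits: torch.Tensor, targets: torch.Tensor,
                  k: int = 10, ignore_index: int = -100) -> float:
    """logits: (batch, seq_len, vocab_size); targets: (batch, seq_len)."""
    # Indices of the k highest-scoring tokens at each position.
    topk = logits.topk(k, dim=-1).indices                 # (B, T, k)
    # A position counts as a hit if the target token is among the top k.
    hit = (topk == targets.unsqueeze(-1)).any(dim=-1)     # (B, T)
    # Ignore padded positions when averaging.
    mask = targets != ignore_index
    return (hit & mask).sum().item() / mask.sum().clamp(min=1).item()

# Rough intuition: with a larger codebook (e.g. ~6561 codes vs ~4096, sizes
# illustrative), chance-level top-10 accuracy alone drops from about
# 10/4096 ≈ 0.24% to 10/6561 ≈ 0.15%, so a bigger vocabulary by itself
# pushes the metric down even if the model is equally good.
```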