What are some best practices to avoid hallucinations or to improve accuracy? #117
-
Yes, I agree, Whisper is amazing tech. As a fun example, I remember giving it a synthesized speech clip from a C64 game (Druid II), and it handled even that one swimmingly. Then again, it can sometimes stumble pretty badly too. I don't know if they're best practices, but I've outlined some of my methods in issue #82. I got rid of hallucinations with --suppress-tokens; there's discussion about it in another thread. Your result with WAV vs. MP3 is interesting. I tried the same comparison when I first tried Whisper but didn't notice any difference (logically there should be none, but who knows). You wouldn't happen to have any URLs or clips to share?
-
I'm trying to use whisper-timestamped on some fairly long audio files (between 15 and 45 minutes).
The result is incredible. Definitely better than me at understanding speech.
However the model hallucinates somewhat frequently.
I've tried to fiddle with the various parameters (e.g. vad=True, compression_ratio_threshold=1), but to no avail.
Eventually I realized that this problem happens much more often with WAV files than with MP3. So I've been re-encoding all my files using ffmpeg, and the issues almost disappeared.
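For reference, the re-encoding step could be scripted roughly as follows. This is only a sketch: the exact ffmpeg options used are not stated above, so the libmp3lame codec choice, the -q:a quality value, and the helper names build_ffmpeg_command and reencode_to_mp3 are all assumptions made for illustration.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(src, dst=None, quality=2):
    """Build an ffmpeg command that re-encodes `src` to MP3.

    `quality` is passed to libmp3lame's VBR setting (-q:a). The value 2 is
    an assumed default for this sketch, not a setting taken from the post.
    """
    src = Path(src)
    # Default output: same name as the input, with an .mp3 extension.
    dst = Path(dst) if dst is not None else src.with_suffix(".mp3")
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it exists
        "-i", str(src),            # input file (e.g. the original WAV)
        "-codec:a", "libmp3lame",  # encode the audio stream as MP3
        "-q:a", str(quality),      # VBR quality level (assumed value)
        str(dst),
    ]


def reencode_to_mp3(src, dst=None, quality=2):
    """Run ffmpeg (it must be on PATH) and return the output path."""
    cmd = build_ffmpeg_command(src, dst, quality)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return Path(cmd[-1])
```

Calling reencode_to_mp3("talk.wav") would write talk.mp3 next to the input, and the resulting file can then be passed to the transcription step in place of the WAV.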
Even though I did achieve a good result, the solution puzzled me.
Thus I'm wondering: are there other best practices I should try, in order to improve the results?
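As background on the compression_ratio_threshold parameter mentioned above: as far as I understand it, the reference Whisper implementation measures how repetitive a decoded segment is by dividing its UTF-8 byte length by its zlib-compressed length, and treats a segment whose ratio exceeds the threshold (2.4 by default) as a failed decode. The sketch below reproduces that ratio with the standard library only; the two sample strings are made up for illustration.

```python
import zlib


def compression_ratio(text):
    """Raw UTF-8 size divided by zlib-compressed size.

    Higher values mean the text compresses well, i.e. it is repetitive.
    """
    raw = text.encode("utf-8")
    return len(raw) / len(zlib.compress(raw))


# Made-up samples: one ordinary sentence, one looping phrase of the kind
# that often shows up in hallucinated output.
normal = "I'm trying to transcribe some fairly long recordings, between 15 and 45 minutes each."
looping = "Thank you for watching. " * 40

print(f"normal:  {compression_ratio(normal):.2f}")
print(f"looping: {compression_ratio(looping):.2f}")
```

The looping sample scores far above the 2.4 default, while the ordinary sentence stays well below it, which is why this check tends to catch repetition loops but not hallucinations that read like normal text.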