I transcribed the same audio with both WhisperX and faster-whisper, and the two resulting subtitle files differ as follows:
WhisperX’s subtitles miss some parts of the content, but the timeline alignment is relatively good.
The subtitles from faster-whisper are almost complete in terms of content, but some timestamps look wrong: they are either inaccurate or span too long a duration.
Even when I switch WhisperX to the alternative VAD method (Silero), it still doesn't capture as much content as faster-whisper.
My question is: why does this happen? Doesn't WhisperX also use faster-whisper for transcription? Why is content missing?
Is it possible to adjust some WhisperX parameters so that it reaches the same transcription completeness as faster-whisper while keeping WhisperX's alignment capability?
Does anyone with experience in improving transcription quality have any suggestions that could help me out?
Thanks
Please provide the parameters you used for WhisperX and faster-whisper, preferably with an audio file and reproducible steps. Otherwise, I can only guess that this is related to word-level timestamps.
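To make such a report concrete, it may help to list exactly which faster-whisper segments have no counterpart in the WhisperX output. The following is a minimal sketch in plain Python. The `uncovered_segments` helper, the `(start, end, text)` tuple layout, and the 0.5 coverage threshold are all assumptions for illustration and are not part of either library. It also assumes the candidate segments do not overlap each other.

```python
from typing import List, Tuple

# Hypothetical segment layout: (start_seconds, end_seconds, text)
Segment = Tuple[float, float, str]


def uncovered_segments(
    reference: List[Segment],
    candidate: List[Segment],
    min_overlap: float = 0.5,
) -> List[Segment]:
    """Return reference segments whose time span is mostly absent from candidate.

    A reference segment counts as missing when less than `min_overlap`
    of its duration is covered by candidate segments.
    """
    missing = []
    for start, end, text in reference:
        duration = end - start
        if duration <= 0:
            continue  # skip empty or malformed segments
        covered = 0.0
        for c_start, c_end, _ in candidate:
            # Length of the time intersection, or 0 if the spans are disjoint
            covered += max(0.0, min(end, c_end) - max(start, c_start))
        if covered / duration < min_overlap:
            missing.append((start, end, text))
    return missing


# Made-up example data, not taken from any real transcription run
faster_whisper_segments = [
    (0.0, 2.0, "hello"),
    (2.5, 5.0, "this part is missing"),
    (6.0, 8.0, "bye"),
]
whisperx_segments = [
    (0.1, 1.9, "hello"),
    (6.2, 7.8, "bye"),
]

for start, end, text in uncovered_segments(faster_whisper_segments, whisperx_segments):
    print(f"{start:.2f}-{end:.2f}: {text}")
```

Posting the time ranges this prints, together with the exact parameters used for each tool, should make it easier to tell whether the missing content was dropped by the VAD step or somewhere later.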