faster-whisper vs. whisperX #1242

sijitang · 2025-02-09T19:07:48Z

Hi,

I transcribed the same audio with both whisperX and faster-whisper, and the two resulting subtitle files differ as follows:

whisperX's subtitles are missing some of the content, but their timestamps are aligned relatively well.
faster-whisper's subtitles are almost complete in terms of content, but the timestamps have issues: some are inaccurate and some segments run too long.
Even when I switch whisperX to the alternative VAD method (Silero), it still captures less content than faster-whisper.

My question is: why does this happen? Doesn't whisperX also use faster-whisper for transcription? If so, why is content missing?
Is it possible to adjust some parameters in whisperX so that it matches faster-whisper's transcription completeness while keeping whisperX's alignment capability?

Does anyone with experience in improving transcription quality have any suggestions that could help me out?

Thanks

heimoshuiyu · 2025-02-10T02:38:28Z

Please provide the parameters you used for whisperX and faster-whisper, preferably along with an audio file and steps to reproduce. Without those, I can only guess that this is related to word-level timestamps.
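
To make a report like this concrete, it can help to list exactly which time ranges one tool transcribed and the other skipped. The following is only a minimal sketch, assuming each tool's output has already been reduced to a list of `(start, end)` times in seconds; the helper names `merge` and `uncovered` and the `min_gap` threshold are hypothetical and not part of either library.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start_sec, end_sec)


def merge(segments: List[Segment]) -> List[Segment]:
    """Sort segments and merge any that overlap or touch."""
    merged: List[Segment] = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def uncovered(reference: List[Segment],
              candidate: List[Segment],
              min_gap: float = 0.5) -> List[Segment]:
    """Return time ranges present in `reference` but absent from `candidate`.

    Ranges shorter than `min_gap` seconds are dropped, since small boundary
    differences between tools are expected (assumed threshold).
    """
    cand = merge(candidate)
    gaps: List[Segment] = []
    for r_start, r_end in merge(reference):
        cursor = r_start  # everything before `cursor` is already accounted for
        for c_start, c_end in cand:
            if c_end <= cursor:
                continue  # candidate segment lies entirely before the cursor
            if c_start >= r_end:
                break  # candidate segment starts after this reference range
            if c_start > cursor:
                gaps.append((cursor, c_start))
            cursor = max(cursor, c_end)
            if cursor >= r_end:
                break
        if cursor < r_end:
            gaps.append((cursor, r_end))
    return [(s, e) for s, e in gaps if e - s >= min_gap]


if __name__ == "__main__":
    # Illustrative, made-up timings: the more complete output as reference.
    complete_output = [(0, 5), (5, 10), (12, 15)]
    sparse_output = [(0, 4), (8, 10)]
    for start, end in uncovered(complete_output, sparse_output):
        print(f"missing from sparse output: {start:.1f}s to {end:.1f}s")
```

Passing the faster-whisper segment times as `reference` and the whisperX segment times as `candidate` would list the stretches of audio that only faster-whisper transcribed, which can then be quoted in the report alongside the parameters used.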
