neopiccolorat
changed the title
Support of the json files made with Whisper
Support of the json files made with Whisper, for a better workflow
Jan 19, 2025
The JSON files produced by WhisperX are very similar in structure to those created by Vosk, which videogrep already supports. The fields are comparable and differ mainly in naming, for example text instead of content. The start and end timestamps are the same.
Given these similarities, it would be fantastic to have videogrep support the JSON files generated by WhisperX.
Here’s a snippet of the JSON output from Vosk for reference:
In short, while videogrep can't currently process JSON files from WhisperX, the differences between these files and the Vosk JSON files it already supports are minimal. Adding support for WhisperX JSON could improve compatibility significantly.
And this is what's coming out of WhisperX:
The thing is, I have a Python script that converts these files so videogrep can recognize them. It's a workable workflow on a file-by-file basis, but when dozens and dozens of new transcriptions are constantly coming out, and you want to avoid keeping duplicate copies (which are sometimes not compatible with other tools), it would be easier and more practical to have videogrep support the files coming out of Whisper internally.
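For illustration, a conversion of this kind could look roughly like the sketch below. It is not the exact script mentioned above, and it rests on assumptions about both formats: that the WhisperX file has a top-level segments list whose items carry start, end, text and an optional words list (with word, start, end, score), and that videogrep reads a list of entries with content, start, end and words (with word, start, end, conf). The function names are made up for this example.

```python
import json
from pathlib import Path


def whisperx_to_videogrep(data):
    """Map WhisperX-style segments to videogrep-style transcript entries.

    Field names on both sides are assumptions, not taken from either
    project's documentation.
    """
    entries = []
    for seg in data.get("segments", []):
        words = [
            {
                "word": w["word"],
                "start": w["start"],
                "end": w["end"],
                # Assumed mapping: WhisperX "score" -> videogrep "conf".
                "conf": w.get("score", 1.0),
            }
            for w in seg.get("words", [])
            # Skip words that carry no timestamps of their own.
            if "start" in w and "end" in w
        ]
        entries.append(
            {
                "content": seg["text"].strip(),  # "text" becomes "content"
                "start": seg["start"],
                "end": seg["end"],
                "words": words,
            }
        )
    return entries


def convert_file(src, dst):
    """Read a WhisperX JSON file and write the converted transcript to dst."""
    data = json.loads(Path(src).read_text(encoding="utf-8"))
    converted = whisperx_to_videogrep(data)
    Path(dst).write_text(json.dumps(converted, indent=2), encoding="utf-8")
```

Running convert_file over every new transcription is exactly the extra step, and the extra copy of each file, that native support would make unnecessary.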
I totally get that this is probably a passion project (I'm guessing), but I just want to say how much more useful this little gem is than just pulling out specific sentences for fun. I’ve used it to isolate speakers in really long interviews, and even though I had to tweak the XML files and the SRT/JSON transcription files to get them to work better with videogrep, I saved hours of manual work.
There's a pretty big audience out there for these awesome tools!
I honestly couldn't believe this existed when I found it last year.
Keep up the amazing work, and thanks a ton for this gem!