When using word timestamps, words with apostrophes end up split in two, like this (I'm French, so all my examples are in French):
Bonjour
j'
ai
perdu
l'
avion
Here is a Python script that reproduces this behaviour: https://pastebin.com/qzLSnfnk
I can send you a sample MP3 file if needed.
This can be easily corrected by patching two lines in utils/transcribe.py:
Line 1874 in merge_punctuations:
if previous["word"].startswith(" ") and previous["word"].strip() in prepended:
should become:
if previous["word"].startswith(" ") and any(previous["word"].strip().endswith(p) for p in prepended):
Line 1890 in merge_punctuations:
if not previous["word"].endswith(" ") and following["word"] in appended:
should become:
if not previous["word"].endswith(" ") and any(following["word"].startswith(p) for p in appended):
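To show why the two conditions behave differently, here is a minimal standalone sketch of just the checks, not the real merge_punctuations. The PREPENDED and APPENDED strings and the four helper function names are assumptions made up for this illustration; the actual defaults in the project may differ.

```python
# Hypothetical punctuation sets (assumed for illustration only).
PREPENDED = "\"'“¿([{-"
APPENDED = "\"'.。,，!！?？:：”)]}、"


def merges_forward_old(previous, prepended=PREPENDED):
    """Current check: the whole stripped word must be found in `prepended`."""
    word = previous["word"]
    return word.startswith(" ") and word.strip() in prepended


def merges_forward_new(previous, prepended=PREPENDED):
    """Proposed check: the stripped word only has to END with a prepended mark."""
    word = previous["word"]
    return word.startswith(" ") and any(word.strip().endswith(p) for p in prepended)


def merges_backward_old(previous, following, appended=APPENDED):
    """Current check: the whole following word must be found in `appended`."""
    return not previous["word"].endswith(" ") and following["word"] in appended


def merges_backward_new(previous, following, appended=APPENDED):
    """Proposed check: the following word only has to START with an appended mark."""
    return not previous["word"].endswith(" ") and any(
        following["word"].startswith(p) for p in appended
    )


# " j'" is an elided French word: a letter followed by an apostrophe.
elided = {"word": " j'"}
print(merges_forward_old(elided))  # False: "j'" is not found in PREPENDED
print(merges_forward_new(elided))  # True: "j'" ends with "'"

# A bare opening parenthesis is still merged forward by both versions.
paren = {"word": " ("}
print(merges_forward_old(paren), merges_forward_new(paren))  # True True

# Same idea when the apostrophe is attached to the following word.
prev, foll = {"word": " l"}, {"word": "'avion"}
print(merges_backward_old(prev, foll))  # False
print(merges_backward_new(prev, foll))  # True
```

With these assumed sets, the old forward check rejects " j'" because "j'" as a whole is not one of the punctuation marks, while the new one accepts it and lets it be merged with the next word. The new checks are broader, so any word that merely ends (or starts) with one of these marks would also be merged, which is the part worth reviewing.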
I hope it won't break anything. XD
Thanks in advance for your review.