-
You can definitely pass segments of audio to whisper-timestamped, if you are able to pre-segment the audio in a relevant way. If you do so, you can use the prompt option at each step to pass the transcription of the previous segment, if you want to condition the "language modeling" part of Whisper (although I don't recommend doing so: in my experience, things work better without conditioning). I don't know if that answers your question. Don't hesitate to continue the discussion.
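A rough sketch of what that could look like, assuming the part boundaries are already known; the file name, boundaries and model size are placeholders, and it assumes `transcribe` forwards Whisper's `initial_prompt` option (drop that argument if your version does not accept it):

```python
import whisper_timestamped as whisper

SAMPLE_RATE = 16000  # Whisper works on 16 kHz mono audio

model = whisper.load_model("medium")
audio = whisper.load_audio("broadcast.wav")  # hypothetical file name

# Hypothetical part boundaries (in seconds) for a three-part broadcast.
parts = [(0, 180), (180, 360), (360, 480)]

previous_text = ""
results = []
for start, end in parts:
    chunk = audio[start * SAMPLE_RATE : end * SAMPLE_RATE]
    result = whisper.transcribe(
        model,
        chunk,
        language="en",
        # Optional conditioning on the previous part's transcript;
        # leaving it at None usually works better in my experience.
        initial_prompt=previous_text or None,
    )
    # Timestamps are relative to the chunk, so shift them back to
    # broadcast time before merging the parts.
    for segment in result["segments"]:
        segment["start"] += start
        segment["end"] += start
        for word in segment.get("words", []):
            word["start"] += start
            word["end"] += start
    previous_text = " ".join(s["text"].strip() for s in result["segments"])
    results.append(result)
```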
-
We often run automatic captioning of news broadcasts. These can be 6-8 minutes long and contain 3-4 parts joined together. We often experience problems with time codes at the transitions between the different parts of the main broadcast. If we run the parts separately, things work almost perfectly. At the end of a broadcast there are usually a few seconds of silence or music, and the time codes become a mess there as well. An explanation could be that these files are basically divided into 30-second blocks with a certain overlap; the software then makes a prediction for each block, and finally the conflicts have to be resolved. In some cases this can be difficult: a single sentence can straddle two blocks and actually "look different" from one side than from the other.
Is it possible for us to pass additional information so that the result is better? Is it possible to exclude parts of the file (e.g. the last 10 seconds), and is it possible to split the file on our side without physically splitting it? Does whisper-timestamped notice that the content is changing?
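For the "exclude the last 10 seconds" part, one thing we could try on our side is to load the audio into memory and slice the array before calling `transcribe`, so no new file is written. A minimal sketch, assuming a 16 kHz mono array as returned by `load_audio`; the file name, trim length and model size are placeholders:

```python
import whisper_timestamped as whisper

SAMPLE_RATE = 16000   # load_audio returns 16 kHz mono samples
TRIM_SECONDS = 10     # trailing silence/music to exclude

audio = whisper.load_audio("broadcast.wav")  # hypothetical file name

# Drop the last TRIM_SECONDS of audio in memory; the file on disk is untouched.
cut = len(audio) - TRIM_SECONDS * SAMPLE_RATE
trimmed = audio[:cut] if cut > 0 else audio

model = whisper.load_model("medium")
result = whisper.transcribe(model, trimmed, language="en")
```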