What is the meaning of initial_prompt ? #77

amitli1 · 2023-04-14T11:50:25Z


When using the transcribe function, what does the initial_prompt parameter mean?

I read the documentation ("optional text to provide as a prompt for the first window") and I am still not sure I understand what it means.

What does it do?
How does it help for the first window?
Does it have no effect on the rest of the windows?

Jeronymous · 2023-04-14T14:48:44Z

Maintainer

Whisper is an acoustic model and a language model in one.
Put differently, we can see it as a language model that looks at an audio chunk (of at most 30 sec) to generate text.

Like any language model, Whisper can be given a prompt that contains not just special token(s) but also a (short) text, which typically corresponds to what was said just before the audio chunk currently being processed.
So for the first 30 sec of audio, you can use this prompt to push Whisper toward predicting certain things, or toward a certain genre.
For instance, Whisper tends to remove speech disfluencies (fillers like "hmm" and "you know", or repetitions such as "I I I would like to to" -> "I would like to"). You can encourage Whisper to output disfluencies by giving it a prompt that contains disfluencies (like "Hum, you know, I I I would like to to say that... OK").
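
As a minimal sketch of the disfluency example, the prompt can be passed through the `initial_prompt` argument of `transcribe`. This assumes the `openai-whisper` package is installed. The helper names `transcribe_kwargs` and `transcribe_with_prompt` and the `"base"` model choice are hypothetical and only serve the illustration.

```python
# Prompt written with disfluencies on purpose, taken from the example above.
DISFLUENT_PROMPT = "Hum, you know, I I I would like to to say that... OK"


def transcribe_kwargs(initial_prompt=None):
    """Build keyword arguments for transcribe(); leave the prompt out when none is given."""
    kwargs = {}
    if initial_prompt:
        kwargs["initial_prompt"] = initial_prompt
    return kwargs


def transcribe_with_prompt(audio_path, initial_prompt=None, model_name="base"):
    """Transcribe audio_path, optionally conditioning the first window on a prompt."""
    import whisper  # assumption: the openai-whisper package is installed

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, **transcribe_kwargs(initial_prompt))
    return result["text"]
```

Calling `transcribe_with_prompt("speech.wav", DISFLUENT_PROMPT)` on an audio file of your own (the file name here is a placeholder) and comparing it with the output obtained without a prompt is a simple way to see whether the prompt changes how disfluencies are transcribed. The effect is a nudge, not a guarantee.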

Then, when OpenAI Whisper transcribes audio longer than 30 sec, it takes what was transcribed in the first ~30 sec as the prompt for the next 30 sec of audio (and the initial prompt is no longer taken into consideration).
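
The window-to-window behavior described here can be pictured with a toy model in plain Python. This is only a simplified sketch of the explanation, not Whisper's actual code: the real implementation works on tokens, and the function name `prompts_per_window` is hypothetical.

```python
def prompts_per_window(initial_prompt, window_transcripts):
    """Return the prompt each ~30 sec window would see in the simplified scheme.

    The first window is conditioned on initial_prompt; every later window is
    conditioned on the transcript of the window just before it.
    """
    prompts = []
    previous = initial_prompt
    for text in window_transcripts:
        prompts.append(previous)
        previous = text  # this window's transcript becomes the next prompt
    return prompts


# Made-up transcripts for three consecutive windows.
windows = ["Hum, so, I I wanted to", "talk about the the budget", "and that is it"]
for index, prompt in enumerate(prompts_per_window("Hum, you know... OK", windows)):
    print(f"window {index}: prompt = {prompt!r}")
```

In this toy model the initial prompt only appears as the prompt of window 0, which is why its direct influence is limited to the start of the audio.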

I hope that clarifies things. Do not hesitate to ask for more clarification.
