large-v2 does not support "yue" but no bug in faster-whisper #1228

Open
Coconut3223 opened this issue Jan 17, 2025 · 0 comments
Coconut3223 commented Jan 17, 2025

First, large-v2 does not support the "yue" language token, but large-v3 does.

However, when I used faster-whisper to load large-v2 and then transcribed sentences with the parameter language="yue", no error was raised. It should not work.


faster_whisper

large-v2

None is returned when using large-v2's tokenizer to encode "yue".

>>> from faster_whisper import WhisperModel
>>> model = WhisperModel("large-v2",
...                 device=DEVICE,
...                 )
>>> print(model.hf_tokenizer.token_to_id("<|%s|>" % "yue"))
None
>>> print(model.hf_tokenizer.token_to_id("<|%s|>" % "zh"))
xxx

Yet the result is returned normally and the language in TranscriptionInfo is "yue".

input_language = 'yue'
transcribe_params = {
    "language": input_language,
    "word_timestamps": True,
    "vad_filter": True,
    "initial_prompt": initial_prompt,
    "vad_parameters": dict(min_silence_duration_ms=1000,),
}
whisper_segments, info = model.transcribe(audio, **transcribe_params)
for whis_seg in whisper_segments:
    print(whis_seg.text.strip())
print(info)

""" Result
2023-2024年度修訂預算,
受環球利率上升的
TranscriptionInfo(language='yue', language_probability=1, ....)
"""
input_language = 'zh'
transcribe_params = {
    "language": input_language,
    "word_timestamps": True,
    "vad_filter": True,
    "initial_prompt": initial_prompt,
    "vad_parameters": dict(min_silence_duration_ms=1000,),
}
whisper_segments, info = model.transcribe(audio, **transcribe_params)
for whis_seg in whisper_segments:
    print(whis_seg.text.strip())
print(info)

""" Result
二零二三二四年度修訂預算
受環球利率上升
TranscriptionInfo(language='zh', language_probability=1, ....)
"""

large-v3

>>> from faster_whisper import WhisperModel
>>> model3 = WhisperModel("large-v3",
...                       device=DEVICE,
...                       )
>>> print(model3.hf_tokenizer.token_to_id("<|%s|>" % "yue"))
50358
>>> print(model3.hf_tokenizer.token_to_id("<|%s|>" % "zh"))
50260
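
A minimal guard sketch to avoid the large-v2 case silently succeeding: check the language token before transcribing. This is not a faster-whisper API, just the same hf_tokenizer.token_to_id call from above wrapped in a hypothetical helper named supports_language:

from faster_whisper import WhisperModel

def supports_language(model: WhisperModel, code: str) -> bool:
    # A language is only usable if its special token exists in the model's vocabulary;
    # large-v2 returns None for "<|yue|>", while large-v3 returns 50358.
    return model.hf_tokenizer.token_to_id("<|%s|>" % code) is not None

model = WhisperModel("large-v2", device=DEVICE)
print(supports_language(model, "yue"))  # expected: False
print(supports_language(model, "zh"))   # expected: True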


openai/whisper

large-v2

import whisper
model = whisper.load_model("large-v2",)

input_language = 'yue'
result  = model.transcribe(audio, language=input_language)

"""
--> [154]  sot_sequence.append(sot + 1 + langs.index(self.language))

ValueError: tuple.index(x): x not in tuple

"""

large-v3

import whisper
model = whisper.load_model("large-v3",)
input_language = 'yue'
result  = model.transcribe(audio, language=input_language)

"""
{'text': ' 二零二三二四年度修訂預算受環球利率上升',
 'segments': [{'id': 0,
   'seek': 0,
...}
"""

Question:

The language token is placed at the start of the encoded input in openai/whisper. However, in faster-whisper the language token does not seem to be parsed in the way we expect, since "yue" is never rejected for large-v2.
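
To illustrate where the language token sits in the prompt, here is a small probe of openai/whisper's tokenizer (a sketch assuming get_tokenizer and sot_sequence behave as in recent openai/whisper releases; the num_languages values come from the checks above):

from whisper.tokenizer import get_tokenizer

# large-v3 style tokenizer (100 languages): "yue" resolves to a valid language token,
# so the SOT sequence is <|startoftranscript|><|yue|><|transcribe|> as token ids.
tok = get_tokenizer(multilingual=True, num_languages=100, language="yue", task="transcribe")
print(tok.sot_sequence)  # e.g. (50258, 50358, 50360)

# large-v2 style tokenizer (99 languages): "yue" is not in the language tuple,
# which reproduces the ValueError("tuple.index(x): x not in tuple") shown above.
try:
    get_tokenizer(multilingual=True, num_languages=99, language="yue", task="transcribe")
except ValueError as err:
    print(err)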

