audio file as input parameter for model.transcribe works well but ndarray-typed parameter captured with sounddevice does not work #1253

Open
cyflhn opened this issue Feb 21, 2025 · 4 comments

Comments

@cyflhn

cyflhn commented Feb 21, 2025

model.transcribe works well when I pass an audio file as the input parameter. But when I record a stretch of speech with sounddevice, keep the result as an ndarray, and pass it directly to model.transcribe, the speech is not recognized.
If I save the same sounddevice recording to an audio file and pass that file to model.transcribe, the speech is recognized. What is the problem? Is there a specific format requirement for the ndarray parameter?

@MahmoudAshraf97
Collaborator

Make sure the array is mono (single channel), sampled at 16 kHz, and has dtype float32.
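As a rough illustration of those three requirements, here is a minimal NumPy-only sketch. The helper names `to_mono_float32` and `resample_linear` are hypothetical and not part of any library discussed in this thread, and the linear-interpolation resampler is a crude stand-in with no anti-aliasing filter; recording at 16 kHz directly, or using a dedicated resampling library, is probably the better choice.

```python
import numpy as np

TARGET_RATE = 16000  # Hz, the sample rate named in this thread


def to_mono_float32(recording: np.ndarray) -> np.ndarray:
    """Convert a (frames,) or (frames, channels) array to 1-D float32."""
    audio = np.asarray(recording)
    if audio.ndim > 2:
        raise ValueError(f"unexpected audio shape {audio.shape}")
    if np.issubdtype(audio.dtype, np.integer):
        # Integer PCM (e.g. int16) is scaled into the range [-1.0, 1.0).
        scale = float(np.iinfo(audio.dtype).max) + 1.0
        audio = audio.astype(np.float32) / scale
    else:
        audio = audio.astype(np.float32)
    if audio.ndim == 2:
        # Downmix by averaging the channel columns.
        audio = audio.mean(axis=1)
    return np.ascontiguousarray(audio.reshape(-1), dtype=np.float32)


def resample_linear(audio: np.ndarray, src_rate: int,
                    dst_rate: int = TARGET_RATE) -> np.ndarray:
    """Crude linear-interpolation resampler (assumption: good enough for a test)."""
    if src_rate == dst_rate:
        return audio.astype(np.float32)
    n_out = int(round(len(audio) * dst_rate / src_rate))
    src_t = np.arange(len(audio)) / src_rate  # timestamps of input samples
    dst_t = np.arange(n_out) / dst_rate       # timestamps of output samples
    return np.interp(dst_t, src_t, audio).astype(np.float32)
```

For a stereo recording made at 44100 Hz, `resample_linear(to_mono_float32(recording), 44100)` would give a 1-D float32 array at 16 kHz that can be passed on for transcription.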

@cyflhn
Author

cyflhn commented Feb 25, 2025

> mono

What does a mono array mean? Could you please give me an example? I am not very familiar with audio processing. Here is my code for recording speech:
recording = sd.rec(int(duration * fs), samplerate=fs, channels=2, device=device_index)

@MahmoudAshraf97
Collaborator

Mono means a single channel (channels=1 instead of channels=2 in your sd.rec call).
The sample rate (sr) should be 16000 Hz.
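A minimal sketch of a recording call that follows this advice is shown below. It assumes the model is faster-whisper's `WhisperModel` (the thread does not name the library) and that the input device supports 16 kHz capture. The function name `record_and_transcribe`, the helper `as_1d`, and the `"small"` model size are illustrative choices, not taken from the original post. Note that `sd.rec` returns immediately and yields an array of shape `(frames, channels)` even for one channel, so the sketch waits for the recording to finish and drops the channel axis before transcribing.

```python
import numpy as np

SAMPLE_RATE = 16000  # Hz
CHANNELS = 1         # mono


def as_1d(recording: np.ndarray) -> np.ndarray:
    """Drop the channel axis of a mono (frames, 1) recording."""
    return np.asarray(recording, dtype=np.float32).reshape(-1)


def record_and_transcribe(duration_s: float, model_size: str = "small",
                          device=None) -> str:
    # Imported lazily so as_1d can be used without audio hardware installed.
    import sounddevice as sd
    from faster_whisper import WhisperModel  # assumption: library in use

    recording = sd.rec(
        int(duration_s * SAMPLE_RATE),
        samplerate=SAMPLE_RATE,
        channels=CHANNELS,
        dtype="float32",
        device=device,
    )
    sd.wait()  # block until the recording buffer has been filled

    audio = as_1d(recording)  # shape (frames, 1) -> (frames,)
    model = WhisperModel(model_size)
    segments, _info = model.transcribe(audio)
    # segments is a lazy generator; iterating it runs the transcription.
    return " ".join(segment.text.strip() for segment in segments)
```

Calling `record_and_transcribe(5.0)` would record five seconds from the default input device and return the recognized text.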

@cyflhn
Author

cyflhn commented Feb 25, 2025

> Mono means single channel
> And sr should be 16000

I modified my code according to your suggestion, but it still did not work.
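When a change like this still does not help, it can be useful to inspect the array just before it is passed to model.transcribe. The checker below is a hypothetical diagnostic sketch, not part of any library; the name `find_input_problems` and the silence threshold of `1e-4` are arbitrary assumptions. It reports the mismatches mentioned in this thread (shape, dtype, sample rate) plus a near-silent buffer, which can happen if the array is read before the recording has finished or if the wrong input device was selected.

```python
import numpy as np


def find_input_problems(audio, sample_rate: int, expected_rate: int = 16000,
                        silence_peak: float = 1e-4) -> list:
    """Return human-readable descriptions of likely input problems."""
    problems = []
    arr = np.asarray(audio)

    if arr.ndim != 1:
        problems.append(f"expected a 1-D array, got shape {arr.shape}")
    if arr.dtype != np.float32:
        problems.append(f"expected dtype float32, got {arr.dtype}")
    if sample_rate != expected_rate:
        problems.append(f"expected {expected_rate} Hz, got {sample_rate} Hz")

    if arr.size == 0:
        problems.append("array is empty")
    elif not np.all(np.isfinite(arr)):
        problems.append("array contains NaN or infinite samples")
    else:
        peak = float(np.max(np.abs(arr)))
        if peak < silence_peak:
            # silence_peak is an arbitrary threshold chosen for this sketch.
            problems.append(f"signal is nearly silent (peak {peak:.2e})")
        elif peak > 1.0:
            problems.append(f"samples exceed [-1, 1] (peak {peak:.2f})")

    return problems
```

Printing `find_input_problems(recording, fs)` right after recording should show whether the array still has a channel axis, the wrong dtype or sample rate, or no usable signal.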


2 participants