
Kindly add OpenVINO backend support #32

Open · gericho opened this issue May 6, 2024 · 7 comments
@gericho commented May 6, 2024

Would it be possible to add OpenVINO support for Intel processors? The repo by @zhuzilin here shows a speed improvement of nearly 50%, so users could run larger models without sacrificing current performance. Thank you!

[screenshot attached]

@zachs-55

You can use OpenVINO with whisper.cpp (though I personally found CLBlast a little faster on my weak Celeron N5095):
https://github.com/ggerganov/whisper.cpp#openvino-support

Then you can use this for a Wyoming endpoint: https://github.com/ser/wyoming-whisper-api-client
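
For anyone following that route, the steps from the linked whisper.cpp README boil down to roughly the following (flag and script names as documented there at the time; check the current README, since these occasionally change):

# Build whisper.cpp with the OpenVINO encoder enabled
cmake -B build -DWHISPER_OPENVINO=1
cmake --build build -j --config Release

# Convert a ggml Whisper model's encoder to OpenVINO IR
# (the conversion script ships in whisper.cpp's models/ directory)
python3 models/convert-whisper-to-openvino.py --model base.en

The Wyoming client linked above can then bridge the resulting endpoint to whatever consumes the Wyoming protocol.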

@gericho (Author) commented May 14, 2024

Thank you very much, I'll give it a try soon!

@tannisroot

faster-whisper uses CTranslate2, which doesn't have OpenVINO support.

@monoamin commented Jan 8, 2025

For future reference, and for anyone who stumbles in here trying to get Whisper to use their Intel GPU: I have created a working demo Dockerfile and Compose file here:

https://github.com/monoamin/wyoming-whispercpp-openvino-gpu
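
The repo has the full wiring, but the core ingredient of any setup like this is passing the host's DRI render nodes into the container so OpenVINO can see the GPU. A minimal sketch of that pattern (the image name and port here are placeholders, not the ones from the repo):

# Expose the Intel GPU to the container via the /dev/dri render nodes
docker run -d \
  --device /dev/dri:/dev/dri \
  -p 10300:10300 \
  wyoming-whispercpp-openvino   # hypothetical image tag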

@MaximumFish

This is perfect timing, as I just started looking at this yesterday. @monoamin have you run any comparison tests at all? Interestingly, the faster-whisper page pegs it as slightly faster with no acceleration than whisper.cpp with OpenVINO. I'd be very curious whether you see similar results.

@monoamin commented Jan 13, 2025

@MaximumFish I have not tested this extensively, but I can give you some quick numbers.
My primary intent in running it on the GPU was to make the most of the hardware I already have.
Hardware-wise, my server runs an Intel Arc A380 with 6 GB VRAM and a Ryzen 7 5700X with 32 GB RAM.

This is with whisper.cpp, first CPU-only then GPU, on a random German speech sample (roughly: "The Internet is uncharted territory for all of us, and of course it also gives enemies and opponents of our democratic order entirely new ways and approaches to endanger our way of life"):

root on monoamin in ~
$ ffmpeg -i speechtest_fixed.wav 2>&1 | grep Duration
  Duration: 00:00:24.17, bitrate: 256 kb/s

root on monoamin in ~
$ python3 testwhispertime.py  # CPU only
Transcription: {"text":" Das Internet ist für uns alle Neuland und es ermöglicht auch Feinden und Gegnern unserer demokratischen Grundordnung natürlich mit völlig neuen Möglichkeiten und völlig neuen Herangehensweisen unsere Art zu leben in Gefahr zu bringen.\n"}
Time taken: 6.303544759750366 seconds

root on monoamin in ~
$ python3 testwhispertime.py  # With GPU
Transcription: {"text":" Das Internet ist für uns alle Neuland und es ermöglicht auch Feinden und Gegnern unserer demokratischen Grundordnung natürlich, mit völlig neuen Möglichkeiten und völlig neuen Herangehensweisen unsere Art zu leben in Gefahr zu bringen.\n"}
Time taken: 4.317137241363525 seconds

I currently don't have a faster-whisper container running but I'll see if I can set one up today to compare results.
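
(For anyone wanting to reproduce this: the test script itself isn't shown here, but the {"text": ...} output matches the JSON returned by whisper.cpp's bundled server example, so a minimal stand-in using its /inference endpoint, with host and port assumed, looks like this:)

# Time a transcription request against a running whisper.cpp server.
# Endpoint and field names follow whisper.cpp's server example;
# adjust the URL to whatever your container actually exposes.
time curl -s http://localhost:8080/inference \
  -F file=@speechtest_fixed.wav \
  -F response_format=json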

@MaximumFish

Thanks for testing it! Definitely interested to see the results vs faster-whisper.
