Whisper.cpp HTTP transcription server with an OpenAI-like API, in Docker.
This project provides a Dockerized transcription server based on whisper.cpp.
- Dockerized whisper.cpp HTTP server for audio transcription
- Configurable via environment variables
- Automatically converts audio to WAV format
- Automatically downloads required model on startup
- Can quantize any Whisper model to the requested quantization level on startup
Before you begin, ensure you have a machine with an NVIDIA GPU that supports a modern CUDA version, due to the computational demands of the Docker image.
- Nvidia GPU
- CUDA
- Docker
- Docker Compose
- Nvidia Docker Runtime
For detailed instructions on how to prepare a Linux machine for running neural networks, including the installation of CUDA, Docker, and Nvidia Docker Runtime, please refer to the publication "How to Prepare Linux for Running and Training Neural Networks? (+ Docker)" (in Russian).
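To verify that Docker can see the GPU before building anything, a quick smoke test can help (a minimal sketch, assuming the NVIDIA Container Toolkit is configured; the CUDA image tag is only an example, any recent one will do):

```shell
# Should print the nvidia-smi GPU table from inside a container
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```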
- Clone the repo and switch to the sources root:

  ```shell
  git clone https://github.com/EvilFreelancer/docker-whisper-server.git
  cd docker-whisper-server
  ```
- Copy the provided Docker Compose template:

  ```shell
  cp docker-compose.dist.yml docker-compose.yml
  ```
- Build the Docker image:

  ```shell
  docker-compose build
  ```
- Start the services:

  ```shell
  docker-compose up -d
  ```
- Navigate to http://localhost:9000 in your browser.
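Since the model is downloaded (and, if requested, quantized) on first start, it may take a while before the server begins accepting requests; you can watch progress in the container logs:

```shell
docker-compose logs -f
```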
Transcribe an audio file:

```shell
curl 127.0.0.1:9000/inference \
  -H "Content-Type: multipart/form-data" \
  -F file="@<file-path>" \
  -F temperature="0.0" \
  -F temperature_inc="0.2" \
  -F response_format="json"
```
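With `response_format="json"` the server replies with a JSON object carrying the transcript in its `text` field, so the result can be piped straight into other tools; a small convenience sketch, assuming `jq` is installed and a local `sample.wav` exists:

```shell
# Transcribe sample.wav and print only the recognized text
curl -s 127.0.0.1:9000/inference \
  -H "Content-Type: multipart/form-data" \
  -F file="@sample.wav" \
  -F response_format="json" | jq -r '.text'
```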
Load a new Whisper model:

```shell
curl 127.0.0.1:9000/load \
  -H "Content-Type: multipart/form-data" \
  -F model="<path-to-model-file-in-docker-container>"
```
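For instance, with the default model layout described in the configuration below, switching the running server to the `base.en` model would look roughly like this (the file must already exist inside the container):

```shell
# Example path following the /app/models/ggml-<model>.bin layout
curl 127.0.0.1:9000/load \
  -H "Content-Type: multipart/form-data" \
  -F model="/app/models/ggml-base.en.bin"
```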
Basic Configuration

| Name                       | Default                               | Description                                                                   |
|----------------------------|---------------------------------------|-------------------------------------------------------------------------------|
| WHISPER_MODEL              | base.en                               | The default Whisper model to use                                              |
| WHISPER_MODEL_PATH         | /app/models/ggml-${WHISPER_MODEL}.bin | The default path to the Whisper model file                                    |
| WHISPER_MODEL_QUANTIZATION |                                       | Level of quantization (applied only if WHISPER_MODEL_PATH is left unchanged)  |
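As an illustration, these variables can be set in the `environment` section of `docker-compose.yml`, or passed straight to `docker run`; in the sketch below the image tag is hypothetical (it depends on what `docker-compose build` produced), and `q5_0` is just one of the quantization levels whisper.cpp supports:

```shell
# Hypothetical image tag; adjust to your local build
docker run -d --gpus all -p 9000:9000 \
  -e WHISPER_MODEL=base.en \
  -e WHISPER_MODEL_QUANTIZATION=q5_0 \
  docker-whisper-server
```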
Advanced Configuration

| Name                    | Default    | Description                                           |
|-------------------------|------------|-------------------------------------------------------|
| WHISPER_THREADS         | 4          | Number of threads to use for inference                |
| WHISPER_PROCESSORS      | 1          | Number of processors to use for inference             |
| WHISPER_HOST            | 0.0.0.0    | Host IP or hostname to bind the server to             |
| WHISPER_PORT            | 9000       | Port number to listen on                              |
| WHISPER_INFERENCE_PATH  | /inference | Inference path for all requests                       |
| WHISPER_PUBLIC_PATH     |            | Path to the public folder                             |
| WHISPER_REQUEST_PATH    |            | Request path for all requests                         |
| WHISPER_OV_E_DEVICE     | CPU        | OpenVINO device used for encode inference             |
| WHISPER_OFFSET_T        | 0          | Time offset in milliseconds                           |
| WHISPER_OFFSET_N        | 0          | Segment index offset                                  |
| WHISPER_DURATION        | 0          | Duration of audio to process in milliseconds          |
| WHISPER_MAX_CONTEXT     | -1         | Maximum context size for inference                    |
| WHISPER_MAX_LEN         | 0          | Maximum segment length in characters                  |
| WHISPER_BEST_OF         | 2          | Number of best candidates to keep                     |
| WHISPER_BEAM_SIZE       | -1         | Beam size for beam search                             |
| WHISPER_AUDIO_CTX       | 0          | Audio context size (0 = use the full context)         |
| WHISPER_WORD_THOLD      | 0.01       | Word timestamp probability threshold                  |
| WHISPER_ENTROPY_THOLD   | 2.40       | Entropy threshold for decoder failure                 |
| WHISPER_LOGPROB_THOLD   | -1.00      | Log probability threshold for decoder failure         |
| WHISPER_LANGUAGE        | en         | Spoken language code ("auto" for auto-detection)      |
| WHISPER_PROMPT          |            | Initial prompt                                        |
| WHISPER_DTW             |            | Compute token-level timestamps                        |
| WHISPER_CONVERT         | true       | Convert audio to WAV; requires ffmpeg on the server   |
| WHISPER_SPLIT_ON_WORD   | false      | Split on word rather than on token                    |
| WHISPER_DEBUG_MODE      | false      | Enable debug mode                                     |
| WHISPER_TRANSLATE       | false      | Translate from the source language to English         |
| WHISPER_DIARIZE         | false      | Stereo audio diarization                              |
| WHISPER_TINYDIARIZE     | false      | Enable tinydiarize (requires a tdrz model)            |
| WHISPER_NO_FALLBACK     | false      | Do not use temperature fallback while decoding        |
| WHISPER_PRINT_SPECIAL   | false      | Print special tokens                                  |
| WHISPER_PRINT_COLORS    | false      | Print colors                                          |
| WHISPER_PRINT_REALTIME  | false      | Print output in realtime                              |
| WHISPER_PRINT_PROGRESS  | false      | Print progress                                        |
| WHISPER_NO_TIMESTAMPS   | false      | Do not print timestamps                               |
| WHISPER_DETECT_LANGUAGE | false      | Exit after automatically detecting language           |
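These variables presumably mirror the command-line flags of the upstream whisper.cpp server example that the container runs; for orientation only, a bare-metal invocation using a few of the options above might look roughly like this:

```shell
# Rough upstream equivalent of WHISPER_THREADS, WHISPER_PROCESSORS,
# WHISPER_HOST, WHISPER_PORT, WHISPER_CONVERT and WHISPER_LANGUAGE
./server -m /app/models/ggml-base.en.bin \
  --threads 4 --processors 1 \
  --host 0.0.0.0 --port 9000 \
  --convert --language en
```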
- whisper.cpp
- server example of whisper.cpp
Pavel Rykov. (2024). Whisper.cpp API Webserver in Docker. GitHub. https://github.com/EvilFreelancer/docker-whisper-server
```bibtex
@misc{pavelrykov2024whisperapi,
  author = {Pavel Rykov},
  title  = {Whisper.cpp API Webserver in Docker},
  year   = {2024},
  url    = {https://github.com/EvilFreelancer/docker-whisper-server}
}
```