feat/speech generation ui #223

Merged: 7 commits, Jan 10, 2025
3 changes: 2 additions & 1 deletion Dockerfile
@@ -19,6 +19,7 @@ WORKDIR $HOME/faster-whisper-server
COPY --chown=ubuntu --from=ghcr.io/astral-sh/uv:0.5.14 /uv /bin/uv
# https://docs.astral.sh/uv/guides/integration/docker/#intermediate-layers
# https://docs.astral.sh/uv/guides/integration/docker/#compiling-bytecode
# TODO: figure out if `/home/ubuntu/.cache/uv` should be used instead of `/root/.cache/uv`
RUN --mount=type=cache,target=/root/.cache/uv \
--mount=type=bind,source=uv.lock,target=uv.lock \
--mount=type=bind,source=pyproject.toml,target=pyproject.toml \
@@ -29,7 +30,7 @@ RUN --mount=type=cache,target=/root/.cache/uv \
# Creating a directory for the cache to avoid the following error:
# PermissionError: [Errno 13] Permission denied: '/home/ubuntu/.cache/huggingface/hub'
# This error occurs because the volume is mounted as root and the `ubuntu` user doesn't have permission to write to it. Pre-creating the directory solves this issue.
RUN mkdir -p $HOME/.cache/huggingface
RUN mkdir -p $HOME/.cache/huggingface/hub
ENV WHISPER__MODEL=Systran/faster-whisper-large-v3
ENV UVICORN_HOST=0.0.0.0
ENV UVICORN_PORT=8000
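
The pre-created cache directory matters because the compose files and `docker run` examples below mount a named volume at exactly this path. A minimal sketch for double-checking ownership from the host, assuming the published `latest-cpu` image and a volume named `hf-hub-cache`:

```bash
# Hypothetical check: mount the named volume and inspect ownership of the
# pre-created Hugging Face hub cache directory inside the container.
docker run --rm \
  --volume hf-hub-cache:/home/ubuntu/.cache/huggingface/hub \
  --entrypoint ls \
  fedirz/faster-whisper-server:latest-cpu \
  -ld /home/ubuntu/.cache/huggingface/hub
# The directory should be owned by `ubuntu`; if it shows up as root-owned,
# model downloads will fail with the PermissionError described above.
```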
4 changes: 2 additions & 2 deletions README.md
@@ -49,9 +49,9 @@ docker compose --file compose.cpu.yaml up --detach

```bash
# for GPU support
docker run --gpus=all --publish 8000:8000 --volume ~/.cache/huggingface:/root/.cache/huggingface --detach fedirz/faster-whisper-server:latest-cuda
docker run --gpus=all --publish 8000:8000 --volume hf-hub-cache:/home/ubuntu/.cache/huggingface/hub --detach fedirz/faster-whisper-server:latest-cuda
# for CPU only (use this if you don't have a GPU, as the image is much smaller)
docker run --publish 8000:8000 --volume ~/.cache/huggingface:/root/.cache/huggingface --env WHISPER__MODEL=Systran/faster-whisper-small --detach fedirz/faster-whisper-server:latest-cpu
docker run --publish 8000:8000 --volume hf-hub-cache:/home/ubuntu/.cache/huggingface/hub --env WHISPER__MODEL=Systran/faster-whisper-small --detach fedirz/faster-whisper-server:latest-cpu
```
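
On first start the model download can take a while. A small sketch of waiting for readiness, using the `/health` endpoint mentioned in the example scripts:

```bash
# Poll the health endpoint until the server is ready to accept requests.
until curl --silent --fail localhost:8000/health > /dev/null; do
  echo "waiting for faster-whisper-server..."
  sleep 2
done
echo "server is ready"
```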

### Using Kubernetes
Binary file modified audio.wav
4 changes: 2 additions & 2 deletions compose.cpu.yaml
@@ -12,6 +12,6 @@ services:
environment:
- WHISPER__MODEL=Systran/faster-whisper-small
volumes:
- hugging_face_cache:/root/.cache/huggingface
- hf-hub-cache:/home/ubuntu/.cache/huggingface/hub
volumes:
hugging_face_cache:
hf-hub-cache:
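
The switch from bind-mounting `~/.cache/huggingface` to the named `hf-hub-cache` volume means previously downloaded models are not reused automatically. If you want to seed the new volume from an existing host cache, a rough sketch (the host path and the `ubuntu` UID of 1000 are assumptions; adjust to your setup):

```bash
# Seed the named volume from the host's Hugging Face hub cache.
# A throwaway Alpine container is used just to perform the copy.
docker volume create hf-hub-cache
docker run --rm \
  --volume "$HOME/.cache/huggingface/hub:/src:ro" \
  --volume hf-hub-cache:/dst \
  alpine sh -c "cp -a /src/. /dst/ && chown -R 1000:1000 /dst"
```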
4 changes: 2 additions & 2 deletions compose.cuda-cdi.yaml
@@ -9,7 +9,7 @@ services:
file: compose.cuda.yaml
service: faster-whisper-server
volumes:
- hugging_face_cache:/root/.cache/huggingface
- hf-hub-cache:/home/ubuntu/.cache/huggingface/hub
deploy:
resources:
reservations:
@@ -21,4 +21,4 @@ services:
device_ids:
- nvidia.com/gpu=all
volumes:
hugging_face_cache:
hf-hub-cache:
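
The `nvidia.com/gpu=all` device name used here only resolves if a CDI spec is registered on the host. A quick way to verify, assuming the NVIDIA Container Toolkit is installed:

```bash
# List the CDI device names known on this host.
nvidia-ctk cdi list
# If nothing is listed, generate a spec first (typically requires root):
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
```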
4 changes: 2 additions & 2 deletions compose.cuda.yaml
@@ -12,11 +12,11 @@ services:
environment:
- WHISPER__MODEL=Systran/faster-whisper-large-v3
volumes:
- hugging_face_cache:/root/.cache/huggingface
- hf-hub-cache:/home/ubuntu/.cache/huggingface/hub
deploy:
resources:
reservations:
devices:
- capabilities: ["gpu"]
volumes:
hugging_face_cache:
hf-hub-cache:
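
If the CUDA container starts but transcription runs on the CPU, the GPU reservation is usually what failed rather than the cache volume. A quick smoke test, assuming the NVIDIA container runtime is installed (any CUDA base image works; the tag below is just an example):

```bash
# Confirm Docker can hand a GPU to a container at all.
docker run --rm --gpus=all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```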
4 changes: 0 additions & 4 deletions compose.yaml
@@ -11,7 +11,3 @@ services:
restart: unless-stopped
ports:
- 8000:8000
develop:
watch:
- path: faster_whisper_server
action: rebuild
41 changes: 31 additions & 10 deletions docs/installation.md
@@ -13,14 +13,14 @@ TODO: just reference the existing compose file in the repo
ports:
- 8000:8000
volumes:
- hugging_face_cache:/root/.cache/huggingface
- hf-hub-cache:/home/ubuntu/.cache/huggingface/hub
deploy:
resources:
reservations:
devices:
- capabilities: ["gpu"]
volumes:
hugging_face_cache:
hf-hub-cache:
```
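
This snippet (like the variants below) is meant to be saved as a complete compose file. A sketch of bringing it up, assuming you saved it as `compose.yaml` and kept the `faster-whisper-server` service name used elsewhere in the repo:

```bash
# Start the service in the background and follow the logs while the model downloads.
docker compose up --detach
docker compose logs --follow faster-whisper-server
```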

=== "CUDA (with CDI feature enabled)"
@@ -35,7 +35,7 @@ TODO: just reference the existing compose file in the repo
ports:
- 8000:8000
volumes:
- hugging_face_cache:/root/.cache/huggingface
- hf-hub-cache:/home/ubuntu/.cache/huggingface/hub
deploy:
resources:
reservations:
@@ -46,7 +46,7 @@ TODO: just reference the existing compose file in the repo
device_ids:
- nvidia.com/gpu=all
volumes:
hugging_face_cache:
hf-hub-cache:
```
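
The CDI variant additionally assumes Docker's CDI feature is enabled on the host (Docker Engine 25+). A sketch of enabling it; note this overwrites `daemon.json`, so merge by hand if you already have other settings:

```bash
# Enable CDI device requests in the Docker daemon, then restart it.
echo '{ "features": { "cdi": true } }' | sudo tee /etc/docker/daemon.json
sudo systemctl restart docker
```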

=== "CPU"
@@ -60,36 +60,57 @@ TODO: just reference the existing compose file in the repo
ports:
- 8000:8000
volumes:
- hugging_face_cache:/root/.cache/huggingface
- hf-hub-cache:/home/ubuntu/.cache/huggingface/hub
volumes:
hugging_face_cache:
hf-hub-cache:
```

## Docker

=== "CUDA"

```bash
docker run --rm --detach --publish 8000:8000 --name faster-whisper-server --volume hugging_face_cache:/root/.cache/huggingface --gpus=all fedirz/faster-whisper-server:latest-cuda
docker run \
--rm \
--detach \
--publish 8000:8000 \
--name faster-whisper-server \
--volume hf-hub-cache:/home/ubuntu/.cache/huggingface/hub \
--gpus=all \
fedirz/faster-whisper-server:latest-cuda
```

=== "CUDA (with CDI feature enabled)"

```bash
docker run --rm --detach --publish 8000:8000 --name faster-whisper-server --volume hugging_face_cache:/root/.cache/huggingface --device=nvidia.com/gpu=all fedirz/faster-whisper-server:latest-cuda
docker run \
--rm \
--detach \
--publish 8000:8000 \
--name faster-whisper-server \
--volume hf-hub-cache:/home/ubuntu/.cache/huggingface/hub \
--device=nvidia.com/gpu=all \
fedirz/faster-whisper-server:latest-cuda
```

=== "CPU"

```bash
docker run --rm --detach --publish 8000:8000 --name faster-whisper-server --volume hugging_face_cache:/root/.cache/huggingface fedirz/faster-whisper-server:latest-cpu
docker run \
--rm \
--detach \
--publish 8000:8000 \
--name faster-whisper-server \
--volume hf-hub-cache:/home/ubuntu/.cache/huggingface/hub \
fedirz/faster-whisper-server:latest-cpu
```
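
Whichever variant you run, a quick end-to-end check is to send the repo's `audio.wav` to the transcription endpoint. A sketch, assuming the server mirrors the OpenAI audio API (which is what the project targets):

```bash
# Transcribe the bundled sample file via the OpenAI-compatible endpoint.
curl http://localhost:8000/v1/audio/transcriptions \
  --form "file=@audio.wav"
```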

## Kubernetes

WARNING: this guide was written a few months ago and may be outdated.
Please refer to this [blog post](https://substratus.ai/blog/deploying-faster-whisper-on-k8s)

## Python (requires Python 3.12+)
## Python (requires Python 3.12+ and `uv` package manager)

```bash
git clone https://github.com/fedirz/faster-whisper-server.git
4 changes: 2 additions & 2 deletions examples/live-audio/script.sh
@@ -10,9 +10,9 @@ set -e
export WHISPER__MODEL=Systran/faster-distil-whisper-large-v3 # or Systran/faster-whisper-tiny.en for faster inference if you are running on a CPU.

# Ensure you have `faster-whisper-server` running. If this is your first time running it, expect to wait up to a minute for the model to be downloaded and loaded into memory. You can run `curl localhost:8000/health` to check if the server is ready or watch the logs with `docker logs -f <container_id>`.
docker run --detach --gpus=all --publish 8000:8000 --volume ~/.cache/huggingface:/root/.cache/huggingface --env WHISPER__MODEL=$WHISPER__MODEL fedirz/faster-whisper-server:latest-cuda
docker run --detach --gpus=all --publish 8000:8000 --volume hf-hub-cache:/home/ubuntu/.cache/huggingface/hub --env WHISPER__MODEL=$WHISPER__MODEL fedirz/faster-whisper-server:latest-cuda
# or you can run it on a CPU
# docker run --detach --publish 8000:8000 --volume ~/.cache/huggingface:/root/.cache/huggingface --env WHISPER__MODEL=$WHISPER__MODEL fedirz/faster-whisper-server:latest-cpu
# docker run --detach --publish 8000:8000 --volume hf-hub-cache:/home/ubuntu/.cache/huggingface/hub --env WHISPER__MODEL=$WHISPER__MODEL fedirz/faster-whisper-server:latest-cpu

# `pv` is used to limit the rate at which the audio is streamed to the server. Audio is streamed at 32 kB/s (16000 samples per second * 2 bytes per 16-bit sample = 32000 bytes per second). This emulates live audio input from a microphone: `ffmpeg -loglevel quiet -f alsa -i default -ac 1 -ar 16000 -f s16le -`
# shellcheck disable=SC2002
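
The 32 kB/s figure above is what makes a prerecorded file behave like a live microphone: 16000 samples per second * 2 bytes per sample. A standalone sketch of just the rate-limiting step (the file name and the `/dev/null` sink are placeholders; the real script pipes this data to the server):

```bash
# Decode to 16 kHz mono 16-bit PCM and throttle the stream to real-time speed.
ffmpeg -loglevel quiet -i audio.wav -ac 1 -ar 16000 -f s16le - \
  | pv --rate-limit 32000 \
  > /dev/null
```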
4 changes: 2 additions & 2 deletions examples/youtube/script.sh
@@ -6,9 +6,9 @@ set -e
export WHISPER__MODEL=Systran/faster-distil-whisper-large-v3 # or Systran/faster-whisper-tiny.en for faster inference if you are running on a CPU.

# Ensure you have `faster-whisper-server` running. If this is your first time running it, expect to wait up to a minute for the model to be downloaded and loaded into memory. You can run `curl localhost:8000/health` to check if the server is ready or watch the logs with `docker logs -f <container_id>`.
docker run --detach --gpus=all --publish 8000:8000 --volume ~/.cache/huggingface:/root/.cache/huggingface --env WHISPER__MODEL=$WHISPER__MODEL fedirz/faster-whisper-server:latest-cuda
docker run --detach --gpus=all --publish 8000:8000 --volume hf-hub-cache:/home/ubuntu/.cache/huggingface/hub --env WHISPER__MODEL=$WHISPER__MODEL fedirz/faster-whisper-server:latest-cuda
# or you can run it on a CPU
# docker run --detach --publish 8000:8000 --volume ~/.cache/huggingface:/root/.cache/huggingface --env WHISPER__MODEL=$WHISPER__MODEL fedirz/faster-whisper-server:latest-cpu
# docker run --detach --publish 8000:8000 --volume hf-hub-cache:/home/ubuntu/.cache/huggingface/hub --env WHISPER__MODEL=$WHISPER__MODEL fedirz/faster-whisper-server:latest-cpu

# Download the audio from a YouTube video. In this example I'm downloading "The Evolution of the Operating System" by the Asianometry YouTube channel. I highly recommend checking this channel out; the guy produces very high-quality content. If you don't have `youtube-dl`, you'll have to install it: https://github.com/ytdl-org/youtube-dl
youtube-dl --extract-audio --audio-format mp3 -o the-evolution-of-the-operating-system.mp3 'https://www.youtube.com/watch?v=1lG7lFLXBIs'
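
The diff cuts off before the transcription step. Purely as an illustration of what a request for the downloaded mp3 could look like (field names assumed from the OpenAI audio API that the server mirrors):

```bash
# Send the downloaded audio for transcription; response_format=text returns plain text.
curl http://localhost:8000/v1/audio/transcriptions \
  --form "file=@the-evolution-of-the-operating-system.mp3" \
  --form "model=$WHISPER__MODEL" \
  --form "response_format=text"
```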