Skip to content

Commit

Permalink
docs: add Realtime API page (WIP)
Browse files Browse the repository at this point in the history
  • Loading branch information
Fedir Zadniprovskyi committed Mar 3, 2025
1 parent 14ebda4 commit 820a374
Show file tree
Hide file tree
Showing 4 changed files with 45 additions and 11 deletions.
18 changes: 9 additions & 9 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
> [!NOTE]
> This project was previously named `faster-whisper-server`. I've decided to change the name from `faster-whisper-server`, as the project has evolved to support more than just transcription.
> This project was previously named `faster-whisper-server`. I've decided to change the name from `faster-whisper-server`, as the project has evolved to support more than just ASR.
# Speaches

Expand All @@ -11,18 +11,18 @@ See the documentation for installation instructions and usage: [speaches.ai](htt

## Features:

- GPU and CPU support.
- [Deployable via Docker Compose / Docker](https://speaches.ai/installation/)
- [Highly configurable](https://speaches.ai/configuration/)
- OpenAI API compatible. All tools and SDKs that work with OpenAI's API should work with `speaches`.
- Streaming support (transcription is sent via SSE as the audio is transcribed. You don't need to wait for the audio to fully be transcribed before receiving it).
- Dynamic model loading / offloading. Just specify which model you want to use in the request and it will be loaded automatically. It will then be unloaded after a period of inactivity.
- Text-to-Speech via `kokoro`(Ranked #1 in the [TTS Arena](https://huggingface.co/spaces/Pendrokar/TTS-Spaces-Arena)) and `piper` models.
- [Coming soon](https://github.com/speaches-ai/speaches/issues/231): Audio generation (chat completions endpoint) | [OpenAI Documentation](https://platform.openai.com/docs/guides/realtime)
- Audio generation (chat completions endpoint) | [OpenAI Documentation](https://platform.openai.com/docs/guides/realtime)
- Generate a spoken audio summary of a body of text (text in, audio out)
- Perform sentiment analysis on a recording (audio in, text out)
- Async speech to speech interactions with a model (audio in, audio out)
- [Coming soon](https://github.com/speaches-ai/speaches/issues/115): Realtime API | [OpenAI Documentation](https://platform.openai.com/docs/guides/realtime)
- Streaming support (transcription is sent via SSE as the audio is transcribed. You don't need to wait for the audio to fully be transcribed before receiving it).
- Dynamic model loading / offloading. Just specify which model you want to use in the request and it will be loaded automatically. It will then be unloaded after a period of inactivity.
- Text-to-Speech via `kokoro`(Ranked #1 in the [TTS Arena](https://huggingface.co/spaces/Pendrokar/TTS-Spaces-Arena)) and `piper` models.
- GPU and CPU support.
- [Deployable via Docker Compose / Docker](https://speaches.ai/installation/)
- [Highly configurable](https://speaches.ai/configuration/)
- [Realtime API](https://speaches.ai/configuration/)

Please create an issue if you find a bug, have a question, or a feature suggestion.

Expand Down
8 changes: 6 additions & 2 deletions docs/index.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
!!! note

This project was previously named `faster-whisper-server`. I've decided to change the name from `faster-whisper-server`, as the project has evolved to support more than just transcription.
This project was previously named `faster-whisper-server`. I've decided to change the name from `faster-whisper-server`, as the project has evolved to support more than just ASR.

!!! note

Expand All @@ -23,12 +23,16 @@
- GPU and CPU support.
- [Deployable via Docker Compose / Docker](https://speaches.ai/installation/)
- [Highly configurable](https://speaches.ai/configuration/)
- [Coming soon](https://github.com/speaches-ai/speaches/issues/115): Realtime API | [OpenAI Documentation](https://platform.openai.com/docs/guides/realtime)
- [Realtime API](https://speaches.ai/configuration/)

Please create an issue if you find a bug, have a question, or a feature suggestion.

## Demos

### Realtime API

TODO

### Audio Chat

TODO
Expand Down
29 changes: 29 additions & 0 deletions docs/usage/realtime-api.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
!!! note

Before proceeding, you should be familiar with the [OpenAI Realtime API](https://platform.openai.com/docs/guides/realtime) and the relevant [OpenAI API reference](https://platform.openai.com/docs/api-reference/realtime-client-events)

## Prerequisites

Follow the prerequisites in the [Text-to-Speech](./text-to-speech.md) guide. And set the following environmental variables:

- `CHAT_COMPLETION_BASE_URL` to the base URL of an OpenAI API compatible endpoint | [Config](../configuration.md#speaches.config.Config.chat_completion_base_url)
- `CHAT_COMPLETION_API_KEY` if the API you are using requires authentication | [Config](../configuration.md#speaches.config.Config.chat_completion_api_key)

## Demo

TODO

## Limitations

- ["response.cancel"](https://platform.openai.com/docs/api-reference/realtime-client-events/response/cancel) and ["conversation.item.truncate"](https://platform.openai.com/docs/api-reference/realtime-client-events/conversation/item/truncate) client events are not supported
- ["conversation.item.create"](https://platform.openai.com/docs/api-reference/realtime-client-events/conversation/item/create) with `content` field containing `input_audio` message is not supported

## Architecture

TODO

## Next Steps

- Image support
- Speech-to-speech model support
- Optimizations and feature
1 change: 1 addition & 0 deletions mkdocs.yml
Original file line number Diff line number Diff line change
Expand Up @@ -42,6 +42,7 @@ plugins:
nav:
- Introduction: index.md
- Usage / Capabilities:
- Realtime API: usage/realtime-api.md
- Text-to-Speech: usage/text-to-speech.md
- Speech-to-Text: usage/speech-to-text.md
- Voice Chat: usage/voice-chat.md
Expand Down

0 comments on commit 820a374

Please sign in to comment.