docs: add Realtime API page (WIP)

speaches-ai · Mar 3, 2025 · 820a374
1 parent 14ebda4
Showing 4 changed files with 45 additions and 11 deletions.
diff --git a/README.md b/README.md
@@ -1,5 +1,5 @@
 > [!NOTE]
-> This project was previously named `faster-whisper-server`. I've decided to change the name from `faster-whisper-server`, as the project has evolved to support more than just transcription.
+> This project was previously named `faster-whisper-server`. I've decided to change the name from `faster-whisper-server`, as the project has evolved to support more than just ASR.
 
 # Speaches
 
@@ -11,18 +11,18 @@ See the documentation for installation instructions and usage: [speaches.ai](htt
 
 ## Features:
 
-- GPU and CPU support.
-- [Deployable via Docker Compose / Docker](https://speaches.ai/installation/)
-- [Highly configurable](https://speaches.ai/configuration/)
 - OpenAI API compatible. All tools and SDKs that work with OpenAI's API should work with `speaches`.
-- Streaming support (transcription is sent via SSE as the audio is transcribed. You don't need to wait for the audio to fully be transcribed before receiving it).
-- Dynamic model loading / offloading. Just specify which model you want to use in the request and it will be loaded automatically. It will then be unloaded after a period of inactivity.
-- Text-to-Speech via `kokoro`(Ranked #1 in the [TTS Arena](https://huggingface.co/spaces/Pendrokar/TTS-Spaces-Arena)) and `piper` models.
-- [Coming soon](https://github.com/speaches-ai/speaches/issues/231): Audio generation (chat completions endpoint) | [OpenAI Documentation](https://platform.openai.com/docs/guides/realtime)
+- Audio generation (chat completions endpoint) | [OpenAI Documentation](https://platform.openai.com/docs/guides/realtime)
   - Generate a spoken audio summary of a body of text (text in, audio out)
   - Perform sentiment analysis on a recording (audio in, text out)
   - Async speech to speech interactions with a model (audio in, audio out)
-- [Coming soon](https://github.com/speaches-ai/speaches/issues/115): Realtime API | [OpenAI Documentation](https://platform.openai.com/docs/guides/realtime)
+- Streaming support (transcription is sent via SSE as the audio is transcribed. You don't need to wait for the audio to fully be transcribed before receiving it).
+- Dynamic model loading / offloading. Just specify which model you want to use in the request and it will be loaded automatically. It will then be unloaded after a period of inactivity.
+- Text-to-Speech via `kokoro`(Ranked #1 in the [TTS Arena](https://huggingface.co/spaces/Pendrokar/TTS-Spaces-Arena)) and `piper` models.
+- GPU and CPU support.
+- [Deployable via Docker Compose / Docker](https://speaches.ai/installation/)
+- [Highly configurable](https://speaches.ai/configuration/)
+- [Realtime API](https://speaches.ai/configuration/)
 
 Please create an issue if you find a bug, have a question, or a feature suggestion.
 

diff --git a/docs/index.md b/docs/index.md
@@ -1,6 +1,6 @@
 !!! note
 
-    This project was previously named `faster-whisper-server`. I've decided to change the name from `faster-whisper-server`, as the project has evolved to support more than just transcription.
+    This project was previously named `faster-whisper-server`. I've decided to change the name from `faster-whisper-server`, as the project has evolved to support more than just ASR.
 
 !!! note
 
@@ -23,12 +23,16 @@
 - GPU and CPU support.
 - [Deployable via Docker Compose / Docker](https://speaches.ai/installation/)
 - [Highly configurable](https://speaches.ai/configuration/)
-- [Coming soon](https://github.com/speaches-ai/speaches/issues/115): Realtime API | [OpenAI Documentation](https://platform.openai.com/docs/guides/realtime)
+- [Realtime API](https://speaches.ai/configuration/)
 
 Please create an issue if you find a bug, have a question, or a feature suggestion.
 
 ## Demos
 
+### Realtime API
+
+TODO
+
 ### Audio Chat
 
 TODO

diff --git a/docs/usage/realtime-api.md b/docs/usage/realtime-api.md
@@ -0,0 +1,29 @@
+!!! note
+
+    Before proceeding, you should be familiar with the [OpenAI Realtime API](https://platform.openai.com/docs/guides/realtime) and the relevant [OpenAI API reference](https://platform.openai.com/docs/api-reference/realtime-client-events).
+
+## Prerequisites
+
+Follow the prerequisites in the [Text-to-Speech](./text-to-speech.md) guide, then set the following environment variables:
+
+- `CHAT_COMPLETION_BASE_URL` to the base URL of an OpenAI API compatible endpoint | [Config](../configuration.md#speaches.config.Config.chat_completion_base_url)
+- `CHAT_COMPLETION_API_KEY` if the API you are using requires authentication | [Config](../configuration.md#speaches.config.Config.chat_completion_api_key)
+
+## Demo
+
+TODO
+
+## Limitations
+
+- ["response.cancel"](https://platform.openai.com/docs/api-reference/realtime-client-events/response/cancel) and ["conversation.item.truncate"](https://platform.openai.com/docs/api-reference/realtime-client-events/conversation/item/truncate) client events are not supported
+- ["conversation.item.create"](https://platform.openai.com/docs/api-reference/realtime-client-events/conversation/item/create) with `content` field containing `input_audio` message is not supported
+
+## Architecture
+
+TODO
+
+## Next Steps
+
+- Image support
+- Speech-to-speech model support
+- Optimizations and features
diff --git a/mkdocs.yml b/mkdocs.yml
@@ -42,6 +42,7 @@ plugins:
 nav:
   - Introduction: index.md
   - Usage / Capabilities:
+      - Realtime API: usage/realtime-api.md
       - Text-to-Speech: usage/text-to-speech.md
       - Speech-to-Text: usage/speech-to-text.md
       - Voice Chat: usage/voice-chat.md
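
The Limitations section added in `docs/usage/realtime-api.md` could be mirrored by a small client-side check that flags events before they are sent. The following is only a sketch and is not part of this commit: `hits_documented_limitation` is a hypothetical helper name, and the event shape (`item.content[].type`) is assumed from the OpenAI Realtime client-event reference rather than taken from the speaches code.

```python
from typing import Any

# Client event types the Limitations section lists as not supported.
UNSUPPORTED_EVENT_TYPES = {"response.cancel", "conversation.item.truncate"}


def hits_documented_limitation(event: dict[str, Any]) -> bool:
    """Return True if `event` matches one of the documented limitations.

    Hypothetical helper: it only encodes the two bullets from the
    Limitations section and says nothing about other event types.
    """
    event_type = event.get("type")
    if event_type in UNSUPPORTED_EVENT_TYPES:
        return True
    if event_type == "conversation.item.create":
        # Assumed shape: {"item": {"content": [{"type": "..."}, ...]}}
        content = (event.get("item") or {}).get("content") or []
        # `input_audio` content parts are listed as not supported.
        return any(part.get("type") == "input_audio" for part in content)
    return False
```

A client could call this before sending an event and fall back to, for example, a text-only message when it returns `True`.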