Requirements

Python 3.8 and pip
ffmpeg
Download and extract voicevox_engine from https://github.com/VOICEVOX/voicevox_engine/releases
Set the environment variable OPENAI_KEY
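
A missing OPENAI_KEY is easy to overlook until the first request fails. The sketch below shows one way to check for it early. The helper name get_openai_key is hypothetical and is not taken from this project's code.

```python
import os


def get_openai_key(env=os.environ):
    """Return the OPENAI_KEY value, or raise a clear error if it is missing.

    Hypothetical helper for illustration; the project may read the key differently.
    """
    key = env.get("OPENAI_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_KEY is not set; export it before starting the server")
    return key
```

Calling get_openai_key() once at startup would surface the problem before any audio is processed.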

Environment and dependencies

python3 -m venv venv
source ./venv/bin/activate
pip install -r requirements.txt

Generate API docs

python make_docs.py

Run local server

python run.py --voicevox_dir=voicevox_engine_folder

API docs

There are 2 endpoints: voice_voice and text_voice

whisper=true => use the Whisper API
whisper=false => use the local speech-to-text model (slower)

The API accepts .wav files
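
Since only .wav input is accepted, it can help to check a file locally before uploading it. This is a minimal sketch using Python's standard wave module. The function name describe_wav is hypothetical, and the server may apply stricter or different checks.

```python
import wave


def describe_wav(path):
    """Open a .wav file and return its basic properties.

    Raises wave.Error if the file is not a readable PCM WAV file.
    Hypothetical client-side check; not part of this project's API.
    """
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            # Duration is the frame count divided by frames per second.
            "duration_seconds": frames / rate if rate else 0.0,
        }
```

A file that raises wave.Error here (for example an .mp3 renamed to .wav) can be converted with ffmpeg first.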

Docker

docker build .
docker run -p 50021:50021 -e OPENAI_KEY=xxx image_id

Define the connection URL in database/setting.py

SQLALCHEMY_DATABASE_URL = "sqlite:///./sql_app.db"
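
In SQLAlchemy's SQLite URL format, three slashes precede a relative path and four slashes result when the path is absolute. The sketch below builds either form from a file path. The helper sqlite_url is hypothetical, and it assumes POSIX-style paths.

```python
from pathlib import PurePosixPath


def sqlite_url(db_path):
    """Build a SQLAlchemy-style SQLite URL from a POSIX file path.

    Hypothetical helper for illustration; not part of database/setting.py.
    """
    path = PurePosixPath(db_path)
    if path.is_absolute():
        # "sqlite:///" + "/data/app.db" gives four slashes in total.
        return f"sqlite:///{path.as_posix()}"
    # Relative paths are resolved against the current working directory.
    return f"sqlite:///./{path.as_posix()}"
```

With this helper, sqlite_url("sql_app.db") reproduces the default value shown above, and sqlite_url("/data/app.db") points at a fixed location regardless of where the server is started.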

Run with mock engine

python run.py --enable_mock

Sign up and log in to get an authorization token

Then copy the access_token and pass it in the request headers when calling the two endpoints, voice_voice and text_voice
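
As a rough illustration of passing the token, the sketch below builds (but does not send) an authorized request with Python's standard library. Several details are assumptions rather than facts from this project: the "Bearer" scheme, the POST method, the URL path matching the endpoint name, and the server listening on localhost:50021 (the port used in the Docker example). Check the generated API docs for the actual values.

```python
import urllib.request

# Assumption: the server is reachable on the port mapped in the Docker example.
BASE_URL = "http://localhost:50021"


def build_authorized_request(endpoint, access_token, body=b""):
    """Build a request that carries the access token in its headers.

    Hypothetical helper; the Bearer scheme and POST method are assumptions.
    """
    return urllib.request.Request(
        url=f"{BASE_URL}/{endpoint}",
        data=body,
        headers={"Authorization": f"Bearer {access_token}"},
        method="POST",
    )
```

The returned object can be passed to urllib.request.urlopen once the request body matches what the endpoint expects.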