- PC with NVIDIA GPU: Ensure you have a PC with an NVIDIA GPU available for running the XTTS server.
- Raspberry Pi: For obvious reasons.
The TTS server must run on your GPU-enabled PC due to its computational requirements.
- Ensure Python 3.9-3.12 is installed on your PC.
- Install CUDA and cuDNN compatible with your NVIDIA GPU (see the CUDA Installation guide)
- Install PyTorch compatible with your CUDA and cuDNN versions (see the PyTorch Installation guide)
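Before going further, it is worth confirming that PyTorch can actually see your GPU. A quick sanity check using only standard PyTorch calls (nothing project-specific):

```python
import torch

# Should print True if CUDA, cuDNN, and PyTorch versions match up.
print(torch.cuda.is_available())

# Prints the name of GPU 0, e.g. "NVIDIA GeForce RTX 3080".
print(torch.cuda.get_device_name(0))
```

If this prints `False`, revisit the CUDA/cuDNN/PyTorch compatibility matrix before continuing.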
Run the following command on your GPU-enabled PC to clone the XTTS API Server repository:
git clone https://github.com/daswer123/xtts-api-server.git
Follow the installation guide for your operating system:
- Windows:
- Create and activate a virtual environment:
python -m venv venv
venv\Scripts\activate
- Install xtts-api-server:
pip install xtts-api-server
For more details, refer to the official XTTS API Server Installation Guide.
- Download the `TARS-short.wav` and `TARS-long.wav` files from the `GPTARS_Interstellar` repository under `Brain/TTS/wakewords/VoiceClones`. These will be the different voices you can use for TARS.
- Place them in the `speakers/` directory within the XTTS project folder. If the directory does not exist, create it.
- Open a terminal in the `xtts-api-server` project directory.
- Activate your virtual environment if not already active.
- Start the XTTS API Server:
python -m xtts_api_server --listen --port 8020
- Once the server is running, open a browser and navigate to:
http://localhost:8020/docs
- This will open the API's Swagger documentation interface, which you can use to test the server and its endpoints.
- Locate the GET /speakers endpoint in the API documentation.
- Click "Try it out" and then "Execute" to test the endpoint.
- Ensure the response includes the `TARS-Short` and `TARS-Long` speaker files, with entries similar to:

[
  {
    "name": "TARS-Long",
    "voice_id": "TARS-Long",
    "preview_url": "http://localhost:8020/sample/TARS-Long.wav"
  },
  {
    "name": "TARS-Short",
    "voice_id": "TARS-Short",
    "preview_url": "http://localhost:8020/sample/TARS-Short.wav"
  }
]
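If you prefer scripting the check over clicking through the Swagger UI, the same GET /speakers endpoint documented above can be queried with Python's `requests` library; only the script itself is new here:

```python
import requests

# Query the XTTS API Server's speaker list (GET /speakers).
response = requests.get("http://localhost:8020/speakers")
response.raise_for_status()

# Expect TARS-Long and TARS-Short among the results.
for speaker in response.json():
    print(speaker["name"], "->", speaker["preview_url"])
```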
- Locate the POST /tts_to_audio endpoint in the API documentation.
- Click "Try it out" and input the following JSON in the Request Body:
{ "text": "Hello, this is TARS speaking.", "speaker_wav": "TARS-Short", "language": "en" }
- Click "Execute" to send the request.
- Check the response for a generated audio file. You should see a download field where you can download and listen to the audio output.
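The same request can also be made from a script. A minimal sketch using `requests`, assuming the endpoint returns the WAV bytes directly in the response body (verify against the Swagger docs for your server version):

```python
import requests

# Generate speech via POST /tts_to_audio and save the returned audio.
payload = {
    "text": "Hello, this is TARS speaking.",
    "speaker_wav": "TARS-Short",
    "language": "en",
}
response = requests.post("http://localhost:8020/tts_to_audio", json=payload)
response.raise_for_status()

with open("tars_test.wav", "wb") as f:
    f.write(response.content)
print("Saved tars_test.wav")
```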
- Open a terminal on your Raspberry Pi.
- Clone the GPTARS_Interstellar repository:
git clone https://github.com/pyrater/GPTARS_Interstellar.git
- Navigate to the cloned directory:
cd GPTARS_Interstellar
Chromium and Chromedriver are required for Selenium-based operations in the project.
- Update Your System:
sudo apt update
sudo apt upgrade -y
- Install Chromium:
sudo apt install -y chromium-browser
- Install Chromedriver for Selenium:
sudo apt install -y chromium-chromedriver
- Verify Installations:
- Check Chromium installation:
chromium-browser --version
- Check Chromedriver installation:
chromedriver --version
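To confirm Selenium can actually drive the installed Chromium, here is a minimal headless smoke test. The binary and driver paths below are the usual Raspberry Pi OS locations, but treat them as assumptions and adjust if yours differ:

```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

options = Options()
options.binary_location = "/usr/bin/chromium-browser"  # assumed Chromium path
options.add_argument("--headless")
options.add_argument("--no-sandbox")

service = Service("/usr/bin/chromedriver")  # assumed Chromedriver path
driver = webdriver.Chrome(service=service, options=options)
driver.get("https://www.example.com")
print(driver.title)  # Should print "Example Domain"
driver.quit()
```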
- Create a virtual environment:
python3 -m venv venv
- Activate the virtual environment:
source venv/bin/activate
- Install the required dependencies under `Brain/`:
pip install -r requirements.txt
- Connect your microphone to the Raspberry Pi via USB.
- Connect your speaker to the Raspberry Pi using the audio output or Bluetooth.
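To check that the Pi actually sees both devices, you can enumerate audio devices from Python. This sketch assumes PyAudio is available; it may or may not be among the project's dependencies, so `pip install pyaudio` first if needed:

```python
import pyaudio

# List every audio device ALSA exposes, with input/output channel counts.
pa = pyaudio.PyAudio()
for i in range(pa.get_device_count()):
    info = pa.get_device_info_by_index(i)
    print(f"{i}: {info['name']} "
          f"(in: {info['maxInputChannels']}, out: {info['maxOutputChannels']})")
pa.terminate()
```

Your USB microphone should appear with at least one input channel, and your speaker with at least one output channel.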
To securely store and use your API keys for OpenAI, Ooba, or Tabby, create and configure a `.env` file.
Add API Keys: Add the following lines to your `.env` file. Replace `your-actual-api-key` with your actual API key for the desired service:
OPENAI_API_KEY=your-actual-openai-api-key
OOBA_API_KEY=your-actual-ooba-api-key
TABBY_API_KEY=your-actual-tabby-api-key
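A quick way to confirm the keys load correctly, assuming the project reads them with `python-dotenv` (an assumption on my part; the library is the standard choice for `.env` handling):

```python
import os
from dotenv import load_dotenv

load_dotenv()  # Reads the .env file in the current directory.

# Print only whether each key is present, never the key itself.
for name in ("OPENAI_API_KEY", "OOBA_API_KEY", "TABBY_API_KEY"):
    print(name, "set" if os.getenv(name) else "missing")
```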
- Open the `config.ini` file located in the `Brain/` folder.
- Locate the `[TTS]` section and update the parameters:

[TTS]
# Text-to-Speech configuration (about 3.4GB; options: xttsv2, local, or TARS)
ttsurl = http://<server-ip>:8020  # Replace <server-ip> with the IP address of the machine running the XTTS API Server (e.g., 192.168.2.20).
charvoice = True                  # Use character-specific voice settings.
ttsoption = xttsv2                # Set this to `xttsv2` to use the XTTS API Server.
ttsclone = TARS-Short             # Set this to the desired speaker file (e.g., `TARS-Short` or `TARS-Long`).

- The `ttsurl` should point to the IP and port of the XTTS API Server.
- The `ttsclone` should match the desired speaker (e.g., `TARS-Short`).
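As a sketch of how these values are typically consumed, using only Python's standard `configparser` (the actual variable names in the project's code may differ):

```python
import configparser

# inline_comment_prefixes strips the trailing "# ..." comments used above.
config = configparser.ConfigParser(inline_comment_prefixes=("#",))
config.read("config.ini")

ttsurl = config["TTS"]["ttsurl"]
ttsclone = config["TTS"]["ttsclone"]

# The /tts_to_audio endpoint tested earlier, built from the configured values.
print(f"POST {ttsurl}/tts_to_audio with speaker_wav={ttsclone}")
```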
- Locate the `[LLM]` section and update the parameters (for OpenAI):

[LLM]
# Large Language Model configuration (ooba, OAI, or tabby)
backend = openai                  # Set this to `openai` if using OpenAI models.
base_url = https://api.openai.com # The URL for the OpenAI API.
openai_model = gpt-4o-mini        # Specify the OpenAI model to use (e.g., gpt-4o-mini or another supported model).

- Confirm that the `openai_model` matches the model available with your OpenAI API key.
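To verify the backend settings end to end before launching, you can hit OpenAI's chat completions endpoint directly with the configured values. The request shape below is the standard public OpenAI REST API; double-check the model name against your account:

```python
import os
import requests
from dotenv import load_dotenv

load_dotenv()  # Pick up OPENAI_API_KEY from the .env file created earlier.

response = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.getenv('OPENAI_API_KEY')}"},
    json={
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": "Say hello as TARS."}],
    },
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```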
- Navigate to the `Brain/` folder within the repository:
cd Brain/
- Start the application:
python app.py
- The program should now be running and ready to interact using your microphone and speaker.