Vietnamese Speech Synthesis with VITS text to speech model and TTS Coqui framework

This repository customizes and trains VITS (Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech) on Vietnamese language data using the Coqui TTS framework. It contains the code and resources needed to train a VITS model aimed at generating high-quality speech from Vietnamese text.

Pre-requisites

I highly recommend using a conda virtual environment with Python 3.11.5.

conda create -n vits python=3.11.5
conda activate vits

In this repo, I use TTS framework version 0.17.5 for stability.

pip install TTS==0.17.5
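
Because the rest of this README assumes the pinned version, it can help to confirm what actually got installed in the active environment. The following is a minimal sketch using only the Python standard library; the helper names `installed_version` and `matches_pin` are hypothetical and not part of this repo or of the TTS package.

```python
from importlib.metadata import PackageNotFoundError, version

EXPECTED_TTS_VERSION = "0.17.5"  # the version pinned in this README


def installed_version(package: str = "TTS"):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def matches_pin(found, expected: str = EXPECTED_TTS_VERSION) -> bool:
    """True only when a version was found and it equals the pinned one."""
    return found is not None and found == expected


if __name__ == "__main__":
    found = installed_version()
    if matches_pin(found):
        print(f"TTS {found} is installed, matching the pinned version.")
    else:
        print(f"Expected TTS {EXPECTED_TTS_VERSION}, found {found!r}.")
```

If the check reports a different version, re-running the pip command above inside the activated `vits` environment should bring it back in line.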

Inference

from TTS.api import TTS

tts = TTS('vits_tts',
          model_path='path to the .pth file',
          config_path='path to the config.json file')

tts.tts_to_file(text="Your example text", file_path="your_filename.wav")
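
To synthesize several sentences in one run, the call above can be wrapped in a small helper. This is only a sketch under stated assumptions: `check_model_files`, `synthesize_all`, the `outputs` directory, and the `sample_001.wav` naming scheme are hypothetical and not part of this repo. The `TTS(...)` constructor call and `tts_to_file` are used exactly as in the snippet above.

```python
from pathlib import Path


def check_model_files(model_path: str, config_path: str) -> None:
    """Raise FileNotFoundError early if the checkpoint or config is missing."""
    for label, p in (("model", model_path), ("config", config_path)):
        if not Path(p).is_file():
            raise FileNotFoundError(f"{label} file not found: {p}")


def synthesize_all(sentences, model_path, config_path, out_dir="outputs"):
    """Write one WAV file per sentence and return the list of output paths."""
    check_model_files(model_path, config_path)

    # Imported lazily so the path check above runs even if TTS is not installed.
    from TTS.api import TTS

    # Same constructor call as in the README snippet above.
    tts = TTS('vits_tts', model_path=model_path, config_path=config_path)

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    written = []
    for i, text in enumerate(sentences, start=1):
        wav_path = out / f"sample_{i:03d}.wav"  # hypothetical naming scheme
        tts.tts_to_file(text=text, file_path=str(wav_path))
        written.append(wav_path)
    return written
```

Checking the paths first means a mistyped checkpoint location fails with a clear message before any model loading starts.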

Demo

My trained model is published on this HuggingFace space. Because of the limited resources available for training, the results fall short of expectations. The next goal is to collect personal voice data to implement voice cloning.
