# Text-to-Speech Recipe

Text-to-speech (TTS), also known as speech synthesis, generates speech signals from input text. SpeechBrain supports popular TTS models such as Tacotron 2 together with neural vocoders such as HiFi-GAN.
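
Below is a minimal sketch of a Tacotron 2 + HiFi-GAN synthesis pipeline using SpeechBrain's pretrained interfaces. The class names and HuggingFace model IDs are assumptions based on recent SpeechBrain releases and may differ in your installed version.

```python
import torchaudio
from speechbrain.pretrained import Tacotron2, HIFIGAN

# Load a pretrained Tacotron 2 (text -> mel spectrogram) and HiFi-GAN vocoder (mel -> waveform)
tacotron2 = Tacotron2.from_hparams(source="speechbrain/tts-tacotron2-ljspeech",
                                   savedir="pretrained_models/tts-tacotron2-ljspeech")
hifi_gan = HIFIGAN.from_hparams(source="speechbrain/tts-hifigan-ljspeech",
                                savedir="pretrained_models/tts-hifigan-ljspeech")

# Convert the input text into a mel spectrogram, then decode it into a waveform
mel_output, mel_length, alignment = tacotron2.encode_text("Text-to-speech with SpeechBrain")
waveforms = hifi_gan.decode_batch(mel_output)

# Save the synthesized speech (the LJSpeech models operate at 22.05 kHz)
torchaudio.save("tts_example.wav", waveforms.squeeze(1), 22050)
```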

# Speech Separation Recipe

SpeechBrain provides a SepFormer model for speech source separation, pretrained on the WHAMR! dataset, a variant of the WSJ0-Mix dataset with added ambient noise and reverberation. The model reaches 13.7 dB SI-SNRi on the WHAMR! test set. We encourage you to explore the SpeechBrain documentation to learn more about the available separation recipes.
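
The sketch below shows how such a pretrained SepFormer can be applied to a mixture file through SpeechBrain's pretrained interface; the `SepformerSeparation` class and the `speechbrain/sepformer-whamr` model ID are assumptions based on recent releases.

```python
import torchaudio
from speechbrain.pretrained import SepformerSeparation

# Load a SepFormer model pretrained on WHAMR! (noisy, reverberant two-speaker mixtures)
model = SepformerSeparation.from_hparams(source="speechbrain/sepformer-whamr",
                                         savedir="pretrained_models/sepformer-whamr")

# Separate a mixture file; the result has shape [batch, time, n_sources]
est_sources = model.separate_file(path="mixture.wav")

# Save each estimated source (WHAMR! audio is sampled at 8 kHz)
torchaudio.save("speaker1.wav", est_sources[:, :, 0].detach().cpu(), 8000)
torchaudio.save("speaker2.wav", est_sources[:, :, 1].detach().cpu(), 8000)
```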

# Speech Enhancement Recipe

SpeechBrain currently provides several speech enhancement approaches, including spectral masking, spectral mapping, and time-domain enhancement. Separation models such as Conv-TasNet, Dual-Path RNN, and SepFormer can also be applied to enhancement.
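
As an illustration, the snippet below sketches spectral-masking enhancement with a pretrained model; the `SpectralMaskEnhancement` class and the `speechbrain/metricgan-plus-voicebank` model ID are assumptions and may need adjusting for your SpeechBrain version.

```python
from speechbrain.pretrained import SpectralMaskEnhancement

# Load a pretrained spectral-masking enhancement model
enhancer = SpectralMaskEnhancement.from_hparams(
    source="speechbrain/metricgan-plus-voicebank",
    savedir="pretrained_models/metricgan-plus-voicebank",
)

# Enhance a noisy recording and write the cleaned waveform to disk
enhanced = enhancer.enhance_file("noisy.wav", output_filename="enhanced.wav")
```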

# Speaker Recognition Recipe

Speaker recognition is now used in a wide range of practical applications. SpeechBrain offers several models and training strategies for it, including X-vector and ECAPA-TDNN embeddings, PLDA scoring, and contrastive learning.
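
A minimal verification example is sketched below, assuming the `SpeakerRecognition` pretrained interface and the `speechbrain/spkrec-ecapa-voxceleb` model available in recent SpeechBrain releases.

```python
from speechbrain.pretrained import SpeakerRecognition

# Load an ECAPA-TDNN verification model pretrained on VoxCeleb
verification = SpeakerRecognition.from_hparams(
    source="speechbrain/spkrec-ecapa-voxceleb",
    savedir="pretrained_models/spkrec-ecapa-voxceleb",
)

# Compare two utterances: `score` is a similarity score and `prediction` is True
# when both recordings are judged to come from the same speaker
score, prediction = verification.verify_files("speaker_a.wav", "speaker_b.wav")
```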