Text-to-speech (TTS), also referred to as speech synthesis, lets users generate speech signals from input text. SpeechBrain supports popular TTS models such as Tacotron 2, together with vocoders such as HiFi-GAN that convert the predicted spectrograms into waveforms.
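As a minimal sketch of this pipeline, the pretrained Tacotron 2 and HiFi-GAN checkpoints from the SpeechBrain model cards (`speechbrain/tts-tacotron2-ljspeech` and `speechbrain/tts-hifigan-ljspeech`) can be chained through the pretrained-model interface; note that recent SpeechBrain releases expose these classes under `speechbrain.inference` rather than `speechbrain.pretrained`:

```python
import torchaudio
from speechbrain.pretrained import Tacotron2, HIFIGAN

# Load a pretrained TTS model (text -> mel spectrogram) and vocoder (mel -> waveform).
tacotron2 = Tacotron2.from_hparams(source="speechbrain/tts-tacotron2-ljspeech", savedir="tmp_tts")
hifi_gan = HIFIGAN.from_hparams(source="speechbrain/tts-hifigan-ljspeech", savedir="tmp_vocoder")

# Synthesize a mel spectrogram from text, then decode it to audio.
mel_output, mel_length, alignment = tacotron2.encode_text("Speech synthesis with SpeechBrain")
waveforms = hifi_gan.decode_batch(mel_output)

# Save the result (the LJSpeech models operate at a 22.05 kHz sampling rate).
torchaudio.save("example_tts.wav", waveforms.squeeze(1), 22050)
```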
SpeechBrain also provides a SepFormer model for speech source separation, pretrained on the WHAMR! dataset, a variant of the WSJ0-Mix dataset augmented with ambient noise and reverberation. The model reaches 13.7 dB SI-SNRi on the WHAMR! test set. We encourage you to explore the SpeechBrain documentation to get the most out of these models.
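A short sketch of running this model, following the usage shown on the `speechbrain/sepformer-whamr` model card (the input file `mixture.wav` is a hypothetical two-speaker mixture):

```python
import torchaudio
from speechbrain.pretrained import SepformerSeparation

# Load the SepFormer model pretrained on WHAMR! (noisy, reverberant two-speaker mixtures).
model = SepformerSeparation.from_hparams(
    source="speechbrain/sepformer-whamr",
    savedir="pretrained_models/sepformer-whamr",
)

# Separate a mixture; the output tensor has one channel per estimated source.
est_sources = model.separate_file(path="mixture.wav")  # hypothetical input file

# Save each estimated speaker (the WHAMR! model works at 8 kHz).
torchaudio.save("source1.wav", est_sources[:, :, 0].detach().cpu(), 8000)
torchaudio.save("source2.wav", est_sources[:, :, 1].detach().cpu(), 8000)
```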
SpeechBrain currently offers several speech enhancement techniques, including spectral masking, spectral mapping, and time-domain enhancement, alongside separation models such as Conv-TasNet, Dual-Path RNN, and SepFormer.
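As an illustration of the spectral-masking approach, the sketch below assumes the pretrained MetricGAN+ checkpoint (`speechbrain/metricgan-plus-voicebank`) and a hypothetical `noisy.wav` input:

```python
import torch
import torchaudio
from speechbrain.pretrained import SpectralMaskEnhancement

# Load a spectral-masking enhancement model (MetricGAN+ trained on VoiceBank-DEMAND).
enhance_model = SpectralMaskEnhancement.from_hparams(
    source="speechbrain/metricgan-plus-voicebank",
    savedir="pretrained_models/metricgan-plus-voicebank",
)

# Enhance a noisy recording; lengths are relative (1.0 = use the full signal).
noisy = enhance_model.load_audio("noisy.wav").unsqueeze(0)  # hypothetical input file
enhanced = enhance_model.enhance_batch(noisy, lengths=torch.tensor([1.0]))

# Save the enhanced signal (the VoiceBank models operate at 16 kHz).
torchaudio.save("enhanced.wav", enhanced.cpu(), 16000)
```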
Speaker recognition is now used in a wide range of practical applications. For this task, SpeechBrain offers a variety of approaches, including X-vector and ECAPA-TDNN embedding models, PLDA scoring, and contrastive learning.
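For example, speaker verification with a pretrained ECAPA-TDNN model can be sketched as follows, assuming the `speechbrain/spkrec-ecapa-voxceleb` checkpoint and two hypothetical recordings:

```python
from speechbrain.pretrained import SpeakerRecognition

# Load a pretrained ECAPA-TDNN speaker verification model (trained on VoxCeleb).
verification = SpeakerRecognition.from_hparams(
    source="speechbrain/spkrec-ecapa-voxceleb",
    savedir="pretrained_models/spkrec-ecapa-voxceleb",
)

# Compare two recordings; returns a similarity score and a same/different decision.
score, prediction = verification.verify_files("speaker1.wav", "speaker2.wav")  # hypothetical files
print(f"score={score.item():.3f}, same speaker: {bool(prediction)}")
```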