Automatic Speech Recognition for spontaneous and prepared speech & Speech Emotion Recognition in Portuguese (SE&R 2022) Workshop is the first edition of a new series of shared-tasks for the Portuguese language and introduces two versions of a new dataset called CORAA (Corpus of Annotated Audios built in the TaRSila project, an effort of the Center for Artificial Intelligence (C4AI).
The Speech Emotion Recognition track aims to motivate research for SER in our community, mainly to discuss theoretical and practical aspects of SER, pre-processing and feature extraction, and machine learning models for Portuguese. In this task, participants must train their own models using acoustic audio features using the dataset provided called CORAA SER version 1.0 composed of approximately 40 minutes of audio segments labeled in three classes: neutral, non-neutral female, and non-neutral male. While the neutral class represents audio segments with no well-defined emotional state, the non-neutral classes represent segments associated with one of the primary emotional states in the speaker's speech.
This repository presents the code used by our team that got first place on the SER track.
Three key strategies make up the solution:
- The use of a multilingual model (Wav2vec2.0 XLS-R)
- The use of a mixture of voice emotion recognition datasets from several languages
- Dataset normalization
It is important to install the dependencies before launching the application.
Run the following command to install the required dependencies using pip:
sudo pip install -r requeriments
Three more speech emotion recognition datasets were used in addition to the CORAA SER dataset:
Run the script prepare_datasets.sh
to download all of the datasets used in the experiments, then run the script get_metadata.py
to prepare the metadata used to train, validate and test the models.
The default.yaml
file in the config
directory can be used to set the essential configurations for training a model.
If you want to apply gain normalization during training/testing, you must first calculate the mean dbfs level, which you can accomplish by running the get_mean_dbfs.py
script in the utils
directory, and then inserting the result in the target_dbfs
parameter.
The model weights with the best results can be found on the huggingface hub and can be easily fine-tuned in more data by this application.
- Alef Iury Siqueira Ferreira
- e-mail: [email protected]