
# Active emotion detection | Recognising audio segments with high emotional activation

In this project I build an emotional speech detector. The goal is to train a model that can recognise active emotion in a speech segment.

All experiments and computations are published in the `Experiments.ipynb` notebook.

This project uses a large audio dataset with emotional annotations to train five deep learning models with different architectures to recognise a high level of emotional activation in audio segments.

For this project I developed a custom deep learning framework to configure and run the experiments, as well as to preprocess and load the data.
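The framework is configuration-driven. The repository's actual `TrainingConfig` fields aren't listed here, but a config object of this kind often looks like the following minimal sketch (all field names below are illustrative assumptions, not the repo's API):

```python
from dataclasses import dataclass

@dataclass
class TrainingConfig:
    # Hypothetical fields; the actual config in this repo may differ.
    model_name: str = "cnn_transformer"
    batch_size: int = 32
    learning_rate: float = 3e-4
    epochs: int = 10
    mixed_precision: bool = True
    grad_accumulation_steps: int = 4
```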

Most of the models in my experiments can learn from mini-batches of varying size. For this purpose I developed a dataloader that builds mini-batches from samples of similar length, which avoids padding overhead.
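A common way to implement this idea is a length-bucketing batch sampler. The sketch below is an assumption about the approach rather than the repository's actual code: it sorts sample indices by length, slices them into batches of neighbours, and shuffles only the batch order.

```python
import random
from torch.utils.data import Sampler

class LengthBucketingSampler(Sampler):
    """Yields mini-batches of indices whose samples have similar lengths,
    keeping per-batch padding to a minimum."""

    def __init__(self, lengths, batch_size, shuffle=True):
        # `lengths` is assumed to hold the length (e.g. frame count) of each item.
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.sorted_indices = sorted(range(len(lengths)), key=lambda i: lengths[i])

    def __iter__(self):
        batches = [
            self.sorted_indices[i:i + self.batch_size]
            for i in range(0, len(self.sorted_indices), self.batch_size)
        ]
        if self.shuffle:
            # Shuffle the order of batches, not their contents, so each
            # batch still contains samples of similar length.
            random.shuffle(batches)
        return iter(batches)

    def __len__(self):
        return (len(self.sorted_indices) + self.batch_size - 1) // self.batch_size
```

Such a sampler would be passed to `torch.utils.data.DataLoader` via the `batch_sampler` argument, together with a `collate_fn` that pads each batch only up to its own maximum length.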

To run the experiments I created a class responsible for initializing and running training from the provided TrainingConfig. Since some of the models are quite large, I added the functionality of mixed precision training and gradient accumulation.
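In PyTorch, these two techniques are typically combined as in the minimal sketch below (the loss function and loop structure here are assumptions; the repository's trainer class may look different):

```python
import torch

def train_epoch(model, loader, optimizer, accumulation_steps=4, device="cuda"):
    """One training epoch with mixed precision and gradient accumulation."""
    scaler = torch.cuda.amp.GradScaler()
    model.train()
    optimizer.zero_grad()
    for step, (inputs, targets) in enumerate(loader):
        # Targets are assumed to be float labels in [0, 1] (binary activation).
        inputs, targets = inputs.to(device), targets.to(device)
        with torch.cuda.amp.autocast():
            logits = model(inputs)
            # Divide by accumulation_steps so the accumulated gradient
            # averages over the whole effective batch.
            loss = torch.nn.functional.binary_cross_entropy_with_logits(
                logits, targets) / accumulation_steps
        scaler.scale(loss).backward()
        if (step + 1) % accumulation_steps == 0:
            scaler.step(optimizer)
            scaler.update()
            optimizer.zero_grad()
```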

For evaluation purposes I built a module that computes metrics such as ROC-AUC, best F1 score, and the corresponding precision and recall.
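With scikit-learn, these metrics can be computed as in the following sketch (a generic illustration, not the repository's actual module):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, precision_recall_curve

def evaluate(y_true, y_scores):
    """ROC-AUC plus the best F1 over all thresholds, with its precision/recall."""
    auc = roc_auc_score(y_true, y_scores)
    precision, recall, _ = precision_recall_curve(y_true, y_scores)
    # F1 at every threshold; clip the denominator to avoid division by zero.
    f1 = 2 * precision * recall / np.clip(precision + recall, 1e-12, None)
    best = np.argmax(f1)
    return {"roc_auc": auc, "best_f1": f1[best],
            "precision": precision[best], "recall": recall[best]}
```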


The code is organised into three modules:

- `data` module - dataloading, dataset creation, data preprocessing
- `framework` module - training logic and tools
- `experiments` module - experiment running and model architectures


## Models

1. CNN-Transformer


3. ViT (DeiT)

4. Wav2Vec 2.0

5. Hybrid