In this project I build an emotional speech detector. The goal is to train a model that can recognise active emotion in a speech segment.
All experiments and computations are published in the Experiments.ipynb notebook.
The project uses a large audio dataset with emotional annotations to train five deep learning models with different architectures to recognise a high level of emotional activation in audio segments.
For this project I developed a custom deep learning framework to configure and run the experiments, as well as to preprocess and load the data.
Most of the models in my experiments can learn from mini-batches of varying size. To exploit this, I developed a dataloader that builds mini-batches from samples of similar length, which keeps padding overhead low.
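The length-bucketing idea can be sketched in a few lines of plain Python. The function name and the `bucket_width` parameter here are illustrative, not the project's actual API: samples are grouped into coarse length buckets, and batches are drawn within a bucket so sequences in one batch never differ in length by more than the bucket width.

```python
import random
from collections import defaultdict

def bucket_batches(samples, batch_size, bucket_width=100):
    """Group samples of similar length so each batch needs little padding.

    `samples` is a list of (length, data) pairs; `bucket_width` (a
    hypothetical knob) controls how coarse the length buckets are.
    """
    buckets = defaultdict(list)
    for length, data in samples:
        buckets[length // bucket_width].append((length, data))

    batches = []
    for bucket in buckets.values():
        random.shuffle(bucket)  # randomize order within each bucket
        for i in range(0, len(bucket), batch_size):
            batches.append(bucket[i:i + batch_size])

    random.shuffle(batches)  # keep training order random across buckets
    return batches
```

Because every batch comes from a single bucket, the amount of padding per batch is bounded by `bucket_width`, while shuffling at both levels preserves randomness for training.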
To run the experiments I created a class responsible for initializing and running training from a provided TrainingConfig. Since some of the models are quite large, I added support for mixed precision training and gradient accumulation.
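A minimal sketch of how such a training step could combine the two techniques in PyTorch (the function name and argument layout are assumptions, not the project's actual trainer): the loss of each micro-batch is divided by the number of accumulation steps, gradients are accumulated across micro-batches, and the optimizer steps once at the end; when a `GradScaler` is passed, the CUDA autocast/scaler path handles mixed precision.

```python
import torch
from torch import nn

def train_step_accum(model, batches, optimizer, accum_steps, scaler=None):
    """One optimizer step accumulated over several micro-batches.

    `scaler` is an optional torch.cuda.amp.GradScaler; when provided,
    the forward pass runs under CUDA autocast (mixed precision).
    """
    loss_fn = nn.BCEWithLogitsLoss()
    optimizer.zero_grad()
    total = 0.0
    for x, y in batches:
        if scaler is not None:
            # Mixed precision path: autocast the forward pass,
            # scale the loss before backward to avoid underflow.
            with torch.autocast("cuda"):
                loss = loss_fn(model(x).squeeze(-1), y) / accum_steps
            scaler.scale(loss).backward()
        else:
            loss = loss_fn(model(x).squeeze(-1), y) / accum_steps
            loss.backward()  # gradients accumulate across micro-batches
        total += loss.item()
    if scaler is not None:
        scaler.step(optimizer)
        scaler.update()
    else:
        optimizer.step()
    return total
```

Dividing each micro-batch loss by `accum_steps` makes the accumulated gradient equivalent to one step over the full effective batch, so large models can train with an effective batch size that would not fit in memory at once.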
For evaluation I built a module that computes metrics such as ROC-AUC, the best F1 score, and the corresponding precision and recall.
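The metric computation can be sketched with scikit-learn; the function name and the returned dictionary layout are illustrative, not the module's actual interface. "Best F1" here means the maximum F1 over all decision thresholds, taken from the precision-recall curve, with the precision and recall reported at that same threshold.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, roc_auc_score

def activation_metrics(y_true, y_score):
    """ROC-AUC plus the best F1 over all thresholds and its precision/recall."""
    roc_auc = roc_auc_score(y_true, y_score)
    precision, recall, thresholds = precision_recall_curve(y_true, y_score)
    # F1 at every threshold; guard against precision == recall == 0.
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom,
                   out=np.zeros_like(denom), where=denom > 0)
    best = int(np.argmax(f1))
    return {
        "roc_auc": float(roc_auc),
        "best_f1": float(f1[best]),
        "precision": float(precision[best]),
        "recall": float(recall[best]),
    }
```

Reporting precision and recall at the best-F1 threshold shows how the score balances the two error types, which a single AUC number hides.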
- data module - dataloading, dataset creation, data preprocessing
- framework module - training logic and tools
- experiments module - experiment running and model architectures