This repository contains sandbox experiments in audio processing with deep learning.
Right now, it implements the following techniques:
- Classifying recordings of spoken digits with a convolutional neural network applied to their spectrograms
- Modeling an inverse short-time Fourier transform (STFT) by training a 1D transposed convolution layer (see the sketch after this list)
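A minimal sketch of the second idea is given below. It assumes an FFT size of 512, a hop length of 128, and 8 kHz audio, none of which necessarily match this repository's actual settings: the real and imaginary STFT frames are stacked as channels and fed through a single learnable `nn.ConvTranspose1d`, which is trained so that its overlap-add output reconstructs the original waveform.

```python
import torch
import torch.nn as nn

# Illustrative hyperparameters (placeholders, not the repository's configuration)
N_FFT = 512
HOP = 128

class LearnedISTFT(nn.Module):
    """Approximate the inverse STFT with one learnable 1D transposed convolution."""

    def __init__(self, n_fft=N_FFT, hop=HOP):
        super().__init__()
        # 2 * n_bins input channels: real and imaginary spectrogram components.
        # The transposed convolution performs the windowed overlap-add.
        self.inverse = nn.ConvTranspose1d(
            in_channels=2 * (n_fft // 2 + 1),
            out_channels=1,
            kernel_size=n_fft,
            stride=hop,
            bias=False,
        )

    def forward(self, spec_real, spec_imag):
        # spec_*: (batch, n_bins, n_frames)
        x = torch.cat([spec_real, spec_imag], dim=1)
        return self.inverse(x)  # (batch, 1, n_samples)

# Training sketch: fit the layer so that ISTFT(STFT(x)) is close to x.
model = LearnedISTFT()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

waveform = torch.randn(8, 1, 8000)  # dummy batch of 1-second, 8 kHz clips
window = torch.hann_window(N_FFT)
spec = torch.stft(waveform.squeeze(1), N_FFT, HOP, window=window, return_complex=True)

for _ in range(100):
    optimizer.zero_grad()
    recon = model(spec.real, spec.imag)
    # Crop to the common length before computing the reconstruction loss.
    n = min(recon.shape[-1], waveform.shape[-1])
    loss = loss_fn(recon[..., :n], waveform[..., :n])
    loss.backward()
    optimizer.step()
```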
Representing audio signals as spectrograms, such as the spoken digit six visualized below, turns them into matrices not unlike the digital pixel images that convolutional neural networks were originally designed for, as the sketch below illustrates.
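The following is a minimal sketch of that classification pipeline, using `torchaudio.transforms.Spectrogram` and a small illustrative CNN; the FFT size, hop length, and layer sizes are placeholders rather than this repository's actual configuration.

```python
import torch
import torch.nn as nn
import torchaudio

# Turn a waveform into a 2D time-frequency "image" (placeholder settings).
spectrogram = torchaudio.transforms.Spectrogram(n_fft=512, hop_length=128)

# Small illustrative CNN classifier for the ten spoken digits (0-9).
classifier = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(32, 10),
)

waveform = torch.randn(4, 1, 8000)   # dummy batch: 4 clips, 1 channel, 8000 samples
spec = spectrogram(waveform)         # (4, 1, freq_bins, time_frames), image-like
spec = torch.log1p(spec)             # compress the dynamic range before the CNN
logits = classifier(spec)            # (4, 10) digit scores
```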
The data used here is based on the AudioMNIST dataset of spoken digits by Sören Becker, as found here:
https://github.com/soerenab/AudioMNIST
Some of the techniques used here are inspired by the work of Peter Bermant and his colleagues at the Earth Species Project, in particular their repository on source separation:
https://github.com/earthspecies/cocktail-party-problem
And by the torch-stft implementation by pseeth at:
https://github.com/pseeth/torch-stft