tarolangner/audio_deep_learning

Audio Signal Learning

This repository contains sandbox experiments in audio processing with deep learning.

Currently, it implements the following techniques:

  • Classifying recordings of spoken digits with a convolutional neural network on spectrograms
  • Modeling an inverse short-time Fourier transform (iSTFT) by training a 1D transposed convolution layer
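The second technique can be sketched as follows, assuming PyTorch and illustrative parameter choices (n_fft=64, hop=16) that may differ from those used in this repository. The idea is that overlap-add iSTFT synthesis has the same structure as a strided transposed convolution, so a single such layer can learn to invert the STFT:

```python
import torch

# Hypothetical parameters for illustration only
n_fft, hop = 64, 16
n_bins = n_fft // 2 + 1

# A 1D transposed convolution meant to learn overlap-add iSTFT synthesis:
# stride = hop length, kernel size = frame length
istft_layer = torch.nn.ConvTranspose1d(
    in_channels=2 * n_bins,  # real and imaginary parts stacked as channels
    out_channels=1,
    kernel_size=n_fft,
    stride=hop,
    bias=False,
)

# Toy training step: reconstruct a random waveform from its STFT
signal = torch.randn(1, 1, 1024)
spec = torch.stft(
    signal.squeeze(1), n_fft=n_fft, hop_length=hop,
    window=torch.hann_window(n_fft), center=False, return_complex=True,
)
frames = torch.cat([spec.real, spec.imag], dim=1)  # (1, 2*n_bins, n_frames)

recon = istft_layer(frames)  # (1, 1, 1024), same length as the input signal
loss = torch.nn.functional.mse_loss(recon, signal)
loss.backward()  # gradients reach the layer, so it can be trained as an iSTFT
```

With these settings the transposed convolution maps 61 STFT frames back to exactly 1024 samples, so the reconstruction loss can be computed without cropping.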

Background

When represented as spectrograms, such as that of the spoken digit six visualized below, audio signals can be processed in matrix form, much like the digital pixel images that convolutional neural networks were originally designed for.
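A minimal sketch of this idea, assuming PyTorch and arbitrary illustrative parameters (a 1-second signal at 8 kHz, n_fft=256) rather than the repository's actual pipeline:

```python
import torch

# Stand-in for a 1-second spoken-digit recording sampled at 8 kHz
waveform = torch.randn(1, 8000)

# Magnitude spectrogram: a 2D matrix of frequency bins x time frames
spec = torch.stft(
    waveform, n_fft=256, hop_length=128,
    window=torch.hann_window(256), return_complex=True,
).abs()
# spec has shape (1, 129, 63): one "image" of 129 frequency rows, 63 time columns

# Such a matrix can be fed to an ordinary 2D CNN as a single-channel image
conv = torch.nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, padding=1)
features = conv(spec.unsqueeze(1))  # (batch, channels, freq, time)
```

The spectrogram plays the role of a grayscale image, so standard 2D convolutional layers apply without modification.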

Credit

The data used here is based on the AudioMNIST dataset of spoken digits by Sören Becker, as found here:
https://github.com/soerenab/AudioMNIST

Some of the techniques used here are inspired by the work of Peter Bermant and his colleagues at the Earth Species Project, in particular their repository on source separation:
https://github.com/earthspecies/cocktail-party-problem
and by the torch-stft implementation by pseeth:
https://github.com/pseeth/torch-stft
