ML Audio Classifier

A project that classifies audio files with machine learning and visualizes the results, based on the IBM max-audio-classifier model.

max-audio-classifier

The model takes a 10-second, signed 16-bit PCM WAV file as input, generates embeddings, applies a PCA transformation and quantization, and feeds the embeddings to a multi-attention classifier, which outputs the top 5 class predictions with their probabilities. The model currently supports 527 classes, all of which are part of the Google AudioSet Ontology.
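Because the model expects a specific input format, it can help to check a file before sending it for classification. The following is a minimal sketch using only Python's standard-library `wave` module; the function names `describe_wav` and `looks_like_valid_input` and the half-second `tolerance` are illustrative assumptions, not part of this repository or of the IBM model.

```python
import wave


def describe_wav(path):
    """Return basic properties of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate": rate,
            "duration_seconds": frames / rate,
        }


def looks_like_valid_input(path, expected_seconds=10.0, tolerance=0.5):
    """Check for 16-bit samples and a duration close to expected_seconds.

    The tolerance is an assumption made for this sketch; the model
    description only states that the input is a 10-second clip.
    """
    info = describe_wav(path)
    is_16_bit = info["sample_width_bytes"] == 2  # 2 bytes per sample = 16 bits
    is_right_length = abs(info["duration_seconds"] - expected_seconds) <= tolerance
    return is_16_bit and is_right_length
```

Calling `looks_like_valid_input("bird.wav")` would return `True` for a 16-bit clip of roughly 10 seconds. The `wave` module raises `wave.Error` for files it cannot read as WAV, so that case should be handled by the caller.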

The model was trained on AudioSet as described in the paper 'Multi-level Attention Model for Weakly Supervised Audio Classification' by Yu et al.

Google AudioSet

AudioSet consists of an expanding ontology of 632 audio event classes and a collection of 2,084,320 human-labeled 10-second sound clips drawn from YouTube videos. The ontology is specified as a hierarchical graph of event categories, covering a wide range of human and animal sounds, musical instruments and genres, and common everyday environmental sounds.

Steps

Convert MP3 files to WAV
Cut long audio into 10-second pieces
Classify each piece
Visualize the results
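The slicing step above can be sketched with the standard-library `wave` module alone. This is an illustration under stated assumptions, not the code in this repository's `slice_audio.py`: the function name `slice_wav`, the output naming scheme, and the choice to keep a shorter final piece are all hypothetical. It assumes the MP3-to-WAV conversion has already been done.

```python
import os
import wave


def slice_wav(src_path, out_dir, piece_seconds=10):
    """Split a WAV file into consecutive pieces and return the written paths.

    The last piece is kept even if it is shorter than piece_seconds.
    """
    os.makedirs(out_dir, exist_ok=True)
    stem = os.path.splitext(os.path.basename(src_path))[0]
    written = []
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames_per_piece = src.getframerate() * piece_seconds
        index = 0
        while True:
            frames = src.readframes(frames_per_piece)
            if not frames:
                break  # reached the end of the source file
            out_path = os.path.join(out_dir, f"{stem}_{index:03d}.wav")
            with wave.open(out_path, "wb") as dst:
                # Copy channels, sample width, and rate from the source;
                # the frame count in the header is corrected on close.
                dst.setparams(params)
                dst.writeframes(frames)
            written.append(out_path)
            index += 1
    return written
```

For example, `slice_wav("long_recording.wav", "slices")` on a 25-second file would write three files, the last one 5 seconds long. Since the model expects 10-second clips, a short trailing piece may need to be dropped or padded before classification.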

Result

Classification result

bird.wav {"status": "ok", "predictions": [{"label_id": "/m/015p6", "label": "Bird", "probability": 0.42554083466529846}, {"label_id": "/m/09ddx", "label": "Duck", "probability": 0.30564624071121216}, {"label_id": "/m/0jbk", "label": "Animal", "probability": 0.1728699654340744}, {"label_id": "/m/025rv6n", "label": "Fowl", "probability": 0.17106477916240692}, {"label_id": "/m/07qdb04", "label": "Quack", "probability": 0.16343526542186737}]}
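A result line like the one above pairs a file name with the JSON response. As a minimal sketch of how such a line could be read back for later visualization, the hypothetical helper `parse_result_line` below splits the line at the first `{` and ranks the predictions by probability. It assumes the file name itself contains no `{` character; the example string is shortened to the first two predictions from the output above.

```python
import json


def parse_result_line(line):
    """Split a 'filename {json}' line into the file name and ranked predictions."""
    brace = line.index("{")  # the JSON payload starts at the first brace
    name = line[:brace].strip()
    response = json.loads(line[brace:])
    if response.get("status") != "ok":
        raise ValueError(f"classification failed for {name}: {response.get('status')}")
    ranked = sorted(
        response["predictions"], key=lambda p: p["probability"], reverse=True
    )
    return name, [(p["label"], p["probability"]) for p in ranked]


example = (
    'bird.wav {"status": "ok", "predictions": ['
    '{"label_id": "/m/015p6", "label": "Bird", "probability": 0.42554083466529846}, '
    '{"label_id": "/m/09ddx", "label": "Duck", "probability": 0.30564624071121216}]}'
)
name, ranked = parse_result_line(example)
for label, probability in ranked:
    print(f"{name}: {label} ({probability:.1%})")
```

Returning plain `(label, probability)` tuples keeps the parsed result independent of any particular plotting library.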

Visualization result


EverydayDesignStudio/ml-audio-classifier

Folders and files

Latest commit

History

Repository files navigation

ML Audio Classifier

max-audio-classifier

Google AudioSet

Step

Result

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages