Build a machine learning (ML) model that takes audio files as input and returns a corresponding music genre.
Model training is done using the GTZAN dataset for Music Genre Classification (MCG). This dataset consists of 1000 .wav (22KHz 16bit mono) files of 30 second audio excerpts spanning 10 musical genres. Each genre consists of 100 tracks.
It is recommended you install the requirements and you run the code in the notebooks in a separate virtual environment:
$ virtualenv venv
$ source venv/bin/activate
The code was written on Python 3.7.4. The classifier relies on MusiCNN for computing musically-relevant feature vectors, Scikit-learn for training the final classifier, Pandas and Seaborn for visualising the results.
You can install most of the requirements with:
$ pip install -r requirements.txtThe only exception is MusiCNN which you need to install via the command line:
$ git clone https://github.com/jordipons/musicnn
$ cd musicnn
$ python setup.py installVisit the GTZAN homepage and download the GTZAN genre collection (approx. 1.2GB). Extract genres.tar.gz to the data folder in this repo:
tar xzf /path/to/genres.tar.gz -C data/
| Classifier | Avg. Acc | Avg. P | ROC-AUC | Time (training) | Time (testing) |
|---|---|---|---|---|---|
| SVM(RBF,C=1)1 | 0.7724 | -- | -- | --- | --- |
| SVM(RBF,C=1) | 0.7860 | 0.8822 | 0.9740 | 4m40s | 45.9s |
| XGBoost | 0.7826 | 0.8861 | 0.9707 | 26.2s | 569ms |
Please see the jupyter notebooks in the root directory of this repo for how the models are trained and evaluated.
1Reported at https://github.com/jordipons/sklearn-audio-transfer-learning.
The classifier is also provided as a python function in python/classify.py. You can use it to classify an arbritrary audio file as follows:
# Switch to directory
$ cd python/
# (optional) run unit tests
$ python test_classify.py
# python classify.py ../data/genres/blues/blues.00002.wav
blues$The script does not add a new line after execution in order allow its output to be used in e.g. shell scripts. You can also use it from inside another python script as such:
import classify
classify.classify_audio(
"../data/genres/blues/blues.00002.wav",
"../models/features_classifier.pkl",
)The first argument is the audio to be classified, and the second the classifier used (see the notebook files in this repo).
You can find a flask microservice for the classifier in the docker directory. First you need
to place features_classifier.pkl from models/ or Releases in the docker/classifier/models directory. Then you can build and run the microservice with:
$ cd docker
$ docker-compose up --build -dYou can then use curl in order to classify audio files. E.g.:
curl -F 'audio=@classical.00067.wav' http://localhost:5000
{"duration":0.9515135288238525,"message":"classical","status":"success"}When you finish testing you can shut down the container with:
$ docker-compose downIn order to reduce the time taken for classification, as well guarantee constant time regardless of song duration, we used the sox library to trim 7 seconds around the middle of the song (which translates to roughly 5 feature vectors). This resulted in a constant duration of ~0.95 seconds per song (~2-fold decrease) regardless of song duration while accuracy suffered by a mere 2.4%.
| Type | Time (ms) | Accuracy |
|---|---|---|
| Original | 1963±44 | 0.7828 |
| Truncated (7sec) | 998±37 | 0.7586 |
| Difference | 197% | ~2.4% |
Please see doc/TechReport.pdf for a more comprehensive technical report.