Quantifying and Modeling Musical Hardness Using Audio Features and Spectrogram-Based Learning: Study Case NCT 127

masevs/nct127-complexity-analysis


Estimating Musical Difficulty

Estimating Musical Difficulty Using Audio Features and Spectrogram-Based Deep Learning: A Case Study on NCT 127

Overview

Musical difficulty emerges from the interaction between structural complexity and performance intensity, yet existing computational approaches rely primarily on symbolic scores and are poorly suited to contemporary commercial music. This project proposes a fully audio-based framework for estimating musical difficulty by integrating interpretable audio features, a Perceptual Difficulty Index (PDI), and a spectrogram-based deep learning model.

Using a curated corpus of 35 tracks from NCT 127’s discography, the framework extracts psychoacoustic and information-theoretic descriptors to quantify musical complexity and intensity. These measures are combined into a continuous PDI score and discretized into three difficulty levels (Easy, Medium, Hard), which are subsequently used to supervise a convolutional neural network trained on Mel-spectrogram segments.

The results demonstrate that difficulty-related perceptual attributes, such as transient density, spectral spread, and high-frequency activation, are encoded in the audio signal alone and can be learned computationally, even under limited-data conditions.

Methods

The proposed pipeline consists of the following stages:

  1. Audio Feature Extraction: Ten psychoacoustic and MIR features are extracted using Librosa, capturing structural complexity (e.g., pitch range, timbral variability, harmonic contrast) and performance intensity (e.g., tempo, onset density, spectral flatness, energy).
  2. Feature Normalization: All features are min–max normalized to ensure comparability across heterogeneous scales.
  3. Perceptual Difficulty Index (PDI): Structural complexity and performance intensity are computed as separate subscores and combined into a single continuous PDI value. Tracks are then discretized into Easy, Medium, and Hard difficulty levels using distribution-based thresholds.
  4. Spectrogram Generation and Label Transfer: Each track is segmented into beginning, middle, and end excerpts; 128×128 Mel-spectrograms are generated for each excerpt and inherit the difficulty label of their source track.
  5. Deep Learning Classification: An EfficientNet-B0 CNN is trained on the labeled spectrograms to learn difficulty-related spectro-temporal patterns.
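Steps 2 and 3 above can be sketched in a few lines of NumPy. The feature grouping, equal-weight averaging of the two subscores, and tertile thresholds below are illustrative assumptions, since the README does not specify the exact weights or cut points:

```python
import numpy as np

# Hypothetical feature matrix: rows = tracks, columns = extracted features.
# The column groups are illustrative; the actual pipeline uses ten descriptors.
rng = np.random.default_rng(0)
features = rng.random((35, 4))          # 35 tracks, 4 example features
complexity_cols = [0, 1]                # e.g., pitch range, harmonic contrast
intensity_cols = [2, 3]                 # e.g., onset density, energy

# Step 2: min-max normalization per feature column.
mins, maxs = features.min(axis=0), features.max(axis=0)
normed = (features - mins) / (maxs - mins)

# Step 3: subscores (assumed here to be simple means) combined into one PDI.
complexity = normed[:, complexity_cols].mean(axis=1)
intensity = normed[:, intensity_cols].mean(axis=1)
pdi = 0.5 * complexity + 0.5 * intensity    # equal weights are an assumption

# Distribution-based thresholds: PDI tertiles (one plausible choice).
lo, hi = np.quantile(pdi, [1 / 3, 2 / 3])
labels = np.select([pdi <= lo, pdi <= hi], ["Easy", "Medium"], default="Hard")
```

Each track then carries both a continuous PDI score and a discrete difficulty label that supervises the CNN in step 5.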
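Step 4's excerpting and label transfer can likewise be sketched without audio I/O. The helper name and the equal-length three-way split are assumptions for illustration; the Mel-spectrogram conversion itself is noted only in a comment:

```python
import numpy as np

def excerpt_and_label(y: np.ndarray, track_label: str, n_excerpts: int = 3):
    """Split a waveform into beginning/middle/end excerpts, each inheriting
    the source track's difficulty label (equal-length split is an assumption)."""
    length = len(y) // n_excerpts
    excerpts = [y[i * length:(i + 1) * length] for i in range(n_excerpts)]
    # In the full pipeline, each excerpt would next be converted to a
    # 128x128 Mel-spectrogram (e.g., librosa.feature.melspectrogram with
    # n_mels=128, then resized along the time axis) before CNN training.
    return [(seg, track_label) for seg in excerpts]

# Example: a placeholder 3-second mono signal at 22050 Hz labeled "Hard".
y = np.zeros(3 * 22050, dtype=np.float32)
pairs = excerpt_and_label(y, "Hard")
```

Because labels are transferred wholesale, every excerpt of a Hard track is treated as Hard, which is a simplifying assumption worth keeping in mind when interpreting per-segment predictions.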

Tools Used

  • Python (Colab)
  • Librosa
  • Matplotlib / Seaborn
  • Scikit-learn, NumPy, Pandas
  • TensorFlow/Keras

File Structure

  • notebook/: Main notebook (cleaned and annotated), covering feature extraction, PDI computation, and CNN training
  • data/: Extracted features and PDI labels
  • images/: Spectrograms, PCA plots, model outputs
  • models/: Trained CNN model

Key Results

  • The PDI organizes songs into interpretable regions of musical difficulty, reflecting meaningful differences in structural density and performance demand.
  • Spectrogram inspection reveals systematic acoustic differences across difficulty tiers, particularly in transient activity and spectral distribution.
  • The CNN achieves 0.35 validation accuracy, exceeding the random baseline (0.33) for three-class classification, indicating modest but genuine learning under small-data conditions.

These findings support the claim that musical difficulty is computationally learnable from audio alone, even without symbolic representations.

Reproducibility

To reproduce the analysis:

  1. Run the notebook in notebook/
  2. Ensure audio files are placed in the expected directory structure
  3. Execute cells sequentially to regenerate features, PDI scores, figures, and model outputs

Notes

  • This repository is designed for research transparency and peer review, not for large-scale deployment.
  • Model performance is intentionally reported conservatively, emphasizing theoretical validity over benchmark optimization.
  • The framework is extensible to larger datasets, alternative feature sets, and regression-based difficulty modeling.

Author

Imas Viestawati, Independent Researcher
If you find this useful, feel free to star the repo!
