Quantifying and Modeling Musical Hardness Using Audio Features and Spectrogram-Based Learning: Study Case NCT 127

masevs/nct127-complexity-analysis


Estimating Musical Difficulty

Estimating Musical Difficulty Using Audio Features and Spectrogram-Based Deep Learning: A Case Study on NCT 127

Overview

Musical difficulty emerges from the interaction between structural complexity and performance intensity, yet existing computational approaches rely primarily on symbolic scores and are poorly suited to contemporary commercial music. This project proposes a fully audio-based framework for estimating musical difficulty by integrating interpretable audio features, a Perceptual Difficulty Index (PDI), and a spectrogram-based deep learning model.

Using a curated corpus of 35 tracks from NCT 127’s discography, the framework extracts psychoacoustic and information-theoretic descriptors to quantify musical complexity and intensity. These measures are combined into a continuous PDI score and discretized into three difficulty levels (Easy, Medium, Hard), which are subsequently used to supervise a convolutional neural network trained on Mel-spectrogram segments.

The results demonstrate that difficulty-related perceptual attributes, such as transient density, spectral spread, and high-frequency activation, are encoded in the audio signal alone and can be learned computationally, even under limited-data conditions.

Methods

The proposed pipeline consists of the following stages:

  1. Audio Feature Extraction: Ten psychoacoustic and MIR features are extracted using Librosa, capturing structural complexity (e.g., pitch range, timbral variability, harmonic contrast) and performance intensity (e.g., tempo, onset density, spectral flatness, energy).
  2. Feature Normalization: All features are min–max normalized to ensure comparability across heterogeneous scales.
  3. Perceptual Difficulty Index (PDI): Structural complexity and performance intensity are computed as separate subscores and combined into a single continuous PDI value. Tracks are then discretized into Easy, Medium, and Hard difficulty levels using distribution-based thresholds.
  4. Spectrogram Generation and Label Transfer: Each track is segmented into beginning, middle, and end excerpts; 128×128 Mel-spectrograms are generated for each excerpt and inherit the difficulty label of their source track.
  5. Deep Learning Classification: An EfficientNet-B0 CNN is trained on the labeled spectrograms to learn difficulty-related spectro-temporal patterns.
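Steps 2 and 3 above can be sketched in a few lines of NumPy. The feature grouping, equal-weight averaging of the two subscores, and tertile thresholds below are illustrative assumptions, since the README does not specify the exact weights or cut points:

```python
import numpy as np

# Hypothetical feature matrix: rows = tracks, columns = extracted features.
# The column groups are illustrative; the actual pipeline uses ten descriptors.
rng = np.random.default_rng(0)
features = rng.random((35, 4))          # 35 tracks, 4 example features
complexity_cols = [0, 1]                # e.g., pitch range, harmonic contrast
intensity_cols = [2, 3]                 # e.g., onset density, energy

# Step 2: min-max normalization per feature column.
mins, maxs = features.min(axis=0), features.max(axis=0)
normed = (features - mins) / (maxs - mins)

# Step 3: subscores (assumed here to be simple means) combined into one PDI.
complexity = normed[:, complexity_cols].mean(axis=1)
intensity = normed[:, intensity_cols].mean(axis=1)
pdi = 0.5 * complexity + 0.5 * intensity    # equal weights are an assumption

# Distribution-based thresholds: PDI tertiles (one plausible choice).
lo, hi = np.quantile(pdi, [1 / 3, 2 / 3])
labels = np.select([pdi <= lo, pdi <= hi], ["Easy", "Medium"], default="Hard")
```

Each track then carries both a continuous PDI score and a discrete difficulty label that supervises the CNN in step 5.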
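Step 4's excerpting and label transfer can likewise be sketched without audio I/O. The helper name and the equal-length three-way split are assumptions for illustration; the Mel-spectrogram conversion itself is noted only in a comment:

```python
import numpy as np

def excerpt_and_label(y: np.ndarray, track_label: str, n_excerpts: int = 3):
    """Split a waveform into beginning/middle/end excerpts, each inheriting
    the source track's difficulty label (equal-length split is an assumption)."""
    length = len(y) // n_excerpts
    excerpts = [y[i * length:(i + 1) * length] for i in range(n_excerpts)]
    # In the full pipeline, each excerpt would next be converted to a
    # 128x128 Mel-spectrogram (e.g., librosa.feature.melspectrogram with
    # n_mels=128, then resized along the time axis) before CNN training.
    return [(seg, track_label) for seg in excerpts]

# Example: a placeholder 3-second mono signal at 22050 Hz labeled "Hard".
y = np.zeros(3 * 22050, dtype=np.float32)
pairs = excerpt_and_label(y, "Hard")
```

Because labels are transferred wholesale, every excerpt of a Hard track is treated as Hard, which is a simplifying assumption worth keeping in mind when interpreting per-segment predictions.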

Tools Used

  • Python (Colab)
  • Librosa
  • Matplotlib / Seaborn
  • Scikit-learn, NumPy, Pandas
  • TensorFlow/Keras

File Structure

  • notebook/: Main notebook (cleaned and annotated), covering feature extraction, PDI computation, and CNN training
  • data/: Extracted features and PDI labels
  • images/: Spectrograms, PCA plots, model outputs
  • models/: Trained CNN model

Key Results

  • The PDI organizes songs into interpretable regions of musical difficulty, reflecting meaningful differences in structural density and performance demand.
  • Spectrogram inspection reveals systematic acoustic differences across difficulty tiers, particularly in transient activity and spectral distribution.
  • The CNN achieves 0.35 validation accuracy, exceeding the random baseline (0.33) for three-class classification, indicating modest but genuine learning under small-data conditions.

These findings support the claim that musical difficulty is computationally learnable from audio alone, even without symbolic representations.

Reproducibility

To reproduce the analysis:

  1. Run the notebook in notebook/
  2. Ensure audio files are placed in the expected directory structure
  3. Execute cells sequentially to regenerate features, PDI scores, figures, and model outputs

Notes

  • This repository is designed for research transparency and peer review, not for large-scale deployment.
  • Model performance is intentionally reported conservatively, emphasizing theoretical validity over benchmark optimization.
  • The framework is extensible to larger datasets, alternative feature sets, and regression-based difficulty modeling.

Author

Imas Viestawati, Independent Researcher
If you find this useful, feel free to star the repo!
