# Systematic Comparison: Exploring Alternative Freezing Strategies for ViT-B/16 Transfer on Music Spectrograms
This repository contains the codebase and experimental results for a study investigating the transferability of ImageNet-pretrained Vision Transformers (ViT-B/16) to the domain of audio classification (GTZAN dataset).
Contrary to the standard finding in Convolutional Neural Networks (CNNs), where the effect of freezing early layers is largely monotonic, our multi-seed evaluation (N=10 random seeds) reveals a non-monotonic "dip-and-recover" phenomenon in Transformers. While partial freezing generally degrades performance, we observe a statistically significant recovery in accuracy when the first 50% of the network is frozen, suggesting that feature adaptation dynamics in attention-based architectures differ from those in CNNs.
We evaluated five distinct freezing configurations. The aggregated results (Mean ± Std Dev) indicate:
| Freezing Strategy | Description | Result (Mean Accuracy) |
|---|---|---|
| Freeze None | Full Fine-Tuning (All parameters trainable) | Solid Baseline (~71.6%) |
| Freeze Patch | Freezing only Embeddings + Positional Encoding | Performance Dip (~69.1%) |
| Freeze 0.2 | Freezing first 20% of Encoder Blocks | Recovery (~71.1%) |
| Freeze 0.5 | Freezing first 50% of Encoder Blocks | Highest Accuracy (~73.2%) |
| Freeze 0.9 | Freezing first ~90% of the network (effectively classifier-only training) | Performance Collapse (~63.1%) |
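The partial freezing logic lives in `scripts/models.py`. As a rough illustration of the technique, the sketch below freezes the first fraction of encoder blocks in torchvision's `vit_b_16`; the `build_frozen_vit` name and signature are illustrative, not the repository's actual API.

```python
import torch.nn as nn
from torchvision.models import ViT_B_16_Weights, vit_b_16


def build_frozen_vit(freeze_fraction: float, num_classes: int = 10) -> nn.Module:
    """Load an ImageNet-pretrained ViT-B/16 and freeze its earliest layers."""
    model = vit_b_16(weights=ViT_B_16_Weights.IMAGENET1K_V1)

    # Replace the classification head to match the 10 GTZAN genres.
    model.heads.head = nn.Linear(model.heads.head.in_features, num_classes)

    if freeze_fraction > 0:
        # Freeze the patch embedding, class token, and positional encoding.
        for p in model.conv_proj.parameters():
            p.requires_grad = False
        model.class_token.requires_grad = False
        model.encoder.pos_embedding.requires_grad = False

        # Freeze the first k of the 12 encoder blocks.
        blocks = model.encoder.layers
        k = int(len(blocks) * freeze_fraction)
        for block in list(blocks)[:k]:
            for p in block.parameters():
                p.requires_grad = False

    return model
```

With `freeze_fraction=0.5` this freezes the first 6 of the 12 encoder blocks, corresponding to the `Freeze 0.5` row above.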
The repository is organized as follows:

```
transfer-freezing/
├── analysis/              # Scripts for statistical aggregation and plotting (N=10)
├── outputs/               # Logs, checkpoints, and generated figures
├── scripts/               # Core model definitions and data processing modules
│   ├── models.py          # Custom ViT and ResNet wrappers with partial freezing logic
│   └── dataset.py         # GTZAN loading and stratified splitting
├── config.yaml            # Centralized hyperparameter configuration
├── train.py               # Main training loop with seed control
├── run_experiments.sh     # Shell script for batch execution across seeds
├── requirements.txt       # Python dependencies
└── README.md              # Project documentation
```

Audio samples are transformed into visual representations to leverage computer vision architectures (a preprocessing sketch follows this list):
- Transformation: Log-mel spectrograms (128 mel bands, 224×224 resolution).
- Normalization: Per-sample min-max normalization to the $[0, 1]$ range.
- Augmentation: No augmentation is applied during validation/testing, ensuring consistent evaluation.
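For concreteness, here is a minimal `torchaudio` sketch of this preprocessing pipeline. The actual implementation lives in `scripts/dataset.py`; the `wav_to_logmel` helper, the 22050 Hz target rate (GTZAN's native rate), and the bilinear resize are assumptions for illustration.

```python
import torch
import torch.nn.functional as F
import torchaudio
import torchaudio.transforms as T


def wav_to_logmel(path: str, sample_rate: int = 22050) -> torch.Tensor:
    """Turn an audio clip into a 3-channel, 224x224 log-mel 'image'."""
    wav, sr = torchaudio.load(path)  # GTZAN clips are mono: shape (1, samples)
    if sr != sample_rate:
        wav = T.Resample(sr, sample_rate)(wav)

    # 128-band mel spectrogram on a log (dB) scale.
    mel = T.MelSpectrogram(sample_rate=sample_rate, n_mels=128)(wav)
    logmel = T.AmplitudeToDB()(mel)

    # Per-sample min-max normalization to [0, 1].
    logmel = (logmel - logmel.min()) / (logmel.max() - logmel.min() + 1e-8)

    # Resize to 224x224 and replicate to 3 channels for the pretrained ViT.
    logmel = F.interpolate(logmel.unsqueeze(0), size=(224, 224),
                           mode="bilinear", align_corners=False).squeeze(0)
    return logmel.repeat(3, 1, 1)  # (3, 224, 224)
```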
- Backbone: ViT-B/16 (pretrained on ImageNet-1k).
- Modifications: The classification head is replaced to match the 10 GTZAN genres.
- Input Adaptation: 1-channel spectrograms are replicated to 3 channels to satisfy the pretrained weight dimensions.
- Optimizer: SGD (learning rate $10^{-4}$, momentum $0.9$); a minimal setup sketch follows this list.
- Regularization: Weight decay $0.0$; dropout kept at the default pretrained configuration.
- Robustness: All experiments are repeated across 10 random seeds to assess statistical significance.
- NOTE: No data augmentation has been applied in any experiment so far.
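Because frozen parameters keep `requires_grad=False`, only the remaining trainable parameters need to be passed to the optimizer. A minimal sketch, assuming the hyperparameters above (the `make_optimizer` helper is hypothetical; the actual setup lives in `train.py`):

```python
import torch


def make_optimizer(model: torch.nn.Module) -> torch.optim.SGD:
    # Only the parameters left trainable by the freezing strategy are optimized.
    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.SGD(trainable, lr=1e-4, momentum=0.9, weight_decay=0.0)
```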
Ensure you have Python 3.8+ and PyTorch installed.

```bash
pip install -r requirements.txt
```

To reproduce the full N=10 study, use the provided shell script:
```bash
# Make the script executable
chmod +x run_experiments.sh

# Run the batch experiments
./run_experiments.sh
```

To run a single configuration (e.g., seed 42, Freeze 0.5):
```bash
python train.py --model vit_b_16 --freeze_mode freeze_patch_0_5 --seed 42 --config config.yaml
```

To generate the summary plots after training is complete:
```bash
python analysis/plot.py
```

This outputs the `vit_freezing_analysis_n10.png` figure, visualizing the mean accuracy and standard-deviation bands across seeds.
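For reference, a minimal matplotlib sketch of such a mean-and-band plot is shown below; the `plot_mean_std` helper and the `results` dictionary layout are assumptions for illustration, not the actual interface of `analysis/plot.py`.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_mean_std(results: dict, out_path: str = "vit_freezing_analysis_n10.png") -> None:
    """`results` maps each freezing strategy to its per-seed accuracies (N=10)."""
    names = list(results)
    means = np.array([np.mean(results[n]) for n in names])
    stds = np.array([np.std(results[n]) for n in names])
    x = np.arange(len(names))

    fig, ax = plt.subplots()
    ax.plot(x, means, marker="o", label="Mean over seeds")
    ax.fill_between(x, means - stds, means + stds, alpha=0.3, label="±1 std dev")
    ax.set_xticks(x)
    ax.set_xticklabels(names, rotation=30, ha="right")
    ax.set_ylabel("Test accuracy (%)")
    ax.legend()
    fig.tight_layout()
    fig.savefig(out_path)
```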
Research focus: Deep Learning, Audio Signal Processing, and Transfer Learning Dynamics.
