English | 日本語
A repository for training chord transcription models. It enables high-precision inference through context interpretation using a SegmentModel.
This document describes the preprocessing pipeline required to prepare the dataset for model training. Please execute each step in order.
Separates audio files into individual instrument stems (vocals, drums, bass, and others) and resamples them to the specified sampling rate.
```sh
uv run python -m src.preprocess.separate_and_resample --input <input_dir> --out-dir <output_dir>
```

- `--input`: Directory containing the source audio files. Default: `./dataset/songs`
- `--out-dir`: Destination directory for the separated stem files. Default: `./dataset/songs_separated`
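The resampling half of this step can be illustrated with a minimal sketch. This is a naive linear-interpolation resampler for a mono signal; the actual pipeline presumably uses a proper polyphase or sinc resampler from a DSP library, and the function name here is illustrative:

```python
def resample_linear(samples, sr_in, sr_out):
    """Naively resample a mono signal by linear interpolation.

    Illustration only: real pipelines use band-limited resamplers
    to avoid aliasing when downsampling.
    """
    if sr_in == sr_out:
        return list(samples)
    n_out = int(len(samples) * sr_out / sr_in)
    out = []
    for i in range(n_out):
        pos = i * sr_in / sr_out          # fractional index into the input
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out

# Downsampling a 4-sample signal from 4 Hz to 2 Hz halves its length.
print(len(resample_linear([0.0, 1.0, 0.0, -1.0], 4, 2)))  # → 2
```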
Applies pitch shifting to the separated stems to increase the volume and variety of the training data (Data Augmentation).
```sh
uv run python -m src.preprocess.pitch_shift_augment --target_dir <target_dir>
```

- `--target_dir`: Directory containing the audio files to be pitch-shifted. Default: `./dataset/songs_separated`
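When stems are pitch-shifted by some number of semitones, the chord labels have to be transposed by the same amount for the augmented pair to stay consistent. A minimal sketch of the label side (the flat-to-sharp mapping and function name are illustrative; the repository may handle enharmonic spellings differently):

```python
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
# Fold flat spellings onto sharps so every root has one index (an assumption).
FLAT_TO_SHARP = {"Db": "C#", "Eb": "D#", "Gb": "F#", "Ab": "G#", "Bb": "A#"}

def transpose_root(root: str, semitones: int) -> str:
    """Move a chord root up or down by a number of semitones."""
    root = FLAT_TO_SHARP.get(root, root)
    idx = (PITCH_CLASSES.index(root) + semitones) % 12
    return PITCH_CLASSES[idx]

print(transpose_root("G", 2))    # → A
print(transpose_root("Bb", -1))  # → A
```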
Normalizes original chord notations (e.g., CM7, Gm) into a consistent format optimized for model training.
```sh
uv run python -m src.preprocess.normalize_chords --input_dir <input_dir> --output_dir <output_dir>
```

- `--input_dir`: Directory containing the raw chord data. Default: `./dataset/chords`
- `--output_dir`: Destination directory for the normalized chord data. Default: `./dataset/chords_normalize`
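A normalization pass typically maps the many surface spellings of a chord quality onto one canonical token. A hedged sketch of the idea (the alias table and the `root:quality` output format are assumptions, not the repository's actual scheme, which is defined by its quality vocabulary):

```python
import re

# Hypothetical alias table; the real quality vocabulary lives in the repo's data files.
QUALITY_ALIASES = {
    "": "maj", "M": "maj", "maj": "maj",
    "m": "min", "min": "min", "-": "min",
    "M7": "maj7", "maj7": "maj7",
    "m7": "min7", "min7": "min7",
    "7": "7",
}

def normalize_chord(symbol: str) -> str:
    """Split a chord symbol into root + quality and canonicalize the quality."""
    m = re.match(r"^([A-G][#b]?)(.*)$", symbol)
    if not m:
        raise ValueError(f"unparsable chord: {symbol}")
    root, quality = m.groups()
    return f"{root}:{QUALITY_ALIASES.get(quality, quality)}"

print(normalize_chord("CM7"))  # → C:maj7
print(normalize_chord("Gm"))   # → G:min
```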
Generates a CSV file (training/validation pair list) that maps processed audio files to their corresponding chord, key, and tempo labels.
```sh
uv run python -m src.preprocess.make_pairs_csv --chords_dir <chords_dir> --keys_dir <keys_dir> --tempos_dir <tempos_dir> --songs_separated_dir <songs_separated_dir> --validation_ratio <validation_ratio>
```

- `--chords_dir`: Directory containing the normalized chords.
- `--keys_dir`: Directory containing the key information.
- `--tempos_dir`: Directory containing the tempo information.
- `--songs_separated_dir`: Directory containing the separated stems.
- `--validation_ratio`: Proportion of the dataset to use for validation.
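The pairing step can be sketched as a seeded shuffle plus a split by ratio, written out as CSV. The column names, the split labels, and the fixed seed here are illustrative assumptions; only the general shape (one row per song, split decided by `validation_ratio`) reflects the description above:

```python
import csv
import random

def make_pairs_csv(song_ids, validation_ratio, out_path, seed=0):
    """Assign each song to the train or validation split and write the list."""
    rng = random.Random(seed)        # fixed seed for a reproducible split
    ids = sorted(song_ids)
    rng.shuffle(ids)
    n_val = int(len(ids) * validation_ratio)
    with open(out_path, "w", newline="") as f:
        w = csv.writer(f)
        w.writerow(["song_id", "split"])
        for i, sid in enumerate(ids):
            w.writerow([sid, "validation" if i < n_val else "train"])

make_pairs_csv(["song_a", "song_b", "song_c", "song_d"], 0.25, "pairs.csv")
with open("pairs.csv") as f:
    rows = list(csv.reader(f))
print(len(rows))  # header + 4 songs → 5
```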
Calculates the frequency of each chord quality (e.g., Major, minor) across the dataset for use in the loss function during training.
```sh
uv run python -m src.preprocess.count_quality_freq --data_folder <data_folder> --quality_definition <quality_definition> --output <output>
```

- `--data_folder`: Directory containing the normalized chords. Default: `./dataset/chords_normalize`
- `--quality_definition`: Definition file for chord qualities. Default: `./data/quality.json`
- `--output`: Path for the output JSON file containing frequency counts. Default: `./data/quality_freq_count.json`
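Counting quality frequencies and turning them into per-class loss weights can be sketched as follows. The inverse-frequency weighting shown is one common class-balancing choice, not necessarily what the training code does, and the `root:quality` label format is assumed:

```python
from collections import Counter

def quality_frequencies(chords):
    """Count how often each quality appears in `root:quality` labels."""
    return Counter(c.split(":", 1)[1] for c in chords)

def inverse_freq_weights(freqs):
    """One common balancing scheme: weight proportional to 1/frequency, normalized."""
    total = sum(freqs.values())
    raw = {q: total / n for q, n in freqs.items()}
    s = sum(raw.values())
    return {q: w / s for q, w in raw.items()}

freqs = quality_frequencies(["C:maj", "G:maj", "A:min", "C:maj"])
print(freqs["maj"])  # → 3
```

Rare qualities then receive larger weights, so the loss is not dominated by the ubiquitous major and minor triads.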
```sh
uv run python -m src.train_transcription --config ./configs/train.yaml
```
Pass the weights of the first-stage (base transcription) model via `--checkpoint`.
```sh
uv run python -m src.train_segment_transcription --config ./configs/train.yaml --checkpoint <base_transcription.pt> --training_backbone
```
Pass the weights of the second-stage (segment) model via `--checkpoint`.
```sh
uv run python -m src.train_crf --config ./configs/train.yaml --checkpoint <segment_model.pt>
```
Inference with the base transcription model:

```sh
uv run python -m src.inference --config ./configs/train.yaml --checkpoint <base_transcription.pt> --audio <audio_path>
```

Inference with the SegmentModel:

```sh
uv run python -m src.inference --config ./configs/train.yaml --checkpoint <segment_model.pt> --audio <audio_path> --use_segment_model
```

Inference with the CRF model:

```sh
uv run python -m src.inference --config ./configs/train.yaml --crf_checkpoint <crf_model.pt> --audio <audio_path> --use_segment_model
```
Available for download here.