imbulana/coma-gen

Conformer with multiscale attention for symbolic music generation.

Conformer with multi-scale local attention for symbolic music generation. See coma for a similar architecture used for composer classification.

Model Architecture (see src/transformer.py):

  • Embedding: REMI token embedding + learned positional embedding

  • Decoder: Stack of conformer-like blocks¹ (1/2 × FeedForward → Multi-Scale Local Attention → Conformer Conv Module → 1/2 × FeedForward) with hyper-connections and residual streams:

    • Local Attention: Multi-scale local self-attention with multiple window sizes (e.g., [32, 64]).

      • Each scale uses windowed attention with optional rotary position embeddings (xpos) or dynamic position bias
      • Scales aggregated via learnable weighted sum
      • Query-Key RMSNorm with learnable scales for improved training stability
    • Conformer Conv Module: LayerNorm → Pointwise conv (1D, expansion factor 2) → GLU activation → Depthwise conv (causal) → Swish → Channel LayerNorm → Pointwise conv → Dropout

    • Global Attention: Optional global attention layers can be inserted at specified positions (disabled by default)

    • Hyper-connections: Each component wrapped with residual stream expansion/reduction functions

  • Output: LayerNorm → Linear projection to vocabulary size
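The multi-scale attention above can be sketched roughly as follows. This is a simplified illustration, not the repo's implementation (which builds on lucidrains/local-attention): it materializes the full attention matrix instead of using blocked windows, and omits the rotary embeddings, QK-RMSNorm, and dynamic position bias; all names are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def local_causal_attention(q, k, v, window):
    # Naive windowed causal attention: each position attends to itself
    # and the previous `window - 1` positions.
    b, h, n, d = q.shape
    scores = (q @ k.transpose(-2, -1)) / d ** 0.5          # (b, h, n, n)
    i = torch.arange(n)
    causal = i[None, :] <= i[:, None]                      # key index <= query index
    in_window = i[:, None] - i[None, :] < window           # within the local window
    scores = scores.masked_fill(~(causal & in_window), float("-inf"))
    return F.softmax(scores, dim=-1) @ v

class MultiScaleLocalAttention(nn.Module):
    """Sketch: run windowed causal self-attention at several window sizes
    and combine the scales with a learnable softmax-weighted sum."""
    def __init__(self, dim, heads=4, window_sizes=(32, 64)):
        super().__init__()
        self.heads = heads
        self.window_sizes = window_sizes
        self.to_qkv = nn.Linear(dim, dim * 3, bias=False)
        self.to_out = nn.Linear(dim, dim, bias=False)
        self.scale_weights = nn.Parameter(torch.zeros(len(window_sizes)))

    def forward(self, x):                                  # x: (batch, seq, dim)
        b, n, _ = x.shape
        qkv = self.to_qkv(x).chunk(3, dim=-1)
        q, k, v = (t.view(b, n, self.heads, -1).transpose(1, 2) for t in qkv)
        outs = [local_causal_attention(q, k, v, w) for w in self.window_sizes]
        w = F.softmax(self.scale_weights, dim=0)           # learnable weighted sum
        out = sum(wi * oi for wi, oi in zip(w, outs))
        out = out.transpose(1, 2).reshape(b, n, -1)
        return self.to_out(out)
```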
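Similarly, the convolution module's pipeline can be sketched in a few lines. This is a hedged sketch rather than the code in src/transformer.py: the kernel size and dropout rate are assumptions, names are hypothetical, and the hyper-connection/residual wrapping is left out.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConformerConvModule(nn.Module):
    """Sketch of the pipeline: LayerNorm -> pointwise conv (2x expansion)
    -> GLU -> causal depthwise conv -> Swish -> channel LayerNorm
    -> pointwise conv -> dropout."""
    def __init__(self, dim, expansion=2, kernel_size=31, dropout=0.1):
        super().__init__()
        inner = dim * expansion
        self.kernel_size = kernel_size
        self.norm = nn.LayerNorm(dim)
        self.pointwise_in = nn.Conv1d(dim, inner * 2, 1)   # doubled channels for GLU
        self.depthwise = nn.Conv1d(inner, inner, kernel_size,
                                   padding=kernel_size - 1, groups=inner)
        self.channel_norm = nn.LayerNorm(inner)
        self.pointwise_out = nn.Conv1d(inner, dim, 1)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):                                   # x: (batch, seq, dim)
        y = self.norm(x).transpose(1, 2)                    # -> (batch, dim, seq)
        y = F.glu(self.pointwise_in(y), dim=1)              # pointwise conv + GLU
        y = self.depthwise(y)[..., :-(self.kernel_size - 1)]  # trim right pad: causal
        y = F.silu(y)                                       # Swish
        y = self.channel_norm(y.transpose(1, 2)).transpose(1, 2)
        y = self.dropout(self.pointwise_out(y))
        return y.transpose(1, 2)                            # -> (batch, seq, dim)
```

Left-padding the depthwise conv by `kernel_size - 1` and trimming the right side keeps each output position a function of current and past positions only.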

Todo

  • KV caching

Setup

Create a conda environment with Python 3.11

conda create -n coma-gen python=3.11
conda activate coma-gen

Install requirements

pip install -r requirements.txt

Dataset

Download the MAESTRO v3.0.0 dataset²

wget https://storage.googleapis.com/magentadata/datasets/maestro/v3.0.0/maestro-v3.0.0-midi.zip
unzip 'maestro-v3.0.0-midi.zip'
rm 'maestro-v3.0.0-midi.zip'
mkdir -p data
mv 'maestro-v3.0.0' 'data/maestro-v3.0.0'

Usage

Adjust the training parameters in config.py and begin training the transformer with

python3 train.py

Tensorboard logs will be saved in the LOG_DIR directory.
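No generation script is listed above, but sampling from a trained decoder-only model like this one typically looks like the following sketch. The `model(tokens)` interface returning `(batch, seq, vocab)` logits is an assumption, as are the parameter names; this is generic autoregressive top-k sampling, not the repo's code.

```python
import torch

@torch.no_grad()
def sample(model, prompt, steps=100, temperature=1.0, top_k=20):
    """Autoregressive top-k sampling sketch. `model(tokens)` is assumed
    to return logits of shape (batch, seq, vocab)."""
    tokens = prompt.clone()
    for _ in range(steps):
        logits = model(tokens)[:, -1] / temperature        # logits for next token
        k = min(top_k, logits.shape[-1])
        vals, idx = logits.topk(k, dim=-1)                 # keep top-k candidates
        probs = torch.softmax(vals, dim=-1)
        nxt = idx.gather(-1, torch.multinomial(probs, 1))  # sample one candidate
        tokens = torch.cat([tokens, nxt], dim=-1)          # append and continue
    return tokens
```

The sampled token ids can then be decoded back to MIDI with the same tokenizer used for training. Note that without KV caching (listed in the Todo above) each step re-runs the model over the full sequence.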

References

This repo is largely adapted from the following repositories.

local attention: https://github.com/lucidrains/local-attention

conformer: https://github.com/jreremy/conformer, https://github.com/lucidrains/conformer

miditok: https://github.com/Natooz/MidiTok

Footnotes

  1. Gulati, A., Qin, J., Chiu, C., Parmar, N., Zhang, Y., Yu, J., Han, W., Wang, S., Zhang, Z., Wu, Y., & Pang, R. (2020). Conformer: Convolution-augmented Transformer for Speech Recognition. ArXiv, abs/2005.08100.

  2. Hawthorne, C., Stasyuk, A., Roberts, A., Simon, I., Huang, C.A., Dieleman, S., Elsen, E., Engel, J., & Eck, D. (2018). Enabling Factorized Piano Music Modeling and Generation with the MAESTRO Dataset. ArXiv, abs/1810.12247.
