This repository contains the PyTorch implementation of SPTVMod used for the December 2025 preprint *Time-Varying Audio Effect Modeling by End-to-End Adversarial Training* by Y. Bourdin, P. Legrand and F. Roche.
Link to the paper: https://inria.hal.science/hal-05413743
Accompanying website: https://ybourdin.github.io/sptvmod/
Figure 1. Block diagram of the architecture of SPTVMod. (1a) Overall architecture; the generator has an orange background, and the discriminator a yellow one. (1b) Composition of the processing blocks: ModBlock (left), FXBlock (upper right) and FeatBlock (lower right).
- Use the `set_target_length()` method to compute:
  - the expected input length for the specified target length;
  - the index intervals to slice the tensors given to different parts of the model, here the modulation path and the audio path;
  - the cropping sizes of the cropping layers.
- To obtain the generator's output, call `forward()` with `paddingmode = CachedPadding1d.NoPadding`, and `use_spn = True` if using state prediction.
- To obtain the discriminator's outputs, compute the feature lists of the true and predicted outputs with `disc_features` and use the `disc_from_features()` method (see the usage sketch below).
This computation is unchanged from SPTMod: https://github.com/ybourdin/sptmod?tab=readme-ov-file#computing-the-slicing-indices-and-the-cropping-sizes
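A minimal sketch of this workflow is given below, assuming a model object that exposes the methods listed above. The import path, constructor arguments, return value of `set_target_length()`, and the exact keyword arguments of `forward()` are assumptions based on this README and should be checked against the code in this repository.

```python
import torch

# Hypothetical usage sketch: the import path, the SPTVMod class name, the
# return value of set_target_length(), and the exact forward() signature are
# assumptions based on this README, not the repository's actual API.
from sptvmod import SPTVMod, CachedPadding1d  # hypothetical import path

model = SPTVMod()  # hypothetical constructor; real arguments may differ

# Compute the expected input length (and, internally, the slicing intervals
# for the modulation/audio paths and the cropping sizes) for a target length.
target_length = 2 ** 15
input_length = model.set_target_length(target_length)  # assumed to return the input length

# Dummy mono audio batch with the expected input length: (batch, channels, time).
x = torch.randn(1, 1, input_length)

# Generator output: no cached padding, state prediction network enabled.
y_pred = model.forward(x, paddingmode=CachedPadding1d.NoPadding, use_spn=True)

# Discriminator: compute the feature lists of the true and predicted signals,
# then obtain the discriminator outputs from those features.
y_true = torch.randn_like(y_pred)
feats_true = model.disc_features(y_true)
feats_pred = model.disc_features(y_pred)
d_true = model.disc_from_features(feats_true)
d_pred = model.disc_from_features(feats_pred)
```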
The Fast-LFO and Slow-LFO datasets we used for our experiments are available in the `dp4_fast_lfo_dataset` and `dp4_slow_lfo_dataset` folders.