This repository contains the official implementation of the paper Sparse-Dense Side-Tuner for efficient Video Temporal Grounding.
Video Temporal Grounding (VTG) involves Moment Retrieval (MR) and Highlight Detection (HD) based on textual queries. For this, most methods rely solely on final-layer features of frozen large pre-trained backbones, limiting their adaptability to new domains. While full fine-tuning is often impractical, parameter-efficient fine-tuning, and particularly side-tuning (ST), has emerged as an effective alternative. However, prior ST methods approach this problem from a frame-level refinement perspective, overlooking the inherently sparse nature of MR. To address this, we propose the Sparse-Dense Side-Tuner (SDST), the first anchor-free ST architecture for VTG. We also introduce the Reference-based Deformable Self-Attention, a novel mechanism that enhances the context modeling of deformable attention, a key limitation of existing anchor-free methods. Additionally, we present the first effective integration of the InternVideo2 backbone into an ST framework, showing its profound impact on performance. Overall, our method significantly improves over existing ST methods, achieving highly competitive or SOTA results on QVHighlights, TACoS, and Charades-STA, while reducing the parameter count by up to 73% with respect to existing SOTA methods.
Download the pre-extracted features from here and adjust the data path used in the Docker container initialization command below accordingly.
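For example, a minimal sketch of preparing the data directory (the local path and archive name below are hypothetical; use whatever files the download actually provides):

mkdir -p /home/user/sdst_data
tar -xzf <downloaded_features_archive>.tar.gz -C /home/user/sdst_data  # adapt the extraction command to the actual archive format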
To set up the environment using Docker, follow these steps:
- Build the Docker image:

docker build -t sdst_image:latest .

- Run the Docker container:

docker run --gpus 'all' -it --rm --shm-size 200gb -v ./:/SDST -v ./model_results:/SDST/model_results -v <path_to_data>:/data sdst_image
Replace <path_to_data> with the path where you saved the data (see the Data Preparation instructions above).
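For instance, if the features were extracted to /home/user/sdst_data (a hypothetical path), the run command becomes:

docker run --gpus 'all' -it --rm --shm-size 200gb -v ./:/SDST -v ./model_results:/SDST/model_results -v /home/user/sdst_data:/data sdst_image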
This step covers the installation of additional dependencies such as RoiAlign. See the original repository for more details.
cd models/ops; python setup.py build_ext --inplace; cd ../..

To train the model from scratch, run the following command, replacing CONFIG_PATH with the path to your desired experiment configuration file:
python tools/launch.py -c ./configs/CONFIG_PATH --exp_name <experiment_name>

Concretely, to train on QVHighlights:

python tools/launch.py ./configs/qvhighlights/sdst_qvhighlights.py --exp_name debug

To train on Charades-STA:

python tools/launch.py ./configs/charades/sdst_charades.py --exp_name debug

Or to train on TACoS:

python tools/launch.py ./configs/tacos/sdst_tacos.py --exp_name debug

To evaluate the performance of a given model, run the following command:
python tools/launch.py <path-to-config> --checkpoint <path-to-checkpoint> --eval
For QVHighlights:
python tools/launch.py configs/qvhighlights/sdst_qvhighlights.py --checkpoint /SDST/checkpoints_sdst/checkpoint_qvhighlights.pth --eval

For Charades-STA:

python tools/launch.py configs/charades/sdst_charades.py --checkpoint /SDST/checkpoints_sdst/checkpoint_charades_sta.pth --eval

For TACoS:

python tools/launch.py configs/tacos/sdst_tacos.py --checkpoint /SDST/checkpoints_sdst/checkpoint_tacos.pth --eval

To generate a submission from a trained model, run the following command:

python tools/launch.py <path-to-config> --checkpoint <path-to-checkpoint> --dump

For instance, to do so for QVHighlights:

python tools/launch.py configs/qvhighlights/sdst_qvhighlights.py --checkpoint /SDST/checkpoints_sdst/checkpoint_qvhighlights.pth --dump

For any questions or inquiries, please contact david dot pujolperich at gmail dot com.
This implementation is based on the excellent work of R2-Tuning.
If you find this work useful, please cite our paper:
@inproceedings{pujol2025sparse,
title={Sparse-dense side-tuner for efficient video temporal grounding},
author={Pujol-Perich, David and Escalera, Sergio and Clap{\'e}s, Albert},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={21515--21524},
year={2025}
}
