Sparse-Dense Side-Tuner for efficient Video Temporal Grounding

This repository contains the official implementation of the paper Sparse-Dense Side-Tuner for efficient Video Temporal Grounding

Overview

Video Temporal Grounding (VTG) involves Moment Retrieval (MR) and Highlight Detection (HD) based on textual queries. For this, most methods rely solely on final-layer features of frozen large pre-trained backbones, limiting their adaptability to new domains. While full fine-tuning is often impractical, parameter-efficient fine-tuning, and particularly side-tuning (ST), has emerged as an effective alternative. However, prior work on ST approaches this problem from a frame-level refinement perspective, overlooking the inherently sparse nature of MR. To address this, we propose the Sparse-Dense Side-Tuner (SDST), the first anchor-free ST architecture for VTG. We also introduce Reference-based Deformable Self-Attention, a novel mechanism that enhances the context modeling of deformable attention, a key limitation of existing anchor-free methods. Additionally, we present the first effective integration of the InternVideo2 backbone into an ST framework, showing its strong impact on performance. Overall, our method significantly improves on existing ST methods, achieving highly competitive or SOTA results on QVHighlights, TACoS, and Charades-STA, while reducing the parameter count by up to 73% w.r.t. existing SOTA methods.
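To give intuition for the mechanism the paper builds on: in deformable attention, each query attends only to a small set of sampled positions around a reference point, instead of to the full sequence. The following is a minimal NumPy sketch of plain 1-D deformable sampling over a temporal feature sequence. It illustrates the general mechanism only, not SDST's actual implementation; all names and shapes are illustrative.

```python
import numpy as np

def deformable_attn_1d(values, ref_points, offsets, attn_weights):
    """Plain 1-D deformable attention sampling (illustrative sketch).

    values:       (T, C)  temporal feature sequence
    ref_points:   (Q,)    reference locations in [0, 1], one per query
    offsets:      (Q, K)  learned sampling offsets (normalized coordinates)
    attn_weights: (Q, K)  attention weights over the K sampling points
    returns:      (Q, C)  aggregated features per query
    """
    T, _ = values.shape
    # Absolute sampling positions in frame coordinates, clipped to the sequence.
    pos = np.clip((ref_points[:, None] + offsets) * (T - 1), 0, T - 1)  # (Q, K)
    lo = np.floor(pos).astype(int)
    hi = np.minimum(lo + 1, T - 1)
    frac = (pos - lo)[..., None]                             # (Q, K, 1)
    # Linear interpolation between neighboring frames.
    sampled = (1 - frac) * values[lo] + frac * values[hi]    # (Q, K, C)
    # Weighted sum over the K sampling points.
    return (attn_weights[..., None] * sampled).sum(axis=1)   # (Q, C)
```

In practice the offsets and attention weights are predicted per query by small linear layers, so each query learns *where* to look around its reference point; SDST's contribution is improving the context modeling of this mechanism.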

(Figure: SDST architecture overview)

Data preparation

Download the pre-extracted features from here and update the data path in the Docker container initialization below.

Installation

Docker Setup

To set up the environment using Docker, follow these steps:

  1. Build the Docker Image:

    docker build -t sdst_image:latest .
  2. Run the Docker Container:

    docker run --gpus 'all' -it --rm --shm-size 200gb -v ./:/SDST -v ./model_results:/SDST/model_results  -v <path_to_data>:/data sdst_image

    Replace `<path_to_data>` with the directory where you saved the data (see Data preparation above).

Installing additional dependencies

This part covers the installation of additional dependencies such as RoIAlign. See the original repository for more details.

cd models/ops; python setup.py build_ext --inplace; cd ../..

Training from scratch

To train the model from scratch, run the following command, replacing CONFIG_PATH with the path to your desired experiment configuration file:

python tools/launch.py -c ./configs/CONFIG_PATH --exp_name <experiment_name>

Concretely, to train on QVHighlights:

python tools/launch.py ./configs/qvhighlights/sdst_qvhighlights.py --exp_name debug

To train on Charades-STA:

python tools/launch.py ./configs/charades/sdst_charades.py --exp_name debug

Or to train on TACoS:

python tools/launch.py ./configs/tacos/sdst_tacos.py --exp_name debug
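The three training invocations above differ only in the config path. A small convenience script that assembles and runs them in sequence might look like the following. The script itself is hypothetical and not part of the repository; the config paths are taken from the commands above, and it should be run from the repo root inside the container.

```python
import subprocess

# Config paths as used in the training commands above.
CONFIGS = {
    "qvhighlights": "./configs/qvhighlights/sdst_qvhighlights.py",
    "charades": "./configs/charades/sdst_charades.py",
    "tacos": "./configs/tacos/sdst_tacos.py",
}

def train_cmd(dataset, exp_name="debug"):
    """Build the launch command for one benchmark."""
    return ["python", "tools/launch.py", CONFIGS[dataset],
            "--exp_name", exp_name]

if __name__ == "__main__":
    for name in CONFIGS:
        print(" ".join(train_cmd(name)))
        # subprocess.run(train_cmd(name), check=True)  # uncomment to train
```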

Evaluate

To evaluate the performance of a given model, run the following command:

python tools/launch.py <path-to-config> --checkpoint <path-to-checkpoint> --eval

For QVHighlights:

python tools/launch.py configs/qvhighlights/sdst_qvhighlights.py  --checkpoint /SDST/checkpoints_sdst/checkpoint_qvhighlights.pth --eval

For Charades-STA:

python tools/launch.py configs/charades/sdst_charades.py  --checkpoint /SDST/checkpoints_sdst/checkpoint_charades_sta.pth --eval

For TACoS:

python tools/launch.py configs/tacos/sdst_tacos.py  --checkpoint /SDST/checkpoints_sdst/checkpoint_tacos.pth --eval

Generate a submission

To generate a submission given a trained model, run the following command:

python tools/launch.py <path-to-config> --checkpoint <path-to-checkpoint> --dump

For instance, to do so for QVHighlights:

python tools/launch.py configs/qvhighlights/sdst_qvhighlights.py  --checkpoint /SDST/checkpoints_sdst/checkpoint_qvhighlights.pth --dump

Contact

For any questions or inquiries, please contact david dot pujolperich at gmail dot com

Acknowledgments

This implementation is based on the excellent work of R2-Tuning.

Citation

If you find this work useful, please cite our paper:

@inproceedings{pujol2025sparse,
  title={Sparse-dense side-tuner for efficient video temporal grounding},
  author={Pujol-Perich, David and Escalera, Sergio and Clap{\'e}s, Albert},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={21515--21524},
  year={2025}
}
