This repository contains the official implementation of the paper *A Semi-Self-Supervised Approach for Dense-Pattern Video Object Segmentation*. We tackle the challenging task of Dense Video Object Segmentation (DVOS) in agricultural settings, particularly wheat head segmentation, where objects are numerous, small, occluded, and move unpredictably. Our semi-self-supervised approach leverages synthetic data and pseudo-labels, significantly reducing the need for costly manual video annotation. The core of our method is a multi-task UNet-style architecture enhanced with diffusion and spatiotemporal attention mechanisms.
Figure 1: The proposed UNet-style architecture, highlighting the multi-task heads (segmentation, reconstruction) and the spatiotemporal attention blocks with diffusion integration.
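To make the multi-task idea concrete, here is a minimal, hypothetical PyTorch sketch of a UNet-style network with separate segmentation and reconstruction heads. It is a conceptual illustration only, not the paper's implementation, and it omits the diffusion and spatiotemporal attention components:

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """Two 3x3 convolutions with ReLU: the basic UNet building block."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.net(x)

class MultiTaskUNet(nn.Module):
    """Toy UNet-style encoder-decoder with two task heads:
    a segmentation head and a frame-reconstruction head."""
    def __init__(self, in_ch=3, base=32):
        super().__init__()
        self.enc1 = ConvBlock(in_ch, base)
        self.enc2 = ConvBlock(base, base * 2)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec1 = ConvBlock(base * 2, base)
        self.seg_head = nn.Conv2d(base, 1, 1)      # binary wheat-head mask logits
        self.rec_head = nn.Conv2d(base, in_ch, 1)  # reconstructed frame

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        d1 = self.dec1(torch.cat([self.up(e2), e1], dim=1))
        return self.seg_head(d1), self.rec_head(d1)

model = MultiTaskUNet()
seg, rec = model(torch.randn(1, 3, 128, 128))
print(seg.shape, rec.shape)  # (1, 1, 128, 128) and (1, 3, 128, 128)
```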
Clone the repository:

```bash
git clone https://github.com/KeyhanNajafian/DVOS.git
cd DVOS
```

We recommend using Conda:

```bash
conda env create -f environment.yaml
conda activate DVOSEnv
```

Alternatively, use pip:

```bash
pip install -r requirements.txt
```

This repository is primarily driven by YAML configuration files:
- Frame and Object Extraction: config files are located in `extraction/configs/`
- Video Synthesis: config files are located in `simulation/configs/`
Note: The sample CSV files can be found in the `data/` directory.
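To see what references these CSV files carry, you can inspect one with pandas; the file name below is a placeholder for any CSV in `data/`:

```python
import pandas as pd

# Placeholder file name -- substitute any sample CSV from the data/ directory.
meta = pd.read_csv("data/sample_metadata.csv")
print(meta.columns.tolist())  # columns referencing frames, masks, etc.
print(meta.head())
```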
You can use the following scripts in the `extraction` module, along with their corresponding config files, to extract video frames and objects:
```bash
python frames_extractor.py --config configs/frames_extractor.yaml
python objects_extractor.py --config configs/objects_extractor.yaml
```

The extracted frames and objects are organized into CSV files and can be used for data synthesis. To do this, modify the config file `simulation/configs/simulation_pipeline.yaml` and run the following command in the `simulation` module:
```bash
python simulator.py --config configs/simulator.yaml
```

This automatically generates the corresponding metadata, organized into CSV files, which can be used directly for model training with DVOSCode.
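For intuition, synthesis of this kind typically composites extracted object cutouts onto background frames, which yields pixel-accurate masks for free. The following is a minimal, hypothetical sketch of that compositing step, not the repository's simulator (whose behavior is controlled by the YAML config):

```python
import numpy as np

def paste_object(frame, mask, obj_rgba, top, left):
    """Alpha-composite one extracted RGBA object cutout onto a background
    frame and update the synthetic segmentation mask (conceptual sketch)."""
    h, w = obj_rgba.shape[:2]
    region = frame[top:top + h, left:left + w]
    alpha = obj_rgba[..., 3:4] / 255.0
    blended = alpha * obj_rgba[..., :3] + (1 - alpha) * region
    frame[top:top + h, left:left + w] = blended.astype(frame.dtype)
    mask[top:top + h, left:left + w] |= obj_rgba[..., 3] > 0
    return frame, mask

frame = np.zeros((256, 256, 3), dtype=np.uint8)               # background frame
mask = np.zeros((256, 256), dtype=bool)                       # synthetic mask
obj = np.random.randint(0, 255, (32, 32, 4), dtype=np.uint8)  # RGBA cutout
frame, mask = paste_object(frame, mask, obj, top=100, left=120)
```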
Pretrained model weights are available at this link.
- **DVOSCode Pipeline:** For this pipeline, the frames and masks are referenced in CSV files. The CSV metadata inside the `data/` folder contains all the necessary references to the frames and masks.
- **DVOSXMem Pipeline:** For this pipeline, organize your data in a root folder with two subfolders, `frames/` and `masks/`. Inside these subfolders, create identically named short-video-clip subfolders, each containing the corresponding frames and masks for one video (see the layout sketch below).
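For example, a root folder with two clips might look like this (folder and file names are illustrative; only the `frames/` and `masks/` structure with matching clip subfolders is prescribed):

```
root_folder/
├── frames/
│   ├── clip_0001/
│   │   ├── 00000.jpg
│   │   └── ...
│   └── clip_0002/
│       └── ...
└── masks/
    ├── clip_0001/
    │   ├── 00000.png
    │   └── ...
    └── clip_0002/
        └── ...
```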
### Training the DVOS model
Set up the config file properly in the DVOSCode pipeline, then run:
```bash
python3 ddp_experiment.py --config configs/configs.yaml
```

Note: Example CSV files required for simulation and training are included in the `data` directory for reference.
### Training the XMem model
To train the XMem model, run the following command inside the XMemCode pipeline:

```bash
torchrun --master_port 25763 --nproc_per_node=2 train.py \
--stage 2 \
--s2_batch_size 16 \
--s2_iterations 30000 \
--s2_finetune 10000 \
--s2_lr 0.0001 \
--s2_num_ref_frames 4 \
--save_network_interval 5000 \
--load_network model_path.pth \
--wheat_root data_root_dir_path \
--exp_id experiment_name
```

Note: The XMem pipeline used in this project is a slightly modified version of the original XMem repository. You can still access the original version at the provided link.
### Evaluating DVOS
To run the evaluation, make sure to:
- Set the configuration file to the `TEST` phase (a config sketch follows this list).
- Specify the path to the best pretrained model you want to evaluate.
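The exact key names are defined in `configs/configs.yaml`; as a purely hypothetical sketch, the change might look like:

```yaml
# Hypothetical keys -- consult configs/configs.yaml for the actual schema.
phase: TEST                          # switch the experiment from TRAIN to TEST
checkpoint: /path/to/best_model.pth  # best pretrained weights to evaluate
```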
Then run:

```bash
python3 ddp_experiment.py --config configs/configs.yaml
```

### Evaluating DVOSXMem
To evaluate a trained DVOSXMem model on a test dataset, run the following command:

```bash
python eval.py \
--model best_model_path.pth \
--dataset test_set_root_dir_path \
--split test \
--size 384 \
--output prediction_dir_path
```

This will generate the predicted masks and save them inside the `prediction_dir_path` folder.
Next, run the following command to calculate the scores and overlay the predictions onto the samples in the test set, using the specified `overlay_interval`:
```bash
python scoring.py --gt_dir base_test_root_dir --pr_dir prediction_dir_path --overlay_interval 1
```
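For reference, overlap scores of the kind such a script typically reports can be computed as follows; this is a generic sketch of Dice and IoU over binary masks, not necessarily the exact metrics implemented in `scoring.py`:

```python
import numpy as np

def dice_iou(gt: np.ndarray, pr: np.ndarray, eps: float = 1e-7):
    """Dice and IoU for a pair of binary masks."""
    gt, pr = gt.astype(bool), pr.astype(bool)
    inter = np.logical_and(gt, pr).sum()
    union = np.logical_or(gt, pr).sum()
    dice = (2 * inter + eps) / (gt.sum() + pr.sum() + eps)
    iou = (inter + eps) / (union + eps)
    return dice, iou

gt = np.zeros((4, 4), dtype=np.uint8); gt[:2] = 1  # 8 foreground pixels
pr = np.zeros((4, 4), dtype=np.uint8); pr[:3] = 1  # 12 predicted pixels
print(dice_iou(gt, pr))  # approximately (0.8, 0.667)
```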
### Citation

```bibtex
@inproceedings{najafian2025semi,
  title     = {A Semi-Self-Supervised Approach for Dense-Pattern Video Object Segmentation},
  author    = {Najafian, Keyhan and Maleki, Farhad and Jin, Lingling and Stavness, Ian},
  booktitle = {Proceedings of the Computer Vision and Pattern Recognition (CVPR) Conference},
  pages     = {5412--5421},
  year      = {2025}
}
```
