This work evaluated the robustness of video-language models on text-to-video retrieval using a variety of video and/or text perturbations. For more information, check out our site.
Different real-world perturbations used in this study.
Code to generate text perturbations is available in generate_noisy_text.py.
You can call this script from the command line, for example:
```bash
python generate_noisy_text.py msrvtt --meta_pth msrvtt_eval.csv --text_style --textflint
```
This runs the perturbations provided by the TextStyle and TextFlint packages for the MSRVTT dataset, using a CSV file that has (at minimum) columns for video_id and text.
The same procedure applies to multiple-choice (MC) VideoQA on MSRVTT, using generate_noisy_mc_videoqa.py.
We provide both on-the-fly generation of perturbations in video_perturbations.py, which is useful when processing
pre-extracted features, and generation of noisy video copies in generate_noisy_videos.py.
An example of running generate_noisy_videos.py:
```bash
python generate_noisy_videos.py msrvtt data/msrvtt/videos data/msrvtt/noisy_videos blur
```
This generates perturbed videos for MSRVTT, where the original videos are stored in data/msrvtt/videos, applying
blur and saving the copies to data/msrvtt/noisy_videos.
Before running this command, you need to generate a file for the MSRVTT and YouCook2 datasets that maps each
original video path (first column) to the target file path (second column). This should be stored
as datasets/{youcook2, msrvtt}_videolist.csv. An example is shown below, followed by a sketch for creating this file:
```
YouCook2/validation/226/videos/xHr8X2Wpmno.mkv,robustness/youcook2/xHr8X2Wpmno.mkv
YouCook2/validation/105/videos/V53XmPeyjIU.mkv,robustness/youcook2/V53XmPeyjIU.mkv
YouCook2/validation/201/videos/mZwK0TBI1iY.mkv,robustness/youcook2/mZwK0TBI1iY.mkv
YouCook2/validation/310/videos/gEYyWqs1oL0.mp4,robustness/youcook2/gEYyWqs1oL0.mp4
```
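A minimal sketch for generating such a mapping file is below. It assumes the original videos live under a single root directory (here data/msrvtt/videos) and that perturbed copies should be written under robustness/msrvtt; the paths, file extensions, and output naming are assumptions, so adjust them to your layout.

```python
# Sketch: build datasets/msrvtt_videolist.csv by mapping each original video
# to a target path under the perturbed-video output directory.
# Directory layout, extensions, and naming here are assumptions.
import csv
from pathlib import Path

SRC_ROOT = Path("data/msrvtt/videos")        # where the original videos live
DST_ROOT = Path("robustness/msrvtt")         # where perturbed copies will be written
OUT_CSV = Path("datasets/msrvtt_videolist.csv")

OUT_CSV.parent.mkdir(parents=True, exist_ok=True)
with OUT_CSV.open("w", newline="") as f:
    writer = csv.writer(f)
    for video in sorted(SRC_ROOT.rglob("*")):
        if video.suffix.lower() in {".mp4", ".mkv", ".webm", ".avi"}:
            # First column: original video path; second column: target file path.
            writer.writerow([str(video), str(DST_ROOT / video.name)])
```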
Use video_perturbations.py by creating a VideoPerturbation object, initializing it with the desired perturbation and severity.
This is useful when modifying video feature extractor code from
fairseq
and VideoFeatureExtractor.
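As a rough illustration, a minimal sketch of this usage pattern is below; the exact constructor arguments and call interface of VideoPerturbation may differ from what is assumed here, and the clip array is a stand-in for frames coming out of a feature-extractor dataloader.

```python
# Sketch only: the actual VideoPerturbation API in video_perturbations.py may differ.
# Assumed interface: construct with a perturbation name and severity, then apply to frames.
import numpy as np
from video_perturbations import VideoPerturbation  # assumed import path

# Hypothetical clip: T frames of H x W RGB, as produced by a feature-extractor dataloader.
clip = np.random.randint(0, 256, size=(32, 224, 224, 3), dtype=np.uint8)

perturb = VideoPerturbation(perturbation="blur", severity=3)  # assumed constructor arguments
noisy_clip = perturb(clip)  # assumed to return the perturbed frames with the same shape
```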
The file robustness_scores.py provides sample code for calculating the robustness score for perturbation combinations. It does so by collecting model retrieval scores (R@5, R@10, R@25) for the different perturbations and severities. This particular function requires a pandas.DataFrame, as the results of models and their runs were collected in CSV files. An example of what this file may look like is:
| R@1 | R@5 | Median-R | Model | Dataset | Perturbation | Severity | Type | PerturbModality | Name | Train | R@1 Error | R@5 Error |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.103 | 0.227 | 41 | VideoClip | MSRVTT | shuffle_order | 0 | Positional | Text | ShuffleOrder | zs | 0 | 0 |
| 0.072 | 0.181 | 59 | VideoClip | MSRVTT | shuffle_order | 1 | Positional | Text | ShuffleOrder | zs | -0.031 | -0.046 |
| 0.103 | 0.227 | 41 | VideoClip | MSRVTT | shot_noise | 0 | Noise | Video | ShotNoise | zs | 0 | 0 |
| 0.063 | 0.153 | 63.5 | VideoClip | MSRVTT | shot_noise | 1 | Noise | Video | ShotNoise | zs | -0.04 | -0.074 |
Each perturbation includes a row with severity 0, which represents the baseline (unperturbed) scores and makes the calculation easier. Any severity greater than 0 indicates that a perturbation was applied.
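A minimal sketch of this kind of calculation is shown below; the actual robustness_scores.py may normalize or aggregate differently, and results.csv is a hypothetical path for a results file with the columns shown above. Each metric is compared against the severity-0 baseline of its group, which is what the R@1 Error and R@5 Error columns reflect.

```python
# Sketch of the kind of computation robustness_scores.py performs; the real script
# may aggregate or normalize differently. Each metric is compared against the
# severity-0 baseline of the same (Model, Dataset, Perturbation, Train) group.
import pandas as pd

results = pd.read_csv("results.csv")  # hypothetical path to the collected results

group_cols = ["Model", "Dataset", "Perturbation", "Train"]
metrics = ["R@1", "R@5"]

def add_errors(df: pd.DataFrame) -> pd.DataFrame:
    """Subtract the severity-0 baseline scores from every row of a group."""
    baseline = df.loc[df["Severity"] == 0, metrics].iloc[0]  # assumes a severity-0 row exists
    for m in metrics:
        df[f"{m} Error"] = df[m] - baseline[m]
    return df

results = results.groupby(group_cols, group_keys=False).apply(add_errors)

# Average drop per perturbation (severity > 0 only) as a simple robustness summary.
summary = (
    results[results["Severity"] > 0]
    .groupby(["Model", "Perturbation"])[[f"{m} Error" for m in metrics]]
    .mean()
)
print(summary)
```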
```bibtex
@inproceedings{
schiappa2022robustness,
title={Robustness Analysis of Video-Language Models Against Visual and Language Perturbations},
author={Madeline Chantry Schiappa and Shruti Vyas and Hamid Palangi and Yogesh S Rawat and Vibhav Vineet},
booktitle={Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
year={2022},
url={https://openreview.net/forum?id=A79jAS4MeW9}
}
```
For examples, please see EXAMPLES.md.
