This repository implements Multimodal Adversarial Prompt Tuning, a technique for improving the adversarial robustness of pre-trained Vision-Language models.
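The core idea, crafting adversarial examples during prompt tuning so that only the prompt parameters adapt to them, can be sketched with a PGD-style inner loop. The snippet below is a toy illustration on a linear model with a hand-derived gradient, not the repository's actual implementation; all names (`pgd_attack`, the loss form, the hyperparameters) are assumptions for exposition.

```python
import numpy as np

def pgd_attack(x, w, y, eps=0.1, alpha=0.02, steps=10):
    """Toy PGD inner loop: maximize L(x) = 0.5 * (w @ x - y)**2 over an
    L-infinity ball of radius eps around x. In adversarial prompt tuning,
    the outer loop would then update only the prompt parameters on x_adv."""
    x_adv = x.copy()
    for _ in range(steps):
        grad = (w @ x_adv - y) * w                 # dL/dx for the toy linear loss
        x_adv = x_adv + alpha * np.sign(grad)      # signed gradient ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)   # project back into the eps-ball
    return x_adv

x = np.array([1.0, 2.0])
w = np.array([0.5, -0.3])
x_adv = pgd_attack(x, w, y=0.0)
```

The projection step keeps the perturbation imperceptible (bounded by `eps`), while the ascent step pushes the loss up; the real training loop applies the same maximization to image and/or text inputs of the CLIP-style model.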
To set up the required environment, please follow the installation instructions provided in the CoOp repository.
Before training or evaluating the models, you'll need to prepare the necessary datasets. Detailed instructions on downloading, preprocessing, and organizing the data can be found in DATASETS.md.
This project provides scripts for training and evaluating various prompt designs. You can find all of them in the `./scripts` directory.
Here are examples of how to train and evaluate the different Multimodal Adversarial Prompt Tuning variants with a ViT-B/16 backbone in the zero-shot setting:

- AdvIVLP (Adversarial V-L Independent Prompt): `./scripts/AdvIVLP/zs_vit16_train_AdvIVLP.sh`
- AdvMaple (Adversarial V-L Joint Prompt): `./scripts/AdvMaple/zs_vit16_train_AdvMaple.sh`
- AdvVP (Adversarial Visual Prompt): `./scripts/AdvVPT/zs_vit16_train_AdvVPT.sh`
- AdvCoOp (Adversarial Textual Prompt): `./scripts/AdvCoOp/zs_vit16_train_AdvCoOp.sh`
This repository is built upon MaPLe and CoOp. We thank the authors for those well-organized codebases.
