georgong/VLG-CBM
Result Replication of Vision-Language-Guided Concept Bottleneck Model (VLG-CBM) (NeurIPS 2024)

This repository reproduces the paper VLG-CBM: Training Concept Bottleneck Models with Vision-Language Guidance, NeurIPS 2024. [Original Paper] [Original Repository]

Our Medium article: Medium Article

  • VLG-CBM provides a novel method to train Concept Bottleneck Models (CBMs) with guidance from both the vision and language domains.
  • VLG-CBM provides concise and accurate concept attributions for the decisions made by the model. The following figure compares the decision explanations of VLG-CBM with those of existing methods by listing the top five contributions to their decisions.

Decision Explanation
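
The explanation style shown above can be sketched numerically: the contribution of a concept to a prediction is the product of its bottleneck activation and the corresponding weight in the sparse final layer, and the five largest contributions are reported. The concept names, shapes, and values below are illustrative, not the repository's actual API or data.

```python
import numpy as np

rng = np.random.default_rng(0)

concepts = ["black wing", "yellow belly", "short beak", "red crown", "white eye-ring"]
# Hypothetical concept activations for one image and the sparse
# final-layer weight row for the predicted class.
activations = rng.random(len(concepts))          # concept-bottleneck outputs
class_weights = rng.normal(size=len(concepts))   # linear-head row for the class

# Contribution of each concept to the class logit.
contributions = activations * class_weights

# Top-five concepts by absolute contribution, as in the figure above.
top5 = sorted(zip(concepts, contributions), key=lambda t: abs(t[1]), reverse=True)[:5]
for name, value in top5:
    print(f"{name}: {value:+.3f}")
```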

Setup

  1. Set up the conda environment and install dependencies:

     conda create -n vlg-cbm python=3.12
     conda activate vlg-cbm
     pip install -r requirements.txt

  2. (Optional) Install Grounding DINO for generating annotations on custom datasets:

     git clone https://github.com/IDEA-Research/GroundingDINO
     cd GroundingDINO
     pip install -e .
     wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha2/groundingdino_swinb_cogcoor.pth
     cd ..

Quick Start

We provide scripts to download and evaluate pretrained models for CIFAR10, CIFAR100, CUB200, Places365, and ImageNet. To quickly evaluate the pretrained models, follow the steps below:

  1. Download pretrained models from here, unzip them, and place them in the saved_models folder.
  2. Run the evaluation script to evaluate the pretrained models under different NEC values and obtain the Accuracy at NEC (ANEC) for each dataset:

     python sparse_evaluation.py --load_path <path-to-model-dir>

For example, to evaluate the pretrained model for CUB200, run

python sparse_evaluation.py --load_path saved_models/cub

Training

Overview

VLG-CBM Overview

Annotation Generation (Optional)

To train VLG-CBM, images must first be annotated with concepts by a vision-language model; this work uses Grounding DINO for annotation generation. Use the following command to generate annotations for a dataset:

python -m scripts.generate_annotations --dataset <dataset-name> --device cuda --batch_size 32 --text_threshold 0.15 --output_dir annotations

Note: Supported datasets include cifar10, cifar100, cub, places365, and imagenet. The generated annotations are saved under the annotations folder.
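
Conceptually, the annotation step turns detector output into multi-label concept targets for each image. The sketch below assumes a hypothetical detection format (concept phrase plus confidence score); the actual files written by scripts.generate_annotations may be structured differently.

```python
# Hypothetical post-processing of Grounding DINO detections into concept targets.
concept_bank = ["black wing", "yellow belly", "short beak", "red crown"]
concept_index = {name: i for i, name in enumerate(concept_bank)}

# Assumed detection format: (concept phrase, confidence) pairs for one image.
detections = [("black wing", 0.42), ("short beak", 0.12), ("yellow belly", 0.31)]

TEXT_THRESHOLD = 0.15  # mirrors --text_threshold in the command above

# Binary multi-label target: 1 if any detection of the concept clears the threshold.
target = [0] * len(concept_bank)
for phrase, score in detections:
    if score >= TEXT_THRESHOLD and phrase in concept_index:
        target[concept_index[phrase]] = 1

print(target)  # → [1, 1, 0, 0]; "short beak" falls below the threshold
```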

Training Pipeline

  1. Download the annotated data from here, unzip it, and place it in the annotations folder, or generate it with Grounding DINO as described in the previous section.

  2. All datasets must be placed in a single folder specified by the environment variable $DATASET_FOLDER. By default, $DATASET_FOLDER is set to datasets.

Note: To download and process the CUB dataset, run bash download_cub.sh and move the resulting folder under $DATASET_FOLDER. To use the ImageNet dataset, you need to download it yourself and put it under $DATASET_FOLDER. The other datasets can be downloaded automatically by Torchvision.

  3. Train a concept bottleneck model using the config files in ./configs. For instance, to train a CUB model, run the following command:

     python train_cbm.py --config configs/cub.json --annotation_dir annotations

Evaluate trained models

The Number of Effective Concepts (NEC) must be controlled to enable a fair comparison of model performance. To evaluate a trained model under different NEC values, run the following command:

python sparse_evaluation.py --load_path <path-to-model-dir> --lam <lambda-value>
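
Informally, NEC is the average number of nonzero concept weights per class in the final sparse layer, and sweeping the regularization strength lambda trades accuracy against NEC. Below is a minimal sketch of computing NEC and of reducing a dense head to a target NEC by per-class magnitude pruning; this is an illustrative stand-in, not the repository's exact procedure.

```python
import numpy as np

def nec(weight: np.ndarray) -> float:
    """Average number of nonzero concept weights per class (rows = classes)."""
    return float((weight != 0).sum(axis=1).mean())

def prune_to_nec(weight: np.ndarray, target: int) -> np.ndarray:
    """Keep only the `target` largest-magnitude weights in each class row."""
    pruned = np.zeros_like(weight)
    for c in range(weight.shape[0]):
        keep = np.argsort(-np.abs(weight[c]))[:target]
        pruned[c, keep] = weight[c, keep]
    return pruned

rng = np.random.default_rng(0)
W = rng.normal(size=(200, 4096))   # e.g. 200 CUB classes x 4096 concepts
W5 = prune_to_nec(W, target=5)
print(nec(W), nec(W5))             # dense head vs. NEC = 5
```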

Scripts for replication

Run the scripts provided here to reproduce some of the results in the paper.

Results

Accuracy at NEC=5 (ANEC-5) for non-CLIP backbone models

| Method | CIFAR10 | CIFAR100 | CUB200 | Places365 | ImageNet |
| --- | --- | --- | --- | --- | --- |
| Random | 67.55% | 29.52% | 68.91% | 17.57% | 41.49% |
| LF-CBM | 84.05% | 56.52% | 53.51% | 37.65% | 60.30% |
| LM4CV | 53.72% | 14.64% | N/A | N/A | N/A |
| LaBo | 78.69% | 44.82% | N/A | N/A | N/A |
| VLG-CBM (Ours) | 88.55% | 65.73% | 75.79% | 41.92% | 73.15% |

Accuracy at NEC=5 (ANEC-5) for CLIP backbone models

| Method | CIFAR10 | CIFAR100 | ImageNet | CUB |
| --- | --- | --- | --- | --- |
| Random | 67.55% | 29.52% | 18.04% | 25.37% |
| LF-CBM | 84.05% | 56.52% | 52.88% | 31.35% |
| LM4CV | 53.72% | 14.64% | 3.77% | 3.63% |
| LaBo | 78.69% | 44.82% | 24.27% | 41.97% |
| VLG-CBM (Ours) | 88.55% | 65.73% | 59.74% | 60.38% |

Explainable Decisions

Visualization of activated images

Result Replication

To replicate our results, we ran the following experiments:

ANEC-5 and ANEC-avg across CIFAR-10, CIFAR-100, and CUB
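
The two replication metrics relate simply: ANEC-5 is the accuracy measured at NEC = 5, while ANEC-avg averages accuracy over a grid of NEC values. The sketch below uses a made-up NEC grid and made-up accuracies purely to show the arithmetic; these are not paper results.

```python
# Illustrative ANEC computation: accuracy measured at several NEC values.
# The NEC grid and accuracies below are invented numbers, not paper results.
accuracy_at_nec = {5: 0.755, 10: 0.771, 15: 0.778, 20: 0.781, 25: 0.783, 30: 0.784}

anec_5 = accuracy_at_nec[5]                                   # accuracy at NEC = 5
anec_avg = sum(accuracy_at_nec.values()) / len(accuracy_at_nec)

print(f"ANEC-5 = {anec_5:.3f}, ANEC-avg = {anec_avg:.3f}")
```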


Weight Pruning


About

[NeurIPS 24] A new training and evaluation framework for learning interpretable deep vision models and benchmarking different interpretable concept-bottleneck-models (CBMs)
