Code base: official MaskCLIP repo, mmsegmentation

This repository contains the implementation and results of two improved versions of MaskCLIP:

- Improvement 1: a new classifier that places greater weight on the classes CLIP predicts for the image.
- Improvement 2: the SegFormer backbone in place of DeepLabv2-ResNet101.
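Improvement 1 can be sketched roughly as follows. This is a hypothetical NumPy illustration, not the repo's implementation: the function name `reweight_logits`, the image-level averaging, and the exact role of `tau` are assumptions; only the idea of up-weighting classes that CLIP predicts confidently comes from the description above.

```python
import numpy as np

def reweight_logits(logits, tau=0.25):
    """Scale per-pixel class scores by image-level CLIP confidence.

    logits: (H, W, C) per-pixel class scores from MaskCLIP.
    tau: temperature; a smaller tau sharpens the class weights.
    (Hypothetical sketch -- the repo's exact formulation may differ.)
    """
    # Per-pixel softmax, numerically stabilized.
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs = e / e.sum(axis=-1, keepdims=True)
    # Image-level confidence per class.
    class_conf = probs.mean(axis=(0, 1))      # (C,)
    # Confident classes get multiplicative weights above the mean.
    w = np.exp(class_conf / tau)
    w = w * len(w) / w.sum()                  # normalize so mean weight == 1
    return logits * w                         # broadcast over H, W
```

A smaller `tau` (as in the RN50 results below, tau=0.25) makes the weighting more aggressive; `tau=1` (ViT16) is gentler.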
| MaskCLIP(RN50) | mIoU | config | json |
|---|---|---|---|
| base | 18.46 | config | json |
| + class weight (tau=0.25) | 20.54 | config | json |

| MaskCLIP(ViT16) | mIoU | config | json |
|---|---|---|---|
| base | 21.68 | config | json |
| + class weight (tau=1) | 24.96 | config | json |

| MaskCLIP+(RN50) | mIoU | config | log |
|---|---|---|---|
| base | 24.82 | config | log |
| + class weight (tau=0.25) | 25.96 | config | json |

| MaskCLIP+(ViT16) | mIoU | config | log |
|---|---|---|---|
| base | 31.56 | config | log |
| + class weight (tau=1) | 32.42 | config | json |

| CLIP backbone | Segmentor | mIoU | Total Params | config | log |
|---|---|---|---|---|---|
| CLIP(ResNet50) | DeepLabv2-ResNet101 | 24.82 | 156M | config | log |
| CLIP(ResNet50) | SegFormer-b5 | 22.87 | 125M | config | log |
| CLIP(ViT16) | DeepLabv2-ResNet101 | 31.56 | 166M | config | log |
| CLIP(ViT16) | SegFormer-b5 | 33.88 | 169M | config | log |
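All numbers in the tables above are mean IoU. For reference, a minimal sketch of the metric (illustrative only; the repo reports mIoU via mmsegmentation's built-in evaluator, not this function):

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean intersection-over-union over classes present in pred or gt.

    pred, gt: integer label maps of identical shape.
    Illustrative sketch, not mmsegmentation's evaluator.
    """
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:          # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))
```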
Step 0. Make a conda environment

```bash
bash env_install.sh
```

Step 1. Dataset preparation (ref: dataset_prepare.md)

```bash
bash pascal_context_preparation.sh
```

Step 2. Download and convert the CLIP models & prepare the text embeddings

```bash
bash download_weights.sh
```

Step 3. Download the SegFormer weights pretrained on ImageNet-1K at here and place them in the pretrain folder

Step 4. Convert the pretrained mit models to MMSegmentation style

```bash
python tools/model_converters/mit2mmseg.py pretrain/mit_b0.pth pretrain/mit_b0_weight.pth
```

MaskCLIP: inference ONLY.

Get quantitative results (mIoU):

```bash
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} --eval mIoU
```

Get qualitative results:

```bash
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} --show-dir ${OUTPUT_DIR}
```

MaskCLIP+ trains another segmentation model (SegFormer) with pseudo labels extracted from MaskCLIP.
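The pseudo-label training idea can be sketched roughly as below. Everything here is a hypothetical illustration of pseudo-label distillation, not the repo's code: the function name `pseudo_label_loss`, the confidence threshold, and the masking rule are all assumptions; only "train a student on labels argmaxed from a frozen MaskCLIP teacher" comes from the sentence above.

```python
import numpy as np

def pseudo_label_loss(student_logits, teacher_logits, conf_thresh=0.5):
    """Cross-entropy of a student against argmax pseudo labels of a teacher.

    The frozen MaskCLIP "teacher" yields per-pixel pseudo labels,
    low-confidence pixels are ignored, and the SegFormer "student"
    is trained on the rest. Shapes: (H, W, C). Hypothetical sketch.
    """
    t = np.exp(teacher_logits - teacher_logits.max(-1, keepdims=True))
    t_prob = t / t.sum(-1, keepdims=True)
    labels = t_prob.argmax(-1)                 # (H, W) pseudo labels
    keep = t_prob.max(-1) > conf_thresh        # confident pixels only

    s = np.exp(student_logits - student_logits.max(-1, keepdims=True))
    s_logp = np.log(s / s.sum(-1, keepdims=True))
    h, w = labels.shape
    nll = -s_logp[np.arange(h)[:, None], np.arange(w)[None, :], labels]
    return float(nll[keep].mean()) if keep.any() else 0.0
```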
Train. (please refer to train.md)

```bash
# single GPU (examples in exp_1.sh)
python tools/train.py ${CONFIG_FILE}

# multiple GPUs (examples in exp_2.sh)
bash tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM}
```

Inference.

Get quantitative results (mIoU):

```bash
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} --eval mIoU
```

Get qualitative results:

```bash
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} --show-dir ${OUTPUT_DIR}
```

Error 0. ImportError: libGL.so.1: cannot open shared object file: No such file or directory
```bash
sudo apt-get update
sudo apt-get install libgl1
```

Error 1. ImportError: MagickWand shared library not found.

```bash
sudo apt-get update
sudo apt-get install libmagickwand-dev
```

Error 2. ImportError: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version GLIBCXX_3.4.29 not found

```bash
sudo add-apt-repository ppa:ubuntu-toolchain-r/test
sudo apt-get install --only-upgrade libstdc++6
```

The code base is MaskCLIP:
```bibtex
@InProceedings{zhou2022maskclip,
  author    = {Zhou, Chong and Loy, Chen Change and Dai, Bo},
  title     = {Extract Free Dense Labels from CLIP},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year      = {2022}
}
```

