Srikumar Sastry*, Aayush Dhakal, Eric Xing, Subash Khanal, Nathan Jacobs (*Corresponding Author)
ICCV 2025
Radial Cross-Modal Embeddings (RCME) is a state-of-the-art method for hierarchical image-text ordering and retrieval in the embedding space.
| Model | Architecture | HuggingFace |
|---|---|---|
| CLIP | ViT-B/16 | MVRL/rcme-vit-base-patch16 |
| CLIP | ViT-L/14 | MVRL/rcme-vit-large-patch14 |
| TreeofLife | ViT-B/16 | MVRL/rcme-tol-vit-base-patch16 |
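
The checkpoints in the table can be fetched from the Hugging Face Hub. The snippet below is a minimal sketch, not the repository's documented loading path: it assumes the `huggingface_hub` package is installed, and `RCME_CHECKPOINTS`, `checkpoint_id`, and `download_checkpoint` are hypothetical helper names introduced only for illustration. It downloads the repository files; how the weights are then loaded into a model is not covered here.

```python
# Checkpoint IDs copied from the table above, keyed by (model, architecture).
RCME_CHECKPOINTS = {
    ("CLIP", "ViT-B/16"): "MVRL/rcme-vit-base-patch16",
    ("CLIP", "ViT-L/14"): "MVRL/rcme-vit-large-patch14",
    ("TreeofLife", "ViT-B/16"): "MVRL/rcme-tol-vit-base-patch16",
}


def checkpoint_id(model: str, architecture: str) -> str:
    """Return the Hub repository ID for a table row (hypothetical helper)."""
    try:
        return RCME_CHECKPOINTS[(model, architecture)]
    except KeyError:
        raise ValueError(
            f"No RCME checkpoint listed for {model} {architecture}"
        ) from None


def download_checkpoint(repo_id: str) -> str:
    """Download all files of a Hub repository and return the local cache path.

    Assumption: `pip install huggingface_hub` has been run. The import is
    lazy so that the lookup above works without the dependency.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id)


# Example usage (requires network access):
# local_dir = download_checkpoint(checkpoint_id("CLIP", "ViT-B/16"))
# print(local_dir)
```

`snapshot_download` caches the files locally, so repeated calls with the same repository ID should not download them again.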
- Clone this repository: `git clone https://github.com/mvrl/RCME.git`
- Install dependencies: `cd RCME && pip install -r requirements.txt`
- Use BioCLIP's scripts to download the TreeofLife-10M dataset: `rcme/data/bioclip/scripts/setup_download_tol-10m_components.bash && rcme/data/bioclip/scripts/submit_download_tol-10m_components.bash`

  Hint: Set up paths and other variables in the `setup_download_tol-10m_components.bash` script.
- Use our script to convert the TreeofLife-10M dataset into iNaturalist-2021 style naming: `python rcme/data/bioclip/write_imgs.py`

  Hint: Set up paths and other variables in our script.

  Hint: Currently only supports `num_workers=1`.
- Set up all hyperparameters in the `rcme/config.py` file.
- Run training by specifying the model: `python rcme/train.py --model="rcme"`

  Hint: Currently supports `rcme`, `radial`, `atmg` and `meru`.
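
To train several of the supported models in sequence, the training command above can be wrapped in a small driver. This is a sketch under stated assumptions rather than part of the repository: `build_train_command` and `train_all` are hypothetical helpers, the script is assumed to be run from the repository root, and `sys.executable` stands in for the `python` command so that the current environment is reused. Only the `--model` flag and the four model names come from this README.

```python
import subprocess
import sys

# Model names listed as supported by rcme/train.py in this README.
SUPPORTED_MODELS = ("rcme", "radial", "atmg", "meru")


def build_train_command(model: str) -> list[str]:
    """Build the argv list for one training run (hypothetical helper)."""
    if model not in SUPPORTED_MODELS:
        raise ValueError(
            f"Unsupported model {model!r}; expected one of {SUPPORTED_MODELS}"
        )
    # sys.executable replaces `python` so the active environment is used.
    return [sys.executable, "rcme/train.py", f"--model={model}"]


def train_all(models=SUPPORTED_MODELS) -> None:
    """Run training once per model, stopping at the first failure."""
    for model in models:
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(build_train_command(model), check=True)


# Example usage (run from the repository root):
# train_all(["rcme", "meru"])
```

Hyperparameters are still read from `rcme/config.py`, so each run in the loop uses whatever is configured there at launch time.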
Scripts and documentation coming soon...
📑 Citation
@inproceedings{sastry2025global,
title={Global and Local Entailment Learning for Natural World Imagery},
author={Sastry, Srikumar and Dhakal, Aayush and Xing, Eric and Khanal, Subash and Jacobs, Nathan},
booktitle={International Conference on Computer Vision},
year={2025},
organization={IEEE/CVF}
}

Check out our lab website for other interesting work on geospatial understanding and mapping.

