Dynamic-Threshold-for-Image-Retrieval

Code for 'Dynamic Threshold for Image Retrieval' (CSoNet 2024). We propose a dynamic threshold method for content-based image retrieval that adapts to each image gallery. Built on CLIP and FAISS, it improves accuracy over static thresholds.

Abstract

This repository contains the implementation of the paper "Dynamic Threshold for Image Retrieval," accepted at CSoNet 2024. The paper introduces a dynamic threshold method for content-based image retrieval (CBIR) that adapts the retrieval threshold for each image gallery based on the distribution of similar images in the feature space. Unlike static thresholds, which struggle to accommodate varying query characteristics, our approach leverages the CLIP model for feature extraction and FAISS for efficient similarity search to achieve superior retrieval accuracy. Evaluations on the ROxford and LogoSearch datasets demonstrate significant improvements over traditional static threshold methods.

Colab Notebook

Access the notebook here:
Colab Notebook Link

The notebook includes everything you need:

Feature extraction using CLIP
Distance calculation with FAISS
Dynamic threshold determination
Evaluation on the ROxford dataset

Method Overview

The proposed method enhances CBIR through a dynamic thresholding approach. Here's how it works:

Feature Extraction:
The CLIP model encodes both query and gallery images into a 1024-dimensional latent space, capturing rich semantic representations.
Distance Calculation:
FAISS computes Euclidean distances between the latent vectors of the query and gallery images, enabling efficient similarity search.
Dynamic Threshold Determination:
- For each gallery, the mean distance to its top n most similar images is calculated.
- A global mean distance is computed across all galleries.
- The dynamic threshold for each gallery is derived by adjusting a static threshold with a weighted difference between the gallery's mean distance and the global mean.
Retrieval:
Images are retrieved if their distance to the query falls below the gallery-specific dynamic threshold, ensuring adaptability to diverse gallery characteristics.

This approach improves retrieval precision by tailoring the threshold to the unique properties of each gallery.
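
The threshold steps above can be sketched in plain Python. This is a minimal illustration and not the notebook's code: the function names, the `weight` and `n` parameters, the toy distances, and the exact form `static_threshold + weight * (gallery_mean - global_mean)` (including its sign) are assumptions based on the description above.

```python
from statistics import mean


def top_n_mean(distances, n):
    """Mean of the n smallest distances, i.e. the n most similar images."""
    return mean(sorted(distances)[:n])


def dynamic_thresholds(gallery_distances, static_threshold, weight, n):
    """Return one threshold per gallery.

    Assumed form (not confirmed by the source):
        t_g = static_threshold + weight * (m_g - m_global)
    where m_g is the gallery's top-n mean distance and m_global is the
    mean of m_g over all galleries.
    """
    gallery_means = {g: top_n_mean(d, n) for g, d in gallery_distances.items()}
    global_mean = mean(list(gallery_means.values()))
    return {
        g: static_threshold + weight * (m - global_mean)
        for g, m in gallery_means.items()
    }


def retrieve(distances, threshold):
    """Indices of images whose distance falls below the threshold."""
    return [i for i, d in enumerate(distances) if d < threshold]


# Hypothetical query-to-image distances for two galleries.
gallery_distances = {
    "a": [0.2, 0.4, 0.95, 1.5],   # tight cluster of close matches
    "b": [0.6, 0.8, 1.05, 1.4],   # matches sit farther from the query
}

thresholds = dynamic_thresholds(
    gallery_distances, static_threshold=1.0, weight=0.5, n=2
)
for gallery_id, distances in gallery_distances.items():
    print(gallery_id, thresholds[gallery_id], retrieve(distances, thresholds[gallery_id]))
```

In this toy setup a static threshold of 1.0 would return three images for gallery "a" and two for gallery "b". The adjusted thresholds tighten the cut-off for the gallery with close matches and loosen it for the other one.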

Results

The dynamic thresholding method was evaluated against a static threshold baseline on the ROxford and LogoSearch datasets. Relative improvements in macro F1 score:

ROxford (medium): 12.4% improvement (from 0.302 to 0.3394)
ROxford (hard): 6.41% improvement (from 0.2620 to 0.2788)
LogoSearch: 2.72% improvement (from 0.9488 to 0.9746)

These gains highlight the method's ability to enhance CBIR performance across datasets of varying complexity.
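
The percentages above are relative gains over the static baseline, not absolute differences in F1. A short check, using only the scores reported above (the helper name `relative_improvement` is ours):

```python
def relative_improvement(baseline, improved):
    """Relative gain of `improved` over `baseline`, in percent."""
    return (improved - baseline) / baseline * 100.0


# Macro F1 scores as reported above: (static baseline, dynamic threshold).
reported = {
    "ROxford (medium)": (0.302, 0.3394),
    "ROxford (hard)": (0.2620, 0.2788),
    "LogoSearch": (0.9488, 0.9746),
}

gains = {name: relative_improvement(b, d) for name, (b, d) in reported.items()}
for name, gain in gains.items():
    print(f"{name}: {gain:.2f}%")
```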

Citation

If you use this code in your research, please cite our paper:

Pending: the citation entry has not been added yet.

License

This project is released under the MIT License. See the LICENSE file for details.
