
Vision-Language Retrieval Project

Usage

Run the training script from the project root using one of the configuration files:

```bash
# Baseline with frozen backbones
python src/main.py --config configs/baseline_frozen.yaml

# LoRA fine-tuning (rank 16)
python src/main.py --config configs/lora_r16.yaml

# Full fine-tuning
python src/main.py --config configs/full_finetune.yaml
```

Configuration Parameters

Configuration is managed via YAML files. Key parameters include:

Model

| Parameter | Description | Default/Example |
|---|---|---|
| `image_model_name` | Name of the image backbone (torchvision or HF) | `resnet50` |
| `text_model_name` | Name of the text backbone (HF) | `bert-base-uncased` |
| `embed_dim` | Dimension of the shared embedding space | `512` |
| `freeze_backbones` | Whether to freeze pre-trained weights | `true`/`false` |
| `use_lora` | Enable LoRA fine-tuning | `true`/`false` |
| `lora_r` | LoRA rank | `16` |
| `lora_target_modules` | Modules to apply LoRA to | `["query", "value"]` |
| `load_in_4bit` | Enable 4-bit quantization (QLoRA) | `false` |
| `load_in_8bit` | Enable 8-bit quantization | `false` |
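Putting these parameters together, a model section of a config file might look like the sketch below. This is illustrative only; the exact key nesting in this repository's YAML files may differ.

```yaml
model:
  image_model_name: resnet50
  text_model_name: bert-base-uncased
  embed_dim: 512
  freeze_backbones: false
  use_lora: true
  lora_r: 16
  lora_target_modules: ["query", "value"]
  load_in_4bit: false
  load_in_8bit: false
```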

Training

| Parameter | Description | Default/Example |
|---|---|---|
| `loss` | Loss function to use | `contrastive` |
| `mixed_precision` | Enable FP16 mixed-precision training | `true` |
| `batch_size` | Training batch size | `64` |
| `num_epochs` | Number of training epochs | `5` |
| `optimizer.name` | Optimizer name | `AdamW` |
| `optimizer.params.lr` | Learning rate | `0.0001` |
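The dotted parameter names above (`optimizer.name`, `optimizer.params.lr`) suggest nested YAML keys. A training section might look like this sketch (the actual files in `configs/` may nest things differently):

```yaml
training:
  loss: contrastive
  mixed_precision: true
  batch_size: 64
  num_epochs: 5
  optimizer:
    name: AdamW
    params:
      lr: 0.0001
```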

Supported Loss Functions

Set the `training.loss` parameter in the config to one of the following:

- `contrastive`: Standard symmetric cross-entropy loss (CLIP-style).
- `contrastive_semihard`: Contrastive loss with semi-hard negative mining.
- `siglip`: Sigmoid Loss for Language-Image Pre-Training (requires `model.use_bias: true`).
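For intuition, the CLIP-style symmetric loss can be sketched in NumPy: cosine-similarity logits between every image/text pair, with cross-entropy applied both row-wise (image-to-text) and column-wise (text-to-image) against the diagonal of matching pairs. This is a minimal illustration, not the implementation in this repository (which presumably uses PyTorch and a learnable temperature).

```python
import numpy as np

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric cross-entropy over an (N, N) similarity matrix."""
    # L2-normalize so dot products are cosine similarities
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature   # (N, N); matching pairs on the diagonal
    labels = np.arange(len(img))

    def cross_entropy(lg):
        # log-softmax per row, then pick the diagonal (correct-pair) entries
        lg = lg - lg.max(axis=1, keepdims=True)
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # average the image->text and text->image directions
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

Correctly paired embeddings drive the diagonal logits up and the loss toward zero; mismatched pairs are penalized in both retrieval directions.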

Embedding Space Visualization

`embedding_analysis.py` visualizes the learned embedding space of a trained model using t-SNE. It fetches the model checkpoint from Weights & Biases and runs inference on a balanced subset of the COCO dataset.

Usage

```bash
python src/analysis/embedding_analysis.py --run-path <entity>/<project>/<run_id> [options]
```

Arguments

- `--run-path`: Required. The W&B run path (e.g., `username/semantic-image-search/123456`).
- `--model-filename`: Filename of the model checkpoint in W&B artifacts (default: `main.pth`).
- `--coco-annotation-file`: Path to the COCO annotations JSON (default: `data/annotations/captions_val2017.json`).
- `--coco-image-dir`: Path to the COCO images directory (default: `data/val2017`).
- `--samples-per-category`: Number of samples to select per category (default: `20`).
- `--categories`: List of categories to visualize (default: `cat dog car pizza`).
- `--output-dir`: Directory to save the plot (default: `analysis_results`).
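The interface described above could be declared with `argparse` roughly as follows. This is a hypothetical reconstruction for reference; the actual script may define its arguments differently.

```python
import argparse

def build_parser():
    # Hypothetical mirror of the CLI documented above
    p = argparse.ArgumentParser(
        description="Visualize the learned embedding space with t-SNE")
    p.add_argument("--run-path", required=True,
                   help="W&B run path: <entity>/<project>/<run_id>")
    p.add_argument("--model-filename", default="main.pth")
    p.add_argument("--coco-annotation-file",
                   default="data/annotations/captions_val2017.json")
    p.add_argument("--coco-image-dir", default="data/val2017")
    p.add_argument("--samples-per-category", type=int, default=20)
    p.add_argument("--categories", nargs="+",
                   default=["cat", "dog", "car", "pizza"])
    p.add_argument("--output-dir", default="analysis_results")
    return p
```

`nargs="+"` lets `--categories` accept a space-separated list, matching the example invocation below.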

Example

```bash
python src/analysis/embedding_analysis.py \
    --run-path ebrahimpichka/semantic-image-search/3x8j9k2l \
    --categories cat dog car \
    --samples-per-category 50
```

About

Code for the final project of the CS 7643 - Deep Learning course: "Multimodal Representation Learning for Semantic Image Retrieval"
