Run the training script from the project root using one of the configuration files:

```bash
# Baseline with frozen backbones
python src/main.py --config configs/baseline_frozen.yaml

# LoRA fine-tuning (rank 16)
python src/main.py --config configs/lora_r16.yaml

# Full fine-tuning
python src/main.py --config configs/full_finetune.yaml
```

Configuration is managed via YAML files. Key parameters include:
| Parameter | Description | Default/Example |
|---|---|---|
| `image_model_name` | Name of the image backbone (torchvision or HF) | `resnet50` |
| `text_model_name` | Name of the text backbone (HF) | `bert-base-uncased` |
| `embed_dim` | Dimension of the shared embedding space | `512` |
| `freeze_backbones` | Whether to freeze pre-trained weights | `true`/`false` |
| `use_lora` | Enable LoRA fine-tuning | `true`/`false` |
| `lora_r` | LoRA rank | `16` |
| `lora_target_modules` | Modules to apply LoRA to | `["query", "value"]` |
| `load_in_4bit` | Enable 4-bit quantization (QLoRA) | `false` |
| `load_in_8bit` | Enable 8-bit quantization | `false` |
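For orientation, the model parameters above might appear in a config roughly as follows. This is an illustrative sketch only: the exact key nesting in the shipped YAML files may differ, and the `model:` section name is assumed from the `model.use_bias` reference elsewhere in this README.

```yaml
# Illustrative sketch; the real configs may nest keys differently.
model:
  image_model_name: resnet50
  text_model_name: bert-base-uncased
  embed_dim: 512
  freeze_backbones: false
  use_lora: true
  lora_r: 16
  lora_target_modules: ["query", "value"]
  load_in_4bit: false
  load_in_8bit: false
```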
| Parameter | Description | Default/Example |
|---|---|---|
| `loss` | Loss function to use | `contrastive` |
| `mixed_precision` | Enable FP16 mixed-precision training | `true` |
| `batch_size` | Training batch size | `64` |
| `num_epochs` | Number of training epochs | `5` |
| `optimizer.name` | Optimizer name | `AdamW` |
| `optimizer.params.lr` | Learning rate | `0.0001` |
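As a rough sketch, the training parameters above could live in a `training:` section like the one below. The nesting is assumed from the dotted parameter names (`optimizer.name`, `optimizer.params.lr`) and may not match the shipped configs exactly.

```yaml
# Illustrative sketch; nesting inferred from the dotted keys above.
training:
  loss: contrastive
  mixed_precision: true
  batch_size: 64
  num_epochs: 5
  optimizer:
    name: AdamW
    params:
      lr: 0.0001
```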
Set the `training.loss` parameter in the config to one of the following:

- `contrastive`: Standard symmetric cross-entropy loss (CLIP-style).
- `contrastive_semihard`: Contrastive loss with semi-hard negative mining.
- `siglip`: Sigmoid Loss for Language-Image Pre-Training (requires `model.use_bias: true`).
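For reference, the CLIP-style symmetric cross-entropy can be sketched framework-agnostically. The NumPy version below is illustrative only, not the project's actual implementation; the function name and temperature default are made up for the example.

```python
import numpy as np

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """CLIP-style symmetric cross-entropy over cosine-similarity logits.

    Matching image/text pairs share the same row index, so the targets
    lie on the diagonal of the (N, N) similarity matrix.
    """
    # L2-normalize so the dot product is cosine similarity
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    logits = image_emb @ text_emb.T / temperature  # (N, N)

    def cross_entropy(l):
        # log-softmax per row, target = diagonal entry
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Average the image-to-text and text-to-image directions
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

The loss is minimized when each image is most similar to its own caption and dissimilar to every other caption in the batch.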
`embedding_analysis.py` visualizes the learned embedding space of a trained model using t-SNE. It fetches a trained model from Weights & Biases and runs inference on a balanced subset of the COCO dataset.
```bash
python src/analysis/embedding_analysis.py --run-path <entity>/<project>/<run_id> [options]
```

- `--run-path`: Required. The W&B run path (e.g., `username/semantic-image-search/123456`).
- `--model-filename`: Filename of the model in W&B artifacts (default: `main.pth`).
- `--coco-annotation-file`: Path to COCO annotations JSON (default: `data/annotations/captions_val2017.json`).
- `--coco-image-dir`: Path to COCO images directory (default: `data/val2017`).
- `--samples-per-category`: Number of samples to select per category (default: `20`).
- `--categories`: List of categories to visualize (default: `cat dog car pizza`).
- `--output-dir`: Directory to save the plot (default: `analysis_results`).
Example:

```bash
python src/analysis/embedding_analysis.py \
    --run-path ebrahimpichka/semantic-image-search/3x8j9k2l \
    --categories cat dog car \
    --samples-per-category 50
```
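The t-SNE projection step inside the script amounts to roughly the following. This is a minimal sketch using scikit-learn, not the script's actual code, and the helper name `project_embeddings` is hypothetical.

```python
import numpy as np
from sklearn.manifold import TSNE

def project_embeddings(embeddings, perplexity=30, seed=42):
    """Project (N, D) embeddings down to 2-D with t-SNE for plotting.

    t-SNE requires perplexity < N, so it is clamped for small subsets.
    """
    embeddings = np.asarray(embeddings)
    tsne = TSNE(
        n_components=2,
        perplexity=min(perplexity, len(embeddings) - 1),
        init="pca",
        random_state=seed,
    )
    return tsne.fit_transform(embeddings)  # shape (N, 2)
```

The 2-D points are then scatter-plotted and colored by COCO category to show how well the shared embedding space separates concepts.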