A deep learning system built to classify dog breeds using convolutional neural networks (CNNs). The project explores transfer learning, data augmentation, and model interpretability techniques.
This project tackles the real-world challenge of fine-grained visual classification with limited labeled data. Starting from a CNN trained from scratch, the system is improved through transfer learning from a multi-class breed classifier and a custom data augmentation pipeline. A variety of architectures and training optimizations were tested to maximize generalization performance.
Key components include:
- Custom CNN architecture with multiple convolutional and pooling layers
- Transfer learning from a multi-breed classifier to a binary classifier
- Automated data augmentation system (rotation, grayscale, etc.)
- Grad-CAM visualizations for model interpretability
- AUROC-based model evaluation and early stopping
- Training/inference automation with GPU offloading and batch size tuning
- CNN Design: Designed and trained convolutional neural networks from scratch, starting with a baseline 3-layer CNN (Conv -> Pool -> Conv -> Pool -> Conv -> FC) and iteratively improving performance through deeper and wider architectures. Explored the impact of increasing filter counts, adjusting layer configurations, and fine-tuning versus freezing layers in transfer learning. (A minimal architecture sketch appears after this list.)
- Transfer Learning: Leveraged knowledge from a source classifier trained on 8 other breeds to enhance binary classification on Collies vs. Golden Retrievers (see the freeze-and-replace sketch below).
- Data Augmentation: Implemented rotation, grayscale transformations, and custom augmentation combinations to improve generalization (see the transform pipeline sketch below).
- Model Evaluation: Early stopping based on validation loss, AUROC as the primary evaluation metric, and comprehensive training/test curve analysis (see the evaluation sketch below).
- Interpretability: Applied Grad-CAM to visualize which features the model focuses on during classification (see the Grad-CAM sketch below).
- Workflow Automation: Developed a fully modular and GPU-optimized training pipeline that streamlines challenge predictions and evaluation.
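Below is a minimal sketch of the baseline Conv -> Pool -> Conv -> Pool -> Conv -> FC architecture. The 64x64 input size, filter counts, and kernel sizes are illustrative assumptions, not the project's exact settings.

```python
import torch
import torch.nn as nn

class BaselineCNN(nn.Module):
    """Baseline Conv -> Pool -> Conv -> Pool -> Conv -> FC network.

    Filter counts and the 64x64 input size are illustrative assumptions,
    not the exact values used in the project.
    """
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, padding=2),    # Conv 1
            nn.ReLU(),
            nn.MaxPool2d(2),                               # Pool 1
            nn.Conv2d(16, 32, kernel_size=5, padding=2),   # Conv 2
            nn.ReLU(),
            nn.MaxPool2d(2),                               # Pool 2
            nn.Conv2d(32, 64, kernel_size=5, padding=2),   # Conv 3
            nn.ReLU(),
        )
        self.classifier = nn.Linear(64 * 16 * 16, num_classes)  # FC head

    def forward(self, x):
        x = self.features(x)            # (N, 64, 16, 16) for 64x64 inputs
        x = torch.flatten(x, 1)
        return self.classifier(x)
```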
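The transfer-learning step can be sketched as loading the 8-breed source weights, freezing the convolutional backbone, and replacing the final FC layer with a 2-way head (the "FC layer only" variant in the results table). This builds on the `BaselineCNN` sketch above; the checkpoint path is a placeholder, not the project's actual file.

```python
import torch
import torch.nn as nn

def build_transfer_model(source_ckpt: str = "checkpoints/source_8breeds.pt") -> nn.Module:
    """Adapt the 8-breed source classifier to the Collie vs. Golden Retriever task.

    The checkpoint path is a placeholder; only the new FC layer is trained
    (the "FC layer only" variant from the results table).
    """
    model = BaselineCNN(num_classes=8)                 # same architecture as the source task
    model.load_state_dict(torch.load(source_ckpt, map_location="cpu"))

    for param in model.features.parameters():          # freeze the convolutional backbone
        param.requires_grad = False

    in_features = model.classifier.in_features
    model.classifier = nn.Linear(in_features, 2)       # fresh 2-way head, trained from scratch
    return model
```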
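An illustrative torchvision version of the augmentation pipeline is shown below; the rotation range, grayscale probability, and resize target are assumed values rather than the project's exact configuration.

```python
from torchvision import transforms

# Illustrative augmentation pipeline; the rotation range and grayscale
# probability are assumptions, not the project's exact settings.
train_transform = transforms.Compose([
    transforms.Resize((64, 64)),                 # match the assumed input size above
    transforms.RandomRotation(degrees=15),       # rotation augmentation
    transforms.RandomGrayscale(p=0.5),           # grayscale augmentation
    transforms.ToTensor(),
])

val_transform = transforms.Compose([             # no augmentation at evaluation time
    transforms.Resize((64, 64)),
    transforms.ToTensor(),
])
```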
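A sketch of the evaluation and early-stopping logic: AUROC is computed with scikit-learn and training halts when validation loss stops improving. The optimizer choice, learning rate, and patience value are assumptions.

```python
import numpy as np
import torch
import torch.nn.functional as F
from sklearn.metrics import roc_auc_score

def evaluate(model, loader, device="cpu"):
    """Return (mean cross-entropy loss, AUROC) over a data loader."""
    model.eval()
    losses, scores, labels = [], [], []
    with torch.no_grad():
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            logits = model(x)
            losses.append(F.cross_entropy(logits, y).item())
            scores.append(torch.softmax(logits, dim=1)[:, 1].cpu())  # P(positive class)
            labels.append(y.cpu())
    auroc = roc_auc_score(torch.cat(labels).numpy(), torch.cat(scores).numpy())
    return float(np.mean(losses)), auroc

def fit(model, train_loader, val_loader, max_epochs=50, patience=5, device="cpu"):
    """Train with early stopping on validation loss; optimizer, lr, and patience are assumed values."""
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    best_loss, stale_epochs = float("inf"), 0
    for epoch in range(max_epochs):
        model.train()
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            F.cross_entropy(model(x), y).backward()
            optimizer.step()
        val_loss, val_auroc = evaluate(model, val_loader, device)
        print(f"epoch {epoch}: val loss {val_loss:.4f}, val AUROC {val_auroc:.4f}")
        if val_loss < best_loss:
            best_loss, stale_epochs = val_loss, 0
            torch.save(model.state_dict(), "best.pt")    # keep the best checkpoint
        else:
            stale_epochs += 1
            if stale_epochs >= patience:
                break                                    # early stopping
```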
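A compact Grad-CAM sketch using forward and backward hooks follows; it is a generic re-implementation of the standard algorithm under assumed shapes, not the project's exact code.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_class, conv_layer):
    """Minimal Grad-CAM sketch: heatmap of where `conv_layer` activations
    drive the score for `target_class`. `image` is a (1, C, H, W) tensor.
    """
    image = image.clone().requires_grad_(True)   # ensure gradients reach the conv layer
    activations, gradients = {}, {}

    def save_activation(module, inp, out):
        activations["value"] = out.detach()

    def save_gradient(module, grad_in, grad_out):
        gradients["value"] = grad_out[0].detach()

    fwd = conv_layer.register_forward_hook(save_activation)
    bwd = conv_layer.register_full_backward_hook(save_gradient)

    model.eval()
    logits = model(image)
    model.zero_grad()
    logits[0, target_class].backward()           # gradient of the class score
    fwd.remove()
    bwd.remove()

    weights = gradients["value"].mean(dim=(2, 3), keepdim=True)   # global-average-pool the gradients
    cam = F.relu((weights * activations["value"]).sum(dim=1))     # weighted sum of feature maps
    cam = cam / (cam.max() + 1e-8)                                # normalize to [0, 1]
    return F.interpolate(cam.unsqueeze(1), size=image.shape[-2:],
                         mode="bilinear", align_corners=False)[0, 0]
```

For the `BaselineCNN` sketch above, the last convolution is `model.features[6]`, so a call might look like `grad_cam(model, img, target_class=1, conv_layer=model.features[6])`.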
The final model achieved a substantial boost on held-out test data through a combination of transfer learning and carefully tuned augmentation: test AUROC rose from 0.6552 for the from-scratch CNN to 0.8776 for the best variant in the table below.
| Model Variant | Train AUROC | Val AUROC | Test AUROC |
|---|---|---|---|
| CNN (from scratch) | 0.9793 | 0.9308 | 0.6552 |
| Transfer Learning (FC layer only) | 0.8732 | 0.8782 | 0.8776 |
| Grayscale Augmentation Only | 0.8844 | 0.7929 | 0.7776 |
| Rotation + Grayscale Augmentation | 0.9764 | 0.9198 | 0.7260 |
Grad-CAM confirmed early hypotheses that background elements (like grass) were driving predictions. Augmentation helped the model shift its focus toward more meaningful features.
Dozens of configurations were tested and benchmarked. Highlights include:
- Model depth vs. width tradeoffs
- Filter scaling and receptive field tuning
- Batch size scaling for training stability
- Custom learning rate schedules (an illustrative scheduler sketch follows this list)
- Modular architecture with script files for flexible experimentation
- GPU memory usage benchmarking for model variants
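As one illustration of the learning-rate-schedule experiments, a step decay can be wired in with PyTorch's built-in scheduler; the step size and decay factor shown are assumptions, and the project's custom schedules may differ.

```python
import torch

model = BaselineCNN()  # from the architecture sketch above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# Step decay: multiply the learning rate by 0.1 every 10 epochs (assumed values).
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

for epoch in range(30):
    # ... one training epoch over the data loader goes here ...
    scheduler.step()
    print(f"epoch {epoch}: lr = {scheduler.get_last_lr()[0]:.1e}")
```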
- Transfer learning can dramatically improve performance even on a binary classification task.
- Backgrounds in training images can bias CNNs - visualizations and data augmentation are key to overcoming this.
- Grayscale augmentation, despite reducing color variance, forced the model to focus on shape and structure, improving test generalization.
- With a modular pipeline and thoughtful experimentation, significant gains are possible even with limited data.
The entire pipeline for model training, evaluation, and challenge prediction is wrapped into a single customizable script, `train_transfer_learning_custom.py`, enabling rapid iteration and experimentation.
Built with PyTorch, pandas, and scikit-learn.