This repository presents a research-focused, end-to-end deep learning pipeline for object detection, using face mask classification as a case study. It aims to deepen understanding of:
- Manual implementation versus transfer learning approaches with ResNet34
- Effects of data characteristics (class imbalance, bounding box dimensions) on model performance
- Statistics-driven EDA, model evaluation, and interpretability techniques in PyTorch
## Table of Contents

- Introduction
- Dataset and EDA
- Model Architectures
  - Manual ResNet34
  - Pretrained ResNet34
- Training Protocol
- Evaluation and Metrics
- Visualization and Interpretability
- How to Run
- References & Credits
## Introduction

This project investigates the learning dynamics, generalization, and statistical properties of both manually built and pretrained deep convolutional models on an image dataset annotated for face mask detection. Key research questions include:
- How do hand-built and pretrained models compare in feature extraction and convergence?
- What dataset properties most strongly influence detection accuracy?
- Are advanced interpretability/visualization methods needed to understand misclassifications and model decisions?
## Dataset and EDA

- Face mask dataset composed of paired image (`.png`) and annotation (`.xml`) files.
- Scientific EDA includes:
  - Dataset audit (file integrity, image-annotation parity)
  - Quantitative statistics of image dimensions, aspect ratios, class distributions, and bounding box coverage
  - Advanced visualization: violin plots, bounding box spatial heatmaps, class imbalance metrics
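The class-distribution and bounding-box-coverage statistics can be sketched roughly as below, assuming Pascal VOC-style `.xml` annotations (the usual format for face mask datasets); `parse_annotation` is a hypothetical helper name, not code from the notebook:

```python
import xml.etree.ElementTree as ET

def parse_annotation(xml_path):
    """Return the image size and per-object (label, bbox-coverage) pairs.

    Coverage is the bounding-box area as a fraction of the image area,
    one of the quantities summarized in the EDA.
    """
    root = ET.parse(xml_path).getroot()
    w = int(root.find("size/width").text)
    h = int(root.find("size/height").text)
    objects = []
    for obj in root.iter("object"):
        label = obj.find("name").text
        box = obj.find("bndbox")
        xmin, ymin = int(box.find("xmin").text), int(box.find("ymin").text)
        xmax, ymax = int(box.find("xmax").text), int(box.find("ymax").text)
        coverage = (xmax - xmin) * (ymax - ymin) / (w * h)
        objects.append((label, coverage))
    return (w, h), objects
```

Aggregating the labels over all files (e.g. with `collections.Counter`) then yields the class-imbalance metrics directly.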
## Model Architectures

- Manual ResNet34: Complete from-scratch implementation for didactic purposes. Includes residual block definitions, custom initialization, and in-line explanations.
- Pretrained ResNet34: Uses `torchvision.models.resnet34`, pretrained on ImageNet and fine-tuned on the face mask dataset.
- Both models are evaluated with identical preprocessing, losses, and optimization schemes for a rigorous side-by-side benchmark.
## Training Protocol

- Stratified train/val/test splits for robust evaluation
- Standard data augmentation, normalization, and reproducible seed management
- Early stopping, checkpoint management, and live statistics collection
- Tracking of training and validation loss curves with statistical diagnostics (mean, variance, moving average)
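The seed management and early-stopping pieces of the protocol can be sketched as follows; the seed value and `patience` are illustrative defaults, not values from the notebook:

```python
import random
import numpy as np
import torch

def set_seed(seed=42):
    """Seed all relevant RNGs for reproducible runs."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

class EarlyStopping:
    """Stop when validation loss fails to improve for `patience` epochs."""
    def __init__(self, patience=5, min_delta=0.0):
        self.patience, self.min_delta = patience, min_delta
        self.best, self.bad_epochs = float("inf"), 0

    def step(self, val_loss):
        if val_loss < self.best - self.min_delta:
            self.best, self.bad_epochs = val_loss, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience  # True -> stop training
```

The best-checkpoint save typically lives inside the improvement branch of `step`, alongside the live statistics collection.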
## Evaluation and Metrics

- Accuracy, precision, recall, F1-score, and confusion matrices reported per epoch and on the final test split
- Advanced: Per-class ROC/AUC, gradient flow visualization, and model explainability
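The per-epoch metric computation can be sketched with scikit-learn on the collected predictions; the macro averaging choice here is an assumption, and a sensible one given the class imbalance noted in the EDA:

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support)

def classification_report_dict(y_true, y_pred):
    """Summarize one evaluation pass into the metrics tracked per epoch."""
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0)
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "confusion_matrix": confusion_matrix(y_true, y_pred),
    }
```

The same dictionary can be logged every epoch and once more on the held-out test split.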
## Visualization and Interpretability

- Grad-CAM and feature map visualization for model interpretability
- Error analysis: Identify hard samples, visualize misclassified/badly localized cases
- Comparison of feature representations using PCA/t-SNE plots
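A minimal Grad-CAM sketch using forward/backward hooks is below; the choice of target layer and the bilinear upsampling are assumptions — the notebook's implementation may differ in detail:

```python
import torch
import torch.nn.functional as F

def grad_cam(model, x, target_layer, class_idx=None):
    """Return an (H, W) heatmap in [0, 1] for one input tensor x of shape (1, C, H, W)."""
    feats, grads = {}, {}
    # Capture the target layer's activations and the gradient w.r.t. them
    h1 = target_layer.register_forward_hook(lambda m, i, o: feats.update(a=o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: grads.update(a=go[0]))
    try:
        logits = model(x)
        if class_idx is None:
            class_idx = logits.argmax(dim=1).item()
        model.zero_grad()
        logits[0, class_idx].backward()
        # Channel weights = global-average-pooled gradients
        weights = grads["a"].mean(dim=(2, 3), keepdim=True)
        cam = F.relu((weights * feats["a"]).sum(dim=1, keepdim=True))
        cam = F.interpolate(cam, size=x.shape[2:], mode="bilinear",
                            align_corners=False)
        return (cam / (cam.max() + 1e-8)).squeeze()
    finally:
        h1.remove()
        h2.remove()
```

For a ResNet34, the last convolutional stage (`model.layer4`) is the usual `target_layer`; overlaying the heatmap on the input image then highlights the regions driving each prediction.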
## How to Run

1. Install requirements (Kaggle/Colab: `pip install` as outlined in the notebook).
2. Download and organize the dataset as specified in the notebook.
3. Run the notebook sections sequentially, from EDA through model training and evaluation.
4. Results, metrics, and plots are saved to the `/results/` and `/plots/` folders.
## References & Credits

- He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep Residual Learning for Image Recognition. CVPR 2016.
- Dataset and annotation: SandhyaKrishnan02
- TorchVision and the PyTorch community for deep learning utilities
For academic and non-commercial use. See LICENSE file (if applicable).