👁️‍🗨️ Hand-Drawn Image Classification (ResNet-18 + XAI)

A modular computer vision pipeline for symbol classification, with in-depth Grad-CAM analysis.

Built with PyTorch, Python, and scikit-learn.

1. About the Project

This project implements a Computer Vision (CV) pipeline designed to build and evaluate a model capable of classifying 10 different graphical symbols (e.g., anchor, bicycle, spiral) based on a dataset containing both hand-drawn and digital (stamp) images.

Main Objective: Achieve high classification accuracy and conduct an in-depth Explainable AI (XAI) analysis to understand why the model sometimes makes mistakes, especially when dealing with heterogeneous data (Hand-Drawn vs. Stamp).

This project demonstrates:

  • Implementation of Transfer Learning using the ResNet-18 architecture.
  • Development of a fully modular ML pipeline (aligned with best software engineering practices).
  • Application of Explainable AI (XAI) tools, particularly Grad-CAM, to diagnose and interpret model errors.

2. Key Technologies and Methodologies

Understanding the core concepts is essential for interpreting the results of this project.

🧠 What is Transfer Learning (ResNet-18)?

Transfer Learning is a technique where a model trained for one task (e.g., recognizing millions of objects in ImageNet) is reused as a starting point for a new task (e.g., classifying 10 simple symbols).

Benefit: Instead of learning from scratch, the model leverages its existing knowledge about edges, shapes, and textures. This makes training faster and more efficient, especially on small datasets.

Architecture: ResNet-18 is a deep convolutional neural network (CNN) known for its efficiency and the use of residual connections, which facilitate gradient flow during training and help prevent vanishing gradients.
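
A minimal sketch of this setup in PyTorch follows; whether the backbone is frozen (as here) or fully fine-tuned is an illustrative assumption, not necessarily this project's exact configuration:

import torch.nn as nn
from torchvision import models

# Start from ResNet-18 weights pretrained on ImageNet
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Optionally freeze the pretrained backbone so only the new head is trained
for param in model.parameters():
    param.requires_grad = False

# Replace the 1000-class ImageNet head with a 10-class symbol head;
# the new layer's parameters are trainable by default
model.fc = nn.Linear(model.fc.in_features, 10)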


🔍 What is Grad-CAM (XAI)?

Grad-CAM (Gradient-weighted Class Activation Mapping) is an Explainable AI (XAI) technique that helps answer the question: “What is the model looking at when making a decision?”

How it works: Grad-CAM generates a heatmap over the input image. High-intensity areas (e.g., red/yellow) indicate regions that had the most influence on the final classification.

Application in the project: We use Grad-CAM to analyze why the model confuses a smiley symbol with a spiral—whether it focuses on the outline or the internal elements.
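
For intuition, here is a minimal hand-rolled Grad-CAM sketch using PyTorch hooks. The choice of model.layer4 as the target layer and all names are illustrative assumptions; the project's own XAI tooling may differ:

import torch.nn.functional as F

def grad_cam(model, image, target_layer, class_idx=None):
    # Returns a heatmap of shape (H, W) scaled to [0, 1] for one image tensor (3, H, W)
    activations, gradients = [], []

    # Hooks capture the target layer's feature maps and the gradients flowing back
    fwd = target_layer.register_forward_hook(
        lambda module, args, output: activations.append(output))
    bwd = target_layer.register_full_backward_hook(
        lambda module, grad_in, grad_out: gradients.append(grad_out[0]))

    model.eval()
    inp = image.detach().unsqueeze(0).requires_grad_(True)  # ensure gradients reach the hooks
    logits = model(inp)
    if class_idx is None:
        class_idx = logits.argmax(dim=1).item()             # explain the predicted class
    model.zero_grad()
    logits[0, class_idx].backward()
    fwd.remove()
    bwd.remove()

    acts = activations[0].detach()                  # (1, C, h, w)
    grads = gradients[0].detach()                   # (1, C, h, w)
    weights = grads.mean(dim=(2, 3), keepdim=True)  # per-channel importance
    cam = F.relu((weights * acts).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[1:], mode="bilinear", align_corners=False).squeeze()
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)

# Example: heatmap = grad_cam(model, img_tensor, model.layer4)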


3. System Architecture (Flow Diagrams)

graph TD
    A[Main Script: run_pipeline.py] --> B(1. Load Data);
    A --> C(2. Create Model);
    A --> D(3. Training and Validation);
    A --> E(4. Evaluation and Reports);

    B --> F(src/data_loader.py);
    C --> G(src/model.py);
    D --> H(src/train.py);
    E --> I(src/evaluate.py);
    
    H -- Saves Artifacts --> J[best_model_weights.pth / train_history.json];
    J --> E;
    
    style A fill:#f9f,stroke:#333
    style E fill:#ccf,stroke:#333
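Read as code, the diagram above corresponds roughly to the skeleton below. The helper names (load_data, create_model, train_model, evaluate_model) are hypothetical placeholders inferred from the module layout, not the repository's actual API:

# Hypothetical skeleton of src/run_pipeline.py (structural sketch, not the repo's actual code)
import argparse
from pathlib import Path

from src.data_loader import load_data      # hypothetical helper names
from src.model import create_model
from src.train import train_model
from src.evaluate import evaluate_model

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--force-train", action="store_true",
                        help="Retrain even if saved weights already exist")
    args = parser.parse_args()

    train_dl, val_dl, test_dl = load_data()            # 1. Load Data
    model = create_model(num_classes=10)               # 2. Create Model
    if args.force_train or not Path("best_model_weights.pth").exists():
        train_model(model, train_dl, val_dl)           # 3. Training and Validation
    evaluate_model(model, test_dl)                     # 4. Evaluation and Reports

if __name__ == "__main__":
    main()

The second diagram traces the data flow through these stages in more detail: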
flowchart TD
    subgraph Data Loading
        A[1. Raw JPG Images] --> B(2. Metadata Extraction / data_loader.py);
        B --> C(3. Pandas DataFrame);
    end
    
    subgraph Preprocessing
        C --> D{4. Data Split: Train, Val, Test};
        D --> E(5. Preprocessing & Augmentation);
    end
    
    subgraph Model Training
        E -- Train/Val Loaders --> F(6. ResNet-18 Model);
        F --> G(7. Optimization and Learning);
        G --> H(8. Artifacts: weights.pth);
    end
    
    subgraph Evaluation
        H --> I(9. Test DataLoader);
        I --> J(10. Final Classification);
        J --> K(11. Final Reports: Confusion Matrix);
        J --> L(12. XAI Analysis: Grad-CAM);
    end

    style F fill:#ADD8E6,stroke:#333
    style K fill:#C9F4C9,stroke:#333
    style L fill:#FFEAA7,stroke:#333
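Steps 4-5 of the data flow (split, preprocessing, augmentation) could be realized with torchvision transforms along these lines; the input size, normalization statistics, and the specific augmentations below are illustrative assumptions:

from torchvision import transforms

# ImageNet statistics: standard practice when fine-tuning a pretrained ResNet
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

# Training transform: light augmentation for robustness to drawing variation
train_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomRotation(10),
    transforms.RandomAffine(degrees=0, translate=(0.05, 0.05)),
    transforms.ToTensor(),
    normalize,
])

# Validation/test transform: deterministic, no augmentation
eval_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    normalize,
])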

4. Run Instructions (Step by Step)

Step 1: Download and Installation

# 1. Clone the repository
git clone https://github.com/xVarmondx/computer-vision-research

# 2. Navigate to the project directory
cd computer-vision-research

# 3. Create a virtual environment
python -m venv .venv

# 4. Activate the environment (for Windows):
.venv\Scripts\activate

# 4. Activate the environment (for macOS/Linux):
source .venv/bin/activate

# 5. Install all dependencies (including PyTorch, torchvision, seaborn)
pip install -r requirements.txt

Step 2: Run the Pipeline

# While in the main folder (computer-vision-research):
# Use this command to train the model and save new weights
python -m src.run_pipeline --force-train

Step 3: Visual Analysis (Jupyter Notebook)

After the pipeline has been run at least once (and the best_model_weights.pth and train_history.json files exist), you can run the visual presentation.

This notebook is designed to load the trained model and artifacts to perform a deep-dive analysis, including EDA, metric visualization, and Grad-CAM.

1. Start the Jupyter Server

In your terminal (from the project's root directory), run:

jupyter notebook

If your browser does not open automatically, the console will display an address to copy.

http://localhost:8888/?token=... (a very long string of characters)
  1. Copy one of the displayed links (the entire URL, including ?token=...).
  2. Paste it into your browser's address bar and press Enter.

This will open the Jupyter dashboard. From there, open the src folder and launch presentation_notebook.ipynb.
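
Inside the notebook, loading the saved artifacts could look roughly like this; the file paths are assumptions, so adjust them to wherever the pipeline writes its outputs:

import json

import torch
import torch.nn as nn
from torchvision import models

# Rebuild the architecture, then restore the trained weights
model = models.resnet18()
model.fc = nn.Linear(model.fc.in_features, 10)
model.load_state_dict(torch.load("best_model_weights.pth", map_location="cpu"))
model.eval()

# Training history for the metric plots
with open("train_history.json") as f:
    history = json.load(f)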
