- Project Overview
- What are Brain Tumors?
- Types of Tumors Used
- Dataset
- Tech Stack
- Project Structure
- Modular Pipeline & CI/CD
- Model Architecture
- Training & Evaluation
- Visualizations
- How to Run
- Results
- Web App
- Contact
## Live Demo
Try the Live Brain Tumor MRI Classifier Here (Alternate Link)
## Project Overview
This project leverages deep learning to classify brain tumors from MRI scans. It uses a modular, production-grade pipeline with DVC, CI/CD, and a modern tech stack. The goal is to assist radiologists and clinicians in early and accurate tumor detection, improving patient outcomes.
## What are Brain Tumors?
A brain tumor is an abnormal growth of cells within the brain or the central spinal canal. Tumors can be benign (non-cancerous) or malignant (cancerous), and early detection is crucial for effective treatment. Misconceptions about brain tumors abound; this project focuses on scientific, data-driven classification.
## Types of Tumors Used
This project classifies the following tumor types (as per the dataset):
- Glioma: Tumors that originate from glial cells in the brain or spine.
- Meningioma: Tumors that arise from the meninges, the membranes that surround the brain and spinal cord.
- Pituitary Tumor: Tumors that develop in the pituitary gland.
- No Tumor: MRI scans with no evidence of tumor.
## Dataset
- Source: Kaggle - Brain Tumor MRI Dataset
- Structure: Images are organized into folders by tumor type and split into Training/Testing sets.
- Sample Data:
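For reference, the dataset can be pulled programmatically with KaggleHub. This is a minimal sketch; the dataset handle below is an assumption based on the dataset's Kaggle name.

```python
# Minimal sketch: download the Brain Tumor MRI dataset via KaggleHub.
# The dataset handle is an assumption based on the dataset's Kaggle name.
import kagglehub

# Downloads to a local cache and returns the path to the extracted files
path = kagglehub.dataset_download("masoudnickparvar/brain-tumor-mri-dataset")
print("Dataset downloaded to:", path)
```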
## Tech Stack
- Python 3.10+
- TensorFlow / Keras
- scikit-learn
- Pandas, NumPy, Matplotlib, Seaborn
- DVC (Data Version Control)
- Flask (for web app)
- CI/CD: GitHub Actions
- Jupyter Notebooks for research and prototyping
- KaggleHub for dataset download
## Project Structure

```
Brain-Tumor-MRI-Classification/
│
├── artifacts/                     # All generated artifacts (data, models, logs)
│   ├── data_ingestion/            # Downloaded and processed data
│   ├── prepare_base_model/        # Saved base models
│   ├── prepare_callbacks/         # Checkpoints and TensorBoard logs
│   └── training/                  # Final trained models
│
├── config/                        # YAML configuration files for all stages
│   ├── config.yaml                # Main config for paths and pipeline
│   └── ...
│
├── research/                      # Jupyter notebooks for experiments, EDA, prototyping
│   ├── 01_data_ingestion.ipynb
│   ├── 02_prepare_base_model.ipynb
│   ├── 03_prepare_callbacks.ipynb
│   ├── 04_model_training.ipynb
│   ├── 05_model_evaluation.ipynb
│   └── trials.ipynb
│
├── src/brainTumorMRIClassification/
│   ├── components/                # Modular pipeline components
│   │   ├── data_ingestion.py      # Download and extract data
│   │   ├── prepare_base_model.py  # Build and save model
│   │   ├── prepare_callback.py    # Callbacks for training
│   │   ├── training.py            # Training logic
│   │   └── evaluation.py          # Model evaluation
│   ├── config/                    # Configuration management (ConfigurationManager)
│   ├── constants/                 # Project-wide constants (paths, etc.)
│   ├── entity/                    # Data classes for configs (using @dataclass)
│   ├── pipeline/                  # Pipeline scripts for each stage (stage_01_data_ingestion.py, ...)
│   ├── utils/                     # Utility functions (YAML, logging, etc.)
│   └── __init__.py                # Logger setup
│
├── templates/                     # Web app HTML templates (index.html)
├── dvc.yaml                       # DVC pipeline definition (all stages, params, outs, metrics)
├── params.yaml                    # Model and training parameters (batch size, epochs, etc.)
├── requirements.txt               # Python dependencies
├── main.py                        # Orchestrates the full pipeline (runs all stages)
├── setup.py                       # Package setup for pip install
└── README.md                      # Project documentation
```
- Separation of Concerns: Each pipeline stage is a separate script and component, making the codebase easy to maintain and extend.
- Entity-Driven Configs: All configuration is handled via dataclasses and YAML, making experiments reproducible and parameter changes easy (see the sketch below).
- Research Notebooks: All experiments, EDA, and visualizations are versioned in the `research/` folder for transparency.
- Production-Ready: The `src/` folder is organized for scalable, testable, and production-grade ML code.
- Web App Ready: The `templates/` folder and Flask app allow for easy deployment as a web service.
- CI/CD & DVC: The project is ready for continuous integration and reproducible ML with DVC and GitHub Actions.
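As an illustration of the entity-driven pattern, a config entity is a frozen dataclass populated from YAML. This is a minimal sketch; the field names and YAML keys here are assumptions, not the repo's exact schema.

```python
# Hedged sketch of the entity-driven config pattern: a frozen dataclass
# populated from YAML. Field names and YAML keys are illustrative assumptions.
from dataclasses import dataclass
from pathlib import Path

import yaml


@dataclass(frozen=True)
class TrainingConfig:
    trained_model_path: Path
    batch_size: int
    epochs: int


def get_training_config(params_file: str = "params.yaml") -> TrainingConfig:
    with open(params_file) as f:
        params = yaml.safe_load(f)
    return TrainingConfig(
        trained_model_path=Path("artifacts/training/brain_model.h5"),
        batch_size=params["BATCH_SIZE"],
        epochs=params["EPOCHS"],
    )
```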
## Modular Pipeline & CI/CD
The pipeline is split into four main stages, each with its own script, config, and component:
1. Data Ingestion (`stage_01_data_ingestion.py`)
   - Downloads the dataset from Kaggle using KaggleHub
   - Extracts and organizes data into train/test folders
   - Logs all actions for traceability
2. Prepare Base Model (`stage_02_prepare_base_model.py`)
   - Builds a custom CNN using the Keras Sequential API
   - Saves the base model for reproducibility
   - All hyperparameters are configurable via YAML
3. Training (`stage_03_training.py`)
   - Loads the base model and prepares data generators with augmentation
   - Uses modular callbacks (EarlyStopping, ReduceLROnPlateau)
   - Trains the model and saves the best checkpoint
   - Logs training progress and metrics
4. Evaluation (`stage_04_evaluation.py`)
   - Loads the trained model and evaluates it on the validation/test set
   - Saves metrics (loss, accuracy) to `scores.json` for DVC tracking
   - Supports confusion matrix and advanced metrics
Each stage is fully decoupled and can be run independently or as part of the full pipeline via `main.py` or DVC; a sketch of a typical stage script follows.
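The sketch below shows the rough shape of a stage script. Module, class, and method names here are assumptions inferred from the repository layout, not the exact code.

```python
# Hypothetical shape of a stage script such as stage_01_data_ingestion.py.
# Module, class, and method names are assumptions based on the repo layout.
from brainTumorMRIClassification import logger
from brainTumorMRIClassification.components.data_ingestion import DataIngestion
from brainTumorMRIClassification.config.configuration import ConfigurationManager

STAGE_NAME = "Data Ingestion stage"

if __name__ == "__main__":
    try:
        logger.info(f">>> {STAGE_NAME} started <<<")
        config_manager = ConfigurationManager()
        ingestion_config = config_manager.get_data_ingestion_config()
        # Download the dataset and extract it into train/test folders
        DataIngestion(config=ingestion_config).download_and_extract()
        logger.info(f">>> {STAGE_NAME} completed <<<")
    except Exception as e:
        logger.exception(e)
        raise
```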
- TensorFlow/Keras: For deep learning model building and training
- scikit-learn: For data shuffling, metrics, and utility functions
- DVC: For data and model versioning, pipeline orchestration, and experiment tracking
- Flask: For serving the model as a web app
- PyYAML & python-box: For robust config management
- Logging: Centralized logging to both file and console for all stages
- CI/CD: GitHub Actions for automated testing and deployment
- KaggleHub: For seamless dataset download from Kaggle
- Jupyter Notebooks: For EDA, prototyping, and visualization
- Modular OOP: All pipeline logic is encapsulated in classes for reusability
- Entity-Driven Design: All configs are strongly typed using Python dataclasses
- Modular Coding: Each pipeline stage (data ingestion, model prep, training, evaluation) is a separate, reusable component.
- DVC: Ensures reproducibility and versioning of data, models, and experiments.
- CI/CD: Automated testing and deployment using GitHub Actions.
- Logging: All stages log to both console and file for traceability.
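The dual console-and-file logger can be set up as below. This is a sketch; the format string, log path, and logger name are assumptions (the actual setup lives in `src/brainTumorMRIClassification/__init__.py`).

```python
# Sketch of a centralized logger writing to both a file and the console.
# Format string, log path, and logger name are illustrative assumptions.
import logging
import os
import sys

os.makedirs("logs", exist_ok=True)
logging.basicConfig(
    level=logging.INFO,
    format="[%(asctime)s: %(levelname)s: %(module)s: %(message)s]",
    handlers=[
        logging.FileHandler(os.path.join("logs", "running_logs.log")),
        logging.StreamHandler(sys.stdout),
    ],
)
logger = logging.getLogger("brainTumorMRIClassificationLogger")
```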
Click to see DVC Pipeline
```yaml
stages:
  data_ingestion:
    cmd: python3 src/brainTumorMRIClassification/pipeline/stage_01_data_ingestion.py
    outs: [artifacts/data_ingestion/brain-mri]
  prepare_base_model:
    cmd: python3 src/brainTumorMRIClassification/pipeline/stage_02_prepare_base_model.py
    outs: [artifacts/prepare_base_model]
  training:
    cmd: python3 src/brainTumorMRIClassification/pipeline/stage_03_training.py
    outs: [artifacts/training/brain_model.h5]
  evaluation:
    cmd: python3 src/brainTumorMRIClassification/pipeline/stage_04_evaluation.py
    metrics: [scores.json]
```
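With this definition in place, `dvc repro` re-runs only the stages whose dependencies have changed, and `dvc metrics show` prints the tracked metrics from `scores.json`.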
## Model Architecture
A custom CNN for brain tumor classification (a Keras sketch follows the list):

- 4 convolutional blocks (Conv2D + ReLU, each followed by MaxPooling)
- Flatten, Dense(512, ReLU), Dropout(0.4)
- Output: Dense(4, softmax)
- Optimizer: Adam (lr=0.001, beta_1=0.869, beta_2=0.995)
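In Keras, the architecture corresponds roughly to the sketch below. Filter counts, kernel sizes, and the input shape are assumptions; the actual values come from `params.yaml`.

```python
# Sketch of the custom CNN described above. Filter counts, kernel sizes,
# and input shape are assumptions; real values come from params.yaml.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_base_model(input_shape=(256, 256, 3), num_classes=4):
    model = models.Sequential([
        layers.Input(shape=input_shape),
        # 4 convolutional blocks: Conv2D + ReLU, each followed by MaxPooling
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(256, 3, activation="relu"),
        layers.MaxPooling2D(),
        # Classifier head
        layers.Flatten(),
        layers.Dense(512, activation="relu"),
        layers.Dropout(0.4),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(
            learning_rate=0.001, beta_1=0.869, beta_2=0.995
        ),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```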
## Training & Evaluation
- Augmentation: rotation, brightness, shift, shear, flip, etc. (sketched below)
- EarlyStopping and ReduceLROnPlateau callbacks
- Batch size: 32; epochs: 40
- Validation split: 20-30%
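A minimal sketch of the augmentation pipeline and callbacks, assuming typical parameter ranges (the exact values live in `params.yaml`):

```python
# Sketch of the training generators and callbacks described above.
# Augmentation ranges and patience values are illustrative assumptions.
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=20,            # rotation
    brightness_range=(0.8, 1.2),  # brightness
    width_shift_range=0.1,        # shift
    height_shift_range=0.1,
    shear_range=0.1,              # shear
    horizontal_flip=True,         # flip
    validation_split=0.2,         # 20% held out for validation
)

callbacks = [
    EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True),
    ReduceLROnPlateau(monitor="val_loss", factor=0.3, patience=3),
]
# model.fit(train_gen, validation_data=val_gen, epochs=40, callbacks=callbacks)
```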
## Visualizations
- True values vs. model predictions
- Train vs. validation loss over epochs
- Model architecture visualization
## How to Run

1. Clone the repo:
   ```bash
   git clone https://github.com/lakshitcodes/Brain-Tumor-MRI-Classification.git
   cd Brain-Tumor-MRI-Classification
   ```
2. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```
3. Configure the Kaggle API (for dataset download):
   - Place your `kaggle.json` in the appropriate location.
4. Run the pipeline:
   ```bash
   dvc repro        # reproduce all stages via DVC
   # or run every stage directly:
   python main.py
   ```
5. Launch the web app:
   ```bash
   python app.py
   ```
## Results
- Accuracy: Achieved high accuracy on the test set (see confusion matrix and metrics above).
- Robustness: Model generalizes well to unseen MRI scans.
- Reproducibility: All experiments are tracked and reproducible via DVC.
## Web App
Upload MRI scans and get instant predictions!
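Under the hood, the Flask app follows roughly this pattern. This is a sketch only: the route, upload handling, class order, and preprocessing are assumptions, not the exact `app.py`.

```python
# Sketch of a Flask app serving the trained model (not the exact app.py).
# Route, upload handling, class order, and preprocessing are assumptions.
import numpy as np
from flask import Flask, render_template, request
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing import image

app = Flask(__name__)
model = load_model("artifacts/training/brain_model.h5")
CLASSES = ["glioma", "meningioma", "notumor", "pituitary"]  # assumed order

@app.route("/", methods=["GET", "POST"])
def index():
    prediction = None
    if request.method == "POST":
        upload = request.files["file"]
        upload.save("uploaded.jpg")
        # Resize and rescale to match the training preprocessing
        img = image.load_img("uploaded.jpg", target_size=(256, 256))
        arr = np.expand_dims(image.img_to_array(img) / 255.0, axis=0)
        prediction = CLASSES[int(np.argmax(model.predict(arr)))]
    return render_template("index.html", prediction=prediction)

if __name__ == "__main__":
    app.run(debug=True)
```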
## Contact
- Author: Lakshit Jain
- LinkedIn: Lakshit Jain
- GitHub: lakshitcodes