🧠 Overview
This project implements a complete end-to-end data science workflow, from data ingestion and processing to model training, evaluation, and deployment. It includes:
✔ Version control with DVC
✔ Data storage & notebook experimentation
✔ Production Python modules under src/
✔ A web or CLI interface via app.py / main.py
✔ A Docker container for reproducible builds
| Folder / File | Description |
|---|---|
| .dvc/ | DVC versioning for datasets & models |
| Dataset/ | Raw and processed data storage |
| Notebooks/ | Jupyter notebooks for experimentation & EDA |
| catboost_info/ | CatBoost training metadata and logs |
| src/ | Python modules containing core pipeline logic |
| Dockerfile | Instructions for containerizing the application |
| app.py | Application entry point (API / UI interface) |
| main.py | Main script for training and running the ML pipeline |
| requirements.txt | List of required Python dependencies |
| setup.py | Project packaging and installation configuration |
| template.py | Utility or base template code |
| readme.md | Project documentation file |
Clone the repo and install the dependencies:

```bash
git clone https://github.com/Devgan79/EndtoendDS.git
cd EndtoendDS
pip install -r requirements.txt
```
Under Notebooks/ you will find step-by-step experimentation such as:
✔ Data visualization
✔ Feature engineering
✔ Model evaluation
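As a rough sketch of the kind of cell you might find in those notebooks (the column names below are invented for illustration, not taken from the actual Dataset/ files):

```python
import numpy as np
import pandas as pd

# Toy dataframe standing in for a Dataset/ CSV; columns are invented.
df = pd.DataFrame({
    "price": [100.0, 250.0, 175.0, 300.0],
    "category": ["a", "b", "a", "c"],
})

# Quick EDA: summary statistics of price per category
print(df.groupby("category")["price"].describe())

# Simple feature engineering: log transform and one-hot encoding
df["log_price"] = np.log1p(df["price"])
features = pd.get_dummies(df, columns=["category"])
print(features.head())
```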
A typical pipeline run steps through:
✔ Data loading & cleaning (from Dataset folder)
✔ Feature engineering
✔ Training model (e.g., CatBoost, sklearn, etc.)
✔ Evaluating metrics
✔ Saving outputs to model folders
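The stages above can be sketched as a minimal end-to-end script. This is a hedged illustration, not the actual src/ API: the function name `run_pipeline`, the target column name `"target"`, the output filename `model.joblib`, and the use of scikit-learn's RandomForestRegressor (rather than CatBoost) are all assumptions made for the example.

```python
import joblib
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

def run_pipeline(csv_path: str, target: str = "target") -> float:
    """Illustrative pipeline: load, engineer, train, evaluate, save."""
    # 1. Data loading & cleaning (drop rows with missing values)
    df = pd.read_csv(csv_path).dropna()
    # 2. Feature engineering (one-hot encode any categorical columns)
    X = pd.get_dummies(df.drop(columns=[target]))
    y = df[target]
    # 3. Train a model on an 80/20 split
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )
    model = RandomForestRegressor(random_state=42).fit(X_train, y_train)
    # 4. Evaluate metrics on the held-out split
    score = r2_score(y_test, model.predict(X_test))
    # 5. Save the trained model to disk
    joblib.dump(model, "model.joblib")
    return score
```

Swapping in a CatBoost model would only change step 3, since CatBoost regressors expose the same fit/predict interface.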
This README will be updated as the project evolves.