
End-to-end data science project that demonstrates the full workflow from data preprocessing and exploratory analysis to model training, evaluation, and deployment with Docker and DVC. It includes reproducible code, modular Python packages, and interactive notebooks to help users understand and apply real-world machine


Devgan79/EndtoendDS


End-to-End Data Science Project


🧠 Overview

This project implements a complete end-to-end data science workflow, from data ingestion and processing to model training, evaluation, and deployment. It includes:

✔ Version control with DVC
✔ Data storage & notebook experimentation
✔ Production Python modules under src/
✔ A web or CLI interface via app.py / main.py
✔ A Docker container for reproducible builds

📂 Project Structure

| Folder / File | Description |
| --- | --- |
| .dvc/ | DVC versioning for datasets & models |
| Dataset/ | Raw and processed data storage |
| Notebooks/ | Jupyter notebooks for experimentation & EDA |
| catboost_info/ | CatBoost training metadata and logs |
| src/ | Python modules containing core pipeline logic |
| Dockerfile | Instructions for containerizing the application |
| app.py | Application entry point (API / UI interface) |
| main.py | Main script for training and running the ML pipeline |
| requirements.txt | List of required Python dependencies |
| setup.py | Project packaging and installation configuration |
| template.py | Utility or base template code |
| readme.md | Project documentation file |

Installation


Clone the repository:
git clone https://github.com/Devgan79/EndtoendDS.git
cd EndtoendDS

Install the required packages:


pip install -r requirements.txt

Exploratory Data Analysis


The notebooks under Notebooks/ walk through step-by-step experimentation, including:

✔ Data visualization
✔ Feature engineering
✔ Model evaluation
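
As a rough illustration of the first exploratory step, the sketch below computes summary statistics for the numeric columns of a small CSV sample. It is not taken from the notebooks: the column names, the `SAMPLE` data, and the `summarize_numeric` helper are all hypothetical, and it uses only the Python standard library where the notebooks may well use pandas or a plotting library.

```python
import csv
import io
import statistics


def summarize_numeric(csv_text):
    """Return count, mean, median, and stdev for every fully numeric column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    summary = {}
    for column in rows[0].keys():
        try:
            values = [float(r[column]) for r in rows]
        except ValueError:
            continue  # skip non-numeric columns such as category labels
        summary[column] = {
            "count": len(values),
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            "stdev": statistics.stdev(values),  # sample standard deviation
        }
    return summary


# Hypothetical sample data; the real files live under Dataset/.
SAMPLE = """age,income,segment
25,40000,a
35,60000,b
45,80000,a
"""

if __name__ == "__main__":
    for name, stats in summarize_numeric(SAMPLE).items():
        print(name, stats)
```

A summary like this is usually a quick first check before plotting: it surfaces columns with unexpected ranges or scales that feature engineering may need to handle.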

Model Training & Evaluation


A typical pipeline run goes through the following steps:

✔ Data loading & cleaning (from Dataset folder)
✔ Feature engineering
✔ Training model (e.g., CatBoost, sklearn, etc.)
✔ Evaluating metrics
✔ Saving outputs to model folders
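
The steps above can be sketched as a chain of small functions. This is a minimal illustration under stated assumptions, not the code in src/ or main.py: every function name, the column names (`x1`, `x2`, `label`), the `ratio` feature, and the `model.json` output file are hypothetical, and a majority-class baseline stands in for a real learner such as CatBoost or an sklearn estimator so that the example runs with the standard library alone.

```python
import csv
import io
import json
import tempfile
from collections import Counter
from pathlib import Path


def load_and_clean(csv_text):
    """Parse CSV text and drop rows that have any missing value."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return [r for r in rows if all(v not in ("", None) for v in r.values())]


def engineer_features(rows):
    """Cast numeric columns to float and add a hypothetical ratio feature."""
    out = []
    for r in rows:
        x1, x2 = float(r["x1"]), float(r["x2"])
        out.append({"x1": x1, "x2": x2, "ratio": x1 / (x2 + 1.0), "label": r["label"]})
    return out


def train_baseline(rows):
    """Stand-in for a real learner: always predict the most common label."""
    most_common_label, _ = Counter(r["label"] for r in rows).most_common(1)[0]
    return {"kind": "majority_class", "prediction": most_common_label}


def predict(model, rows):
    """Return the baseline prediction for every row."""
    return [model["prediction"] for _ in rows]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def save_model(model, out_dir):
    """Write the model to <out_dir>/model.json and return the path."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / "model.json"
    path.write_text(json.dumps(model))
    return path


def run_pipeline(csv_text, out_dir):
    """Load, clean, engineer features, train, evaluate, and save."""
    rows = engineer_features(load_and_clean(csv_text))
    split = int(len(rows) * 0.75)  # simple ordered 75/25 train/test split
    train, test = rows[:split], rows[split:]
    model = train_baseline(train)
    score = accuracy([r["label"] for r in test], predict(model, test))
    return score, save_model(model, out_dir)


# Hypothetical sample data; the third row is dropped for its missing x2.
SAMPLE = """x1,x2,label
1.0,2.0,a
2.0,1.0,a
3.0,,b
4.0,3.0,b
5.0,1.0,a
"""

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        score, path = run_pipeline(SAMPLE, tmp)
        print(f"accuracy={score:.2f}, saved to {path.name}")
```

Keeping each stage as its own function makes it straightforward to swap the baseline for a real model and to track the saved outputs with DVC.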


To be updated.
