🧠 Overview
This project implements a complete end-to-end data science workflow, from data ingestion and processing to model training, evaluation, and deployment. It includes:
✔ Version control with DVC
✔ Data storage & notebook experimentation
✔ Production Python modules under src/
✔ A web or CLI interface via app.py / main.py
✔ A Docker container for reproducible builds
| Folder / File | Description |
|---|---|
| .dvc/ | DVC versioning for datasets & models |
| Dataset/ | Raw and processed data storage |
| Notebooks/ | Jupyter notebooks for experimentation & EDA |
| catboost_info/ | CatBoost training metadata and logs |
| src/ | Python modules containing core pipeline logic |
| Dockerfile | Instructions for containerizing the application |
| app.py | Application entry point (API / UI interface) |
| main.py | Main script for training and running the ML pipeline |
| requirements.txt | List of required Python dependencies |
| setup.py | Project packaging and installation configuration |
| template.py | Utility or base template code |
| readme.md | Project documentation file |
Clone the repo and install the dependencies:

```bash
git clone https://github.com/Devgan79/EndtoendDS.git
cd EndtoendDS
pip install -r requirements.txt
```
Under Notebooks/ you will find step-by-step experimentation such as:
✔ Data visualization
✔ Feature engineering
✔ Model evaluation
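As a rough sketch of the kind of cell you might find in those notebooks (the column names below are invented for illustration, not taken from the actual Dataset/ files):

```python
import numpy as np
import pandas as pd

# Toy dataframe standing in for a Dataset/ CSV; columns are invented.
df = pd.DataFrame({
    "price": [100.0, 250.0, 175.0, 300.0],
    "category": ["a", "b", "a", "c"],
})

# Quick EDA: summary statistics of price per category
print(df.groupby("category")["price"].describe())

# Simple feature engineering: log transform and one-hot encoding
df["log_price"] = np.log1p(df["price"])
features = pd.get_dummies(df, columns=["category"])
print(features.head())
```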
A typical pipeline run steps through:
✔ Data loading & cleaning (from Dataset folder)
✔ Feature engineering
✔ Training model (e.g., CatBoost, sklearn, etc.)
✔ Evaluating metrics
✔ Saving outputs to model folders
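The stages above can be sketched as a minimal end-to-end script. This is a hedged illustration, not the actual src/ API: the function name `run_pipeline`, the target column name `"target"`, the output filename `model.joblib`, and the use of scikit-learn's RandomForestRegressor (rather than CatBoost) are all assumptions made for the example.

```python
import joblib
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

def run_pipeline(csv_path: str, target: str = "target") -> float:
    """Illustrative pipeline: load, engineer, train, evaluate, save."""
    # 1. Data loading & cleaning (drop rows with missing values)
    df = pd.read_csv(csv_path).dropna()
    # 2. Feature engineering (one-hot encode any categorical columns)
    X = pd.get_dummies(df.drop(columns=[target]))
    y = df[target]
    # 3. Train a model on an 80/20 split
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )
    model = RandomForestRegressor(random_state=42).fit(X_train, y_train)
    # 4. Evaluate metrics on the held-out split
    score = r2_score(y_test, model.predict(X_test))
    # 5. Save the trained model to disk
    joblib.dump(model, "model.joblib")
    return score
```

Swapping in a CatBoost model would only change step 3, since CatBoost regressors expose the same fit/predict interface.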
This README will be updated as the project evolves.