A toolset for collaborative development and reproducible results in data science and machine learning projects.
Make sure you have uv installed.
For example via pip:
pip install uv
You can initialize a project from the command line.
Replace my_new_project with the name of the folder to be created for the project:
uvx copier copy --trust gh:Excidion/reproML my_new_project
You will then be guided through a short questionnaire. Depending on your choices, it will generate a structure that looks something like this:
├── data <- All data files belong in one of this folder's subfolders
│ ├── raw <- The original, unedited data dump
│ ├── interim <- Intermediate data that has been or is being transformed
│ └── processed <- The data sets used for modeling
│
├── docs <- Project documentation
│ ├── index.md <- Landing page, describe the project and team.
│ ├── context.md <- Document context and goals.
│ ├── model.md <- Document modeling from data to ML.
│ ├── notebooks/ <- Your most polished notebooks, integrated into the docs
│ ├── code/ <- Automatically generated code documentation
│ └── structure.md <- Document tools and technical organization.
│
├── models <- Trained and serialized models and other artifacts
│ └── logs <- Logfiles from training and prediction
│
├── notebooks <- Jupyter notebooks
│
├── references <- Data dictionaries, manuals, and helper materials.
│
├── reports <- Generated analysis as HTML, PDF, etc.
│ └── figures <- Generated graphics and figures to be used in reports
│
├── src <- Source code for use in this project.
│ ├── data <- Scripts to download, process or generate data
│ ├── features <- Functions to turn data into features
│ ├── model <- Scripts for training and prediction
│ └── visualization <- Scripts to create visualizations
│
├── pyproject.toml <- Project configuration and dependencies.
│
└── README.md <- The top-level README for developers using this project.
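To see only the bare folder layout from the tree above, for example in a scratch directory, a minimal sketch with mkdir -p follows. It is an illustration, not part of the template: it creates the directories only, not the files copier generates (pyproject.toml, README.md, the docs pages), and the actual layout depends on your questionnaire answers. The target folder name my_new_project is just a placeholder default.

```shell
#!/usr/bin/env sh
# Sketch only: recreate the directory skeleton shown in the tree above.
# Assumption: the first argument is a hypothetical target folder name,
# defaulting to "my_new_project". No files are created, only folders.
set -eu

PROJECT="${1:-my_new_project}"

mkdir -p \
  "$PROJECT/data/raw" \
  "$PROJECT/data/interim" \
  "$PROJECT/data/processed" \
  "$PROJECT/docs/notebooks" \
  "$PROJECT/docs/code" \
  "$PROJECT/models/logs" \
  "$PROJECT/notebooks" \
  "$PROJECT/references" \
  "$PROJECT/reports/figures" \
  "$PROJECT/src/data" \
  "$PROJECT/src/features" \
  "$PROJECT/src/model" \
  "$PROJECT/src/visualization"
```

For a real project, use the copier command above instead, since it also renders the configuration and documentation files.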