Repository Overview

This repository contains various scripts and notebooks for data understanding, processing, and modeling, as well as for managing the MLflow and Jupyter container setups. Below is a detailed description of each component.

Data Understanding

Exploratory Data Analysis (EDA):
- Refer to the eda.ipynb notebook for initial data exploration and understanding.

Data Processing

Scripts:
- data_processing.py: Script for general data processing tasks.
- external_data.py: Script for handling external data sources.

Repository Structure

The repository is organized into three main folders, each serving a different purpose:

Modeling:
- Imputation and Encoding:
  - train_script.py: Contains functions for data imputation and encoding, as well as the complete training pipeline.
- Clustering:
  - clustering.ipynb: Notebook dedicated to clustering tasks.
  - train_script.py: Also used for clustering purposes.
- Bootstrapping:
  - bootstrap.py: Script for bootstrapping.
  - cluster_evaluation.py: Contains bootstrapping logic for cluster evaluation and the assessment of clustering results.
- Baseline:
  - baseline_boot.py: Contains the baseline with the bootstrap
MLflow:
- This folder contains all files and configurations related to MLflow, which is used for tracking experiments and managing model deployments.
Jupyter Container:
- Contains the setup and configuration files necessary for running the Jupyter notebook environment within a container.

Parameter Management

JSON Files:
- These files store parameters for various algorithms. Parameters can be modified as needed for different runs.
- Running the Scripts:
  - The train_script.py can be executed with different arguments to run various experiments. An example of how to run this script with different parameters is provided within the file.

Pipeline Overview

The train_script.py encapsulates the entire pipeline from data processing to model training and evaluation.

Name		Name	Last commit message	Last commit date
Latest commit History 129 Commits
jupyter		jupyter
mlflow		mlflow
modeling		modeling
Dockerfile		Dockerfile
README.md		README.md
data_imputation.py		data_imputation.py
data_processing.py		data_processing.py
db_connection.py		db_connection.py
docker-run.sh		docker-run.sh
eda.ipynb		eda.ipynb
encoding.py		encoding.py
external_data.py		external_data.py
requirements.txt		requirements.txt
run-mlflow.sh		run-mlflow.sh
run-ssh.sh		run-ssh.sh
run-vm.sh		run-vm.sh
testing.py		testing.py
vm-to-pc.sh		vm-to-pc.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Repository Overview

Data Understanding

Data Processing

Repository Structure

Parameter Management

Pipeline Overview

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 5

Uh oh!

Languages

brunovdias/LCED

Folders and files

Latest commit

History

Repository files navigation

Repository Overview

Data Understanding

Data Processing

Repository Structure

Parameter Management

Pipeline Overview

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 5

Uh oh!

Languages

Packages