This repository contains code for MLOps experiments using DVC for data tracking and MLflow for model management.
Building machine learning systems at scale requires more than a notebook and a saved model: you must track parameters, monitor performance, and maintain quality over time. MLOps defines practices that make this process efficient, such as organizing the work into data processing, training, and evaluation pipelines, where each step plays a distinct, essential role.
The development of an ML system is cyclical, with frequent iteration between steps (source: *Designing Machine Learning Systems*, Chip Huyen).
DVC tracks every version of your data. In this project, datasets such as data_00 and data_01 are combined into combined_data, which is then preprocessed to produce the working version of the Yahoo Questions dataset.
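As a minimal sketch (not the project's exact code), a DVC-tracked file can be read back at any Git revision through DVC's Python API; the path and revision below are hypothetical:

```python
import pandas as pd
import dvc.api

# Stream the tracked file as it existed at a given revision
# (a Git commit, branch, or tag).
with dvc.api.open("data/combined_data.csv", rev="main") as f:
    df = pd.read_csv(f)

print(df.shape)
```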
MLflow tracks all experiments, making it possible to review metrics, parameters, and model versions through its UI.
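For illustration, a run might log its configuration and results like this (a hedged sketch using MLflow's standard Python API; the experiment name and values are placeholders, not the project's actual results):

```python
import mlflow

mlflow.set_experiment("flowq-baseline")  # hypothetical experiment name

with mlflow.start_run():
    # Everything logged here appears in the MLflow UI
    # for side-by-side comparison across runs.
    mlflow.log_param("alpha", 1.0)       # example hyperparameter
    mlflow.log_metric("f1_macro", 0.87)  # example value, not a real result
```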
The initial data exploration included identifying unusual characters, emojis, and missing values. This step helps ensure the dataset is well understood before modeling.
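The checks below sketch this kind of exploration, assuming a pandas DataFrame with a text column; the column name `question` is hypothetical:

```python
import pandas as pd

# Toy frame standing in for the real dataset.
df = pd.DataFrame({"question": ["How do planes fly? \u2708\ufe0f", None, "What is DVC?"]})

# Count missing values per column.
print(df.isna().sum())

# Flag rows containing non-ASCII characters (emojis, unusual symbols).
non_ascii = df["question"].dropna().str.contains(r"[^\x00-\x7F]", regex=True)
print(int(non_ascii.sum()), "rows contain non-ASCII characters")
```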
The first model was MultinomialNB, a strong baseline for text classification. Class distribution analysis showed no significant imbalance. Before modeling, the text went through standard NLP preprocessing steps (sketched after this list):
- data cleaning
- tokenization
- lemmatization or stemming
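A minimal preprocessing sketch, assuming NLTK (the project may use a different toolkit):

```python
import re
import nltk
from nltk.stem import WordNetLemmatizer

# One-time downloads of tokenizer and lemmatizer resources.
nltk.download("punkt", quiet=True)
nltk.download("punkt_tab", quiet=True)
nltk.download("wordnet", quiet=True)

lemmatizer = WordNetLemmatizer()

def preprocess(text: str) -> list[str]:
    text = re.sub(r"[^a-z\s]", " ", text.lower())     # cleaning
    tokens = nltk.word_tokenize(text)                 # tokenization
    return [lemmatizer.lemmatize(t) for t in tokens]  # lemmatization

print(preprocess("The planes were flying quickly!"))
```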
Improving data quality led to better results. The model was built with scikit-learn, and hyperparameters were tuned with GridSearchCV; tuning was kept deliberately light, with effort focused on data quality rather than intensive model tweaking. The final output is a per-class classification report, as in the sketch below.
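As a hedged sketch of this setup (not the repository's exact code; the toy corpus and parameter grid are illustrative), a TF-IDF + MultinomialNB pipeline tuned with GridSearchCV might look like:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy data for illustration; the real project uses the Yahoo Questions dataset.
texts = ["how do planes fly", "what lifts a plane", "best pasta recipe",
         "how to cook rice", "why is the sky blue", "what causes rain"]
labels = ["science", "science", "food", "food", "science", "science"]

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("nb", MultinomialNB()),
])

# Deliberately small grid, mirroring the project's light-touch tuning.
search = GridSearchCV(pipeline, {"nb__alpha": [0.1, 1.0]}, cv=2)
search.fit(texts, labels)

preds = search.predict(texts)  # in practice, predict on a held-out test set
print(classification_report(labels, preds))
```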
```bibtex
@misc{Carlos2025FlowQ,
  author       = {Lima, Carlos},
  title        = {FlowQ},
  year         = {2025},
  publisher    = {GitHub},
  journal      = {GitHub repository},
  howpublished = {\url{https://github.com/CllsPy/FlowQ}},
}
```