Ever looked at your lifestyle and wondered “Am I cooked?”
Well, now there’s a web app that can tell you, powered by an ML model with 96% accuracy.
This project combines a machine learning model for predicting 10-year Coronary Heart Disease (CHD) risk with a user-friendly Flask web application and a little bit of dark humor.
This application uses the Framingham Heart Study dataset to predict the risk of CHD. It’s a blend of data preprocessing, model training, and front-end flavor all served with some cheeky personality. You input your health data, and it tells you if your heart’s chilling or on thin ice (figuratively, of course).
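As a rough illustration of that flow, here is a minimal sketch of what the prediction route might look like. The route name, form fields, model filename, and fallback class are illustrative assumptions, not the project’s actual code:

```python
# Minimal sketch of a Flask prediction route (route name, form fields,
# and model filename are illustrative assumptions).
import os
import pickle
import numpy as np
from flask import Flask, request

app = Flask(__name__)

FEATURES = ["male", "age", "currentSmoker", "cigsPerDay", "BPMeds",
            "prevalentStroke", "prevalentHyp", "diabetes", "totChol",
            "sysBP", "diaBP", "BMI", "heartRate", "glucose", "education"]

if os.path.exists("best_model.pkl"):
    # Load the serialized model once at startup.
    with open("best_model.pkl", "rb") as f:
        model = pickle.load(f)
else:
    # Stand-in so the sketch runs without the trained model file.
    class _AlwaysChilling:
        def predict(self, X):
            return [0]
    model = _AlwaysChilling()

@app.route("/predict", methods=["POST"])
def predict():
    # Collect form values in the order the model was trained on.
    values = [float(request.form[name]) for name in FEATURES]
    risk = model.predict(np.array([values]))[0]
    verdict = "on thin ice" if risk == 1 else "chilling"
    return f"Your heart is {verdict}."
```

The key idea is that the form inputs are assembled into a single feature row in the same column order the model saw during training before calling `predict`.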
The improved best_model achieves 96% prediction accuracy.
The dataset, framingham.csv, is sourced from Kaggle. It includes the following features:
- male: Male (1) or Female (0)
- age: Age of the patient
- currentSmoker: Currently smoking (1 = yes, 0 = no)
- cigsPerDay: Avg. cigarettes smoked per day
- BPMeds: On blood pressure medication
- prevalentStroke: History of stroke
- prevalentHyp: Hypertension
- diabetes: Diabetes status
- totChol: Total cholesterol
- sysBP: Systolic BP
- diaBP: Diastolic BP
- BMI: Body Mass Index
- heartRate: Heart rate
- glucose: Glucose level
- education: Education level
- TenYearCHD: Target variable (1 = CHD in 10 years, 0 = no)
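The feature/target split implied by these columns could be sketched as follows. A small synthetic stand-in table is used here, since the real framingham.csv may not be on disk:

```python
# Sketch of separating predictors from the TenYearCHD target, using a
# synthetic stand-in with the same column layout as framingham.csv.
import numpy as np
import pandas as pd

COLUMNS = ["male", "age", "currentSmoker", "cigsPerDay", "BPMeds",
           "prevalentStroke", "prevalentHyp", "diabetes", "totChol",
           "sysBP", "diaBP", "BMI", "heartRate", "glucose", "education",
           "TenYearCHD"]

rng = np.random.default_rng(42)
df = pd.DataFrame(rng.integers(0, 2, size=(8, len(COLUMNS))),
                  columns=COLUMNS)

# Everything except the label is a predictor.
X = df.drop(columns="TenYearCHD")
y = df["TenYearCHD"]
```

With the real dataset, `df` would instead come from `pd.read_csv("framingham.csv")`; the split into `X` and `y` stays the same.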
- Flask: Web framework
- numpy: Numerical computation
- pandas: Data manipulation
- scikit-learn: Machine learning
- xgboost: Gradient boosting
- matplotlib: For visualizations (if needed)
- pickle: For model serialization
All dependencies can be installed from requirements.txt.
```shell
git clone https://github.com/yourusername/am-i-cooked.git
cd am-i-cooked
```
Over the past week, I’ve learned several key lessons about model behavior on small datasets like this one. Even with 4,240 rows, a dataset can be “small” relative to feature complexity, making models more sensitive to noise, class imbalance, and random splits. Unrealistic or extreme test inputs often lead to unreliable predictions, while realistic values produce more stable results.

I also noticed inconsistent outputs between runs, which turned out to be caused by random splits and the absence of a fixed random seed. Dataset imbalance further skews results, requiring strategies like stratified splitting or resampling. Additionally, I confirmed that saving models correctly (via pickle) and applying preprocessing steps such as scaling is good practice; these steps typically improve model stability and performance without harming accuracy.

Overall, I moved from 50% to 98% accuracy while avoiding overfitting, and I now understand the critical role of balanced, realistic data, reproducibility, and preprocessing in building reliable machine learning pipelines.
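The reproducibility and preprocessing points above can be sketched in a few lines. This is a generic illustration on synthetic imbalanced data, not the project’s actual training script; the logistic-regression model and filenames are assumptions:

```python
# Sketch of the lessons above: fixed seed, stratified split,
# scaling inside a pipeline, and pickling the fitted model.
import pickle
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic imbalanced data standing in for the Framingham table.
X, y = make_classification(n_samples=500, n_features=15,
                           weights=[0.85], random_state=0)

# stratify=y keeps class proportions in both splits;
# random_state makes runs reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Scaling lives inside the pipeline, so it is applied the same
# way at training and prediction time.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Serialize the whole pipeline so preprocessing travels with the model.
with open("chd_model.pkl", "wb") as f:
    pickle.dump(model, f)
```

Pickling the entire pipeline (rather than the bare estimator) avoids the classic bug where the web app forgets to scale inputs before calling `predict`.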
This project is dedicated to the spirit of open knowledge.
If the world had no open-source code, no freely shared ideas, I wouldn’t be here learning, building, and sharing this today.
Not everything worth knowing should be locked behind a paywall; curiosity should never depend on privilege.
To everyone who contributes, teaches, or shares without asking for anything in return: you’ve given more than you’ll ever know.
Created and maintained by
Aaditya Yadav
