This project predicts the likelihood of heart disease using machine learning models. The dataset contains health-related attributes such as age, cholesterol, and blood pressure, and the goal is to build an accurate classifier that can assist in the early detection of heart-related issues.
Preprocessing of the dataset (handling missing values, scaling, and encoding).
Implementation of multiple ML models:
Logistic Regression
Random Forest
Gradient Boosting
Model comparison based on Accuracy, Precision, Recall, F1-score, and ROC-AUC.
Visualization of results using:
Confusion Matrix
ROC Curve
The best-performing model is saved for future predictions.
Predictions are exported to a CSV file.
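The preprocessing and model-comparison steps listed above could be sketched roughly as follows. This is a minimal illustration, not the project's actual notebook code: the data is synthetic, the column names (`age`, `chol`, `trestbps`, `cp`) are hypothetical stand-ins, and the imputation strategy and model settings are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the real dataset; all column names are hypothetical.
rng = np.random.default_rng(42)
n = 400
df = pd.DataFrame({
    "age": rng.integers(30, 80, n).astype(float),
    "chol": rng.normal(240, 40, n),
    "trestbps": rng.normal(130, 15, n),
    "cp": rng.choice(["typical", "atypical", "none"], n),
})
# Tie the target loosely to age and cholesterol so the models have some signal.
score = 0.05 * (df["age"] - 55) + 0.01 * (df["chol"] - 240) + rng.normal(0, 1, n)
df["target"] = (score > 0).astype(int)
# Blank out a few values so the imputation step has something to do.
df.loc[rng.choice(n, 20, replace=False), "chol"] = np.nan

numeric = ["age", "chol", "trestbps"]
categorical = ["cp"]
preprocess = ColumnTransformer([
    # Numeric columns: fill missing values with the median, then standardize.
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
    # Categorical columns: one-hot encode.
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns="target"), df["target"],
    test_size=0.2, stratify=df["target"], random_state=42)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
}

rows = []
for name, model in models.items():
    # Each model gets its own copy of the preprocessing steps.
    clf = Pipeline([("prep", clone(preprocess)), ("model", model)])
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)
    proba = clf.predict_proba(X_test)[:, 1]
    rows.append({
        "model": name,
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred, zero_division=0),
        "f1": f1_score(y_test, pred, zero_division=0),
        "roc_auc": roc_auc_score(y_test, proba),
    })

results = pd.DataFrame(rows).sort_values("roc_auc", ascending=False)
print(results.to_string(index=False))
```

Wrapping preprocessing and model in a single `Pipeline` means the scaler and imputer are fitted on the training split only, which helps avoid leaking test-set statistics into training.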
0 → No Heart Disease
1 → Presence of Heart Disease
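As an illustration of how a saved model and the 0/1 target encoding might come together in the exported CSV, here is a small sketch. The file names (`best_model.joblib`, `predictions.csv`), the output column names, and the synthetic data are all assumptions made for the example; the project's actual paths and columns are not stated.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data standing in for the heart disease dataset.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

out_dir = tempfile.mkdtemp()
model_path = os.path.join(out_dir, "best_model.joblib")  # hypothetical file name
csv_path = os.path.join(out_dir, "predictions.csv")      # hypothetical file name

# Persist the fitted model, then reload it as a later session would.
joblib.dump(model, model_path)
loaded = joblib.load(model_path)

# Human-readable names for the two target classes.
LABELS = {0: "No Heart Disease", 1: "Presence of Heart Disease"}

pred = loaded.predict(X_test)
export = pd.DataFrame({
    "prediction": pred,
    "label": [LABELS[int(p)] for p in pred],
    "probability_of_disease": loaded.predict_proba(X_test)[:, 1],
})
export.to_csv(csv_path, index=False)
print(export.head())
```

Keeping both the numeric prediction and the readable label in the CSV lets downstream users filter on either without needing to know the encoding.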
Python
Jupyter Notebook
Libraries: pandas, numpy, matplotlib, seaborn, scikit-learn, joblib
The Random Forest classifier performed best, reaching 100% accuracy on the test set. This result is specific to this dataset and train/test split.
Generated plots: confusion matrix and ROC curve.
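The two plots could be produced along these lines. This is a sketch on synthetic data using plain matplotlib; the project may well use seaborn or scikit-learn's display helpers instead, and the output file name `evaluation.png` is hypothetical.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render to a file without needing a display
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic data standing in for the heart disease dataset.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

pred = model.predict(X_test)
proba = model.predict_proba(X_test)[:, 1]

cm = confusion_matrix(y_test, pred, labels=[0, 1])
fpr, tpr, _ = roc_curve(y_test, proba)
auc = roc_auc_score(y_test, proba)

fig, (ax_cm, ax_roc) = plt.subplots(1, 2, figsize=(10, 4))

# Confusion matrix: rows are actual classes, columns are predicted classes.
ax_cm.imshow(cm, cmap="Blues")
ax_cm.set_xticks([0, 1])
ax_cm.set_xticklabels(["No disease", "Disease"])
ax_cm.set_yticks([0, 1])
ax_cm.set_yticklabels(["No disease", "Disease"])
ax_cm.set_xlabel("Predicted")
ax_cm.set_ylabel("Actual")
ax_cm.set_title("Confusion Matrix")
for i in range(2):
    for j in range(2):
        ax_cm.text(j, i, str(cm[i, j]), ha="center", va="center")

# ROC curve with the chance diagonal for reference.
ax_roc.plot(fpr, tpr, label=f"AUC = {auc:.2f}")
ax_roc.plot([0, 1], [0, 1], linestyle="--", color="grey")
ax_roc.set_xlabel("False Positive Rate")
ax_roc.set_ylabel("True Positive Rate")
ax_roc.set_title("ROC Curve")
ax_roc.legend(loc="lower right")

fig.tight_layout()
plot_path = os.path.join(tempfile.mkdtemp(), "evaluation.png")  # hypothetical name
fig.savefig(plot_path)
plt.close(fig)
```

The confusion matrix is worth reading alongside accuracy, since it shows whether errors are false negatives (missed disease) or false positives.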
Hyperparameter tuning for more robust models.
Deploying the model using Flask / FastAPI.
Building an interactive web app with Streamlit.
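For the hyperparameter tuning item above, one possible starting point is a cross-validated grid search. The sketch below is an assumption about how this could be approached, not something the project already does: the parameter grid, the choice of ROC-AUC as the scoring metric, and the synthetic data are all illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split

# Synthetic data standing in for the heart disease dataset.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# A deliberately small, illustrative grid; real searches are usually wider.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 4],
    "min_samples_leaf": [1, 5],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="roc_auc",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Cross-validated ROC-AUC:", round(search.best_score_, 3))

# Score the refitted best model on the held-out test split.
test_auc = search.score(X_test, y_test)
print("Test ROC-AUC:", round(test_auc, 3))
```

Comparing the cross-validated score with the held-out test score gives a more robust picture than a single train/test split, which is useful when a model reports a perfect test score.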