Welcome to the Diabetes Prediction project! The goal of this project is to predict whether a person has diabetes using various machine learning algorithms. We focus on applying data cleaning, visualization, and modeling techniques to build accurate prediction models.
- Clean the data to ensure high-quality inputs.
- Visualize the data to better understand patterns and correlations.
- Train machine learning models to predict diabetes.
- Evaluate model performance using several metrics.
- Data Cleaning: Removing missing values, handling outliers, and preparing the data for modeling.
- Data Visualization: Analyzing and visualizing the data to understand patterns and trends.
- Machine Learning Modeling: Training multiple machine learning models to predict diabetes.
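The cleaning step described above (removing missing values and handling outliers) could look roughly like the sketch below. It is only an illustration under stated assumptions: the column names `Glucose`, `BMI`, and `Outcome`, the helper `clean`, and the choice of clipping to the 1.5 × IQR fences are all hypothetical and not taken from the project's code.

```python
import numpy as np
import pandas as pd


def clean(df, feature_cols):
    """Drop rows with missing values, then clip each feature to its 1.5*IQR fences.

    Hypothetical helper: the outlier rule is an assumption, not the project's method.
    """
    cleaned = df.dropna().copy()
    for col in feature_cols:
        q1 = cleaned[col].quantile(0.25)
        q3 = cleaned[col].quantile(0.75)
        iqr = q3 - q1
        # Values outside the fences are pulled back to the nearest fence.
        cleaned[col] = cleaned[col].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)
    return cleaned.reset_index(drop=True)


# Tiny made-up frame: one missing glucose value and one implausible reading.
raw = pd.DataFrame(
    {
        "Glucose": [85.0, 90.0, 95.0, 100.0, 105.0, np.nan, 900.0],
        "BMI": [22.0, 24.5, 27.0, 30.0, 31.5, 28.0, 26.0],
        "Outcome": [0, 0, 0, 1, 1, 1, 0],
    }
)
cleaned = clean(raw, ["Glucose", "BMI"])
print(cleaned)
```

Clipping keeps the row while limiting the influence of the extreme value; dropping outlier rows instead is an equally reasonable choice when the dataset is large enough.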
- Logistic Regression
- Support Vector Machine (SVM)
- K-Nearest Neighbors (KNN)
- Random Forest Classifier
- Naive Bayes
- Gradient Boosting
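As a minimal sketch of how the models listed above might be trained side by side with scikit-learn: the data here is a synthetic stand-in generated with `make_classification`, not the project's dataset, and the hyperparameters, the `GaussianNB` variant of Naive Bayes, and the scaling pipelines are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the diabetes data: 8 numeric features, binary target.
X, y = make_classification(n_samples=400, n_features=8, n_informative=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Distance- and margin-based models are wrapped with a scaler (an assumption).
models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "SVM": make_pipeline(StandardScaler(), SVC(random_state=42)),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Naive Bayes": GaussianNB(),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
}

# Fit each model on the training split and record accuracy on the held-out split.
test_accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    test_accuracy[name] = model.score(X_test, y_test)
    print(f"{name}: {test_accuracy[name]:.3f}")
```

Keeping the models in a dictionary makes it easy to add or remove a candidate without touching the training loop.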
- Accuracy Score: The fraction of predictions the model gets right.
- ROC Curve and AUC: The ROC curve plots the true positive rate against the false positive rate across decision thresholds; the AUC summarizes that trade-off as a single number.
- Cross-Validation: Repeatedly training and validating on different subsets of the data to estimate how well the model performs on unseen data.
- Confusion Matrix: Provides a breakdown of predictions into true positives, true negatives, false positives, and false negatives.
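The four metrics above can be computed with scikit-learn along the following lines. This is a sketch on synthetic data from `make_classification`, with logistic regression chosen arbitrarily as the example model and 5 folds assumed for cross-validation; none of these choices come from the project itself.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for the diabetes data.
X, y = make_classification(n_samples=400, n_features=8, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)               # hard class labels
y_score = model.predict_proba(X_test)[:, 1]  # probability of the positive class

# Accuracy: fraction of correct predictions on the held-out split.
accuracy = accuracy_score(y_test, y_pred)

# ROC AUC is computed from scores, not from hard labels.
auc = roc_auc_score(y_test, y_score)

# For binary labels, ravel() yields counts in the order tn, fp, fn, tp.
cm = confusion_matrix(y_test, y_pred)
tn, fp, fn, tp = cm.ravel()

# 5-fold cross-validated accuracy on the full dataset, using a fresh estimator.
cv_scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5, scoring="accuracy")

print(f"accuracy={accuracy:.3f}  auc={auc:.3f}")
print(f"tn={tn} fp={fp} fn={fn} tp={tp}")
print(f"cv accuracy: mean={cv_scores.mean():.3f} std={cv_scores.std():.3f}")
```

Reporting the cross-validation mean together with its standard deviation gives a sense of how much the score depends on the particular split.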
To run this project, you will need the following libraries:
- pandas
- numpy
- matplotlib
- seaborn
- scikit-learn
You can install the dependencies by running:
pip install pandas numpy matplotlib seaborn scikit-learn