This project addresses the challenge of imbalanced classification in predicting patient survival during hospital stays. The task is to predict whether a patient will survive or die in hospital, based on diagnostic data and patient characteristics.
The primary goal of this project is to develop and evaluate machine learning models capable of predicting in-hospital mortality. By leveraging data from patient diagnoses and health-related features, we aim to create a predictive model that aids healthcare professionals in early decision-making.
The dataset consists of 80,000 patient records, with each patient described by 337 features. These features encompass a wide range of information, including:
Health status indicators (vital signs, lab results, diagnostic information, etc.)
Because the dataset is imbalanced, with far fewer instances of mortality than of survival, special attention is paid to balancing techniques and to performance metrics suited to this imbalance.
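One common balancing technique is to weight each class inversely to its frequency, so that errors on the rare mortality class cost more during training. The sketch below is only an illustration and not necessarily the approach used in this project: the helper name `balanced_class_weights` is hypothetical, the 8% mortality rate is a made-up figure, and the formula `n_samples / (n_classes * class_count)` is the same heuristic scikit-learn applies for `class_weight="balanced"`.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return a weight per class: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical labels: 1 = died in hospital, 0 = survived.
# The 8% mortality rate is invented for illustration only.
labels = [1] * 8 + [0] * 92
weights = balanced_class_weights(labels)

# The minority class receives the larger weight.
print(weights)
```

Weights like these can typically be passed to a model's class-weight or sample-weight option; resampling (oversampling the minority class or undersampling the majority class) is an alternative with different trade-offs.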
Handling missing values, engineering features, and dealing with class imbalance.
Building machine learning models (e.g., logistic regression, random forests, XGBoost).
Using metrics such as AUC-ROC, precision, recall, and F1-score to evaluate the effectiveness of the models, particularly on imbalanced data.
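To make the choice of metrics concrete, the following minimal sketch computes precision, recall, F1-score, and AUC-ROC by hand on a tiny made-up example, and shows why plain accuracy can mislead on imbalanced data. The function names, labels, and scores are all hypothetical and are not taken from the project's dataset; in practice a library such as scikit-learn would normally be used instead.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example: 1 = died in hospital (rare), 0 = survived.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.1, 0.3, 0.2, 0.4, 0.6, 0.1, 0.7, 0.35]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold of 0.5 is arbitrary

# A baseline that always predicts "survived" looks accurate but finds no deaths.
baseline = [0] * len(y_true)
print("baseline accuracy:", accuracy(y_true, baseline))
print("baseline recall:", precision_recall_f1(y_true, baseline)[1])

print("precision, recall, F1:", precision_recall_f1(y_true, y_pred))
print("AUC-ROC:", roc_auc(y_true, scores))
```

The always-negative baseline reaches high accuracy while its recall on the mortality class is zero, which is why the threshold-dependent metrics (precision, recall, F1) and the threshold-free AUC-ROC are more informative here.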