This project aims to predict driver churn at Ola Cabs using historical driver data. By analyzing demographic, performance, and tenure attributes, the project builds predictive models to identify drivers likely to leave and support proactive retention strategies.
Ola faces high driver churn, which negatively impacts:
- Driver morale
- Customer experience
- Driver acquisition and training costs
This project seeks to:
- Identify key factors influencing driver departures
- Build a predictive model for driver attrition
- Provide data-driven insights for retention strategies
The project uses the dataset: ola_driver.csv
It contains monthly driver information for 2019 and 2020 with attributes grouped as follows:
- Demographics:
  - City, Age, Gender (Male: 0, Female: 1)
- Tenure:
  - Joining Date, Last Working Date
- Performance:
  - Quarterly Rating, Monthly Business Value, Grade, Income
- Additional:
  - Education Level, Joining Designation
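Because the data is monthly, each driver appears in several rows, and a per-driver table is needed before modeling. The snippet below is a minimal sketch of that step, not the project's actual code. The snake_case column names, the `aggregate_drivers` helper, and the toy rows are all hypothetical, and it assumes a driver counts as churned when a Last Working Date is recorded.

```python
import pandas as pd


def aggregate_drivers(monthly: pd.DataFrame) -> pd.DataFrame:
    """Collapse monthly rows into one row per driver (column names are assumed)."""
    monthly = monthly.sort_values(["driver_id", "report_month"])
    drivers = monthly.groupby("driver_id").agg(
        age=("age", "max"),
        city=("city", "last"),
        income_first=("income", "first"),
        income_last=("income", "last"),
        rating_first=("quarterly_rating", "first"),
        rating_last=("quarterly_rating", "last"),
        total_business_value=("monthly_business_value", "sum"),
        last_working_date=("last_working_date", "max"),
    )
    # Growth flags: did income or rating increase between first and last record?
    drivers["income_grew"] = (drivers["income_last"] > drivers["income_first"]).astype(int)
    drivers["rating_grew"] = (drivers["rating_last"] > drivers["rating_first"]).astype(int)
    # Assumption: a recorded last working date means the driver has left.
    drivers["churned"] = drivers["last_working_date"].notna().astype(int)
    return drivers.reset_index()


# Toy data standing in for ola_driver.csv (values are made up for illustration).
monthly = pd.DataFrame({
    "driver_id": [1, 1, 1, 2, 2],
    "report_month": pd.to_datetime(
        ["2019-01-01", "2019-02-01", "2019-03-01", "2019-01-01", "2019-02-01"]
    ),
    "age": [28, 28, 28, 35, 35],
    "city": ["C1", "C1", "C1", "C2", "C2"],
    "income": [50000, 50000, 55000, 40000, 40000],
    "quarterly_rating": [2, 2, 3, 3, 1],
    "monthly_business_value": [100000, 120000, 150000, 80000, 0],
    "last_working_date": pd.to_datetime([None, None, None, None, "2019-02-20"]),
})

drivers = aggregate_drivers(monthly)
print(drivers[["driver_id", "income_grew", "rating_grew", "churned"]])
```

With the real file, the column names would need to be mapped to whatever headers `ola_driver.csv` actually uses.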
The project consists of Python scripts that perform the following tasks:
- Data Exploration & Cleaning
  - Inspect dataset structure and characteristics
  - Handle missing values using KNN imputation
- Feature Engineering
  - Aggregate driver data (e.g., income and rating growth)
  - Encode categorical variables (one-hot encoding)
- Data Balancing
  - Address class imbalance in the churn variable
- Modeling
  - Implement Ensemble Learning (Bagging, Boosting)
  - Apply hyperparameter tuning for optimization
- Evaluation
  - Generate classification reports
  - Plot ROC-AUC curves
- Insights
  - Interpret results
  - Provide actionable recommendations for reducing churn
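The workflow above could be wired together with scikit-learn roughly as follows. This is a hedged sketch on synthetic data, not the project's scripts: the feature names and generated values are invented, `class_weight="balanced"` stands in for whichever balancing method the scripts use, a random forest serves as the bagging-style ensemble, and the parameter grids are arbitrary small examples.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.impute import KNNImputer
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic per-driver table (hypothetical columns, made-up values).
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "age": rng.integers(21, 55, n).astype(float),
    "income": rng.normal(60000, 15000, n),
    "quarterly_rating": rng.integers(1, 5, n).astype(float),
    "city": rng.choice(["C1", "C2", "C3"], n),
})
# Churn is the minority class here: low-rated drivers plus some random noise.
y = ((X["quarterly_rating"] <= 1) | (rng.random(n) < 0.1)).astype(int)
# Knock out a few ages so the imputer has something to fill.
X.loc[rng.choice(n, 20, replace=False), "age"] = np.nan

numeric = ["age", "income", "quarterly_rating"]
categorical = ["city"]
preprocess = ColumnTransformer([
    # Scale first so no single feature dominates the KNN distance.
    ("num", Pipeline([("scale", StandardScaler()),
                      ("impute", KNNImputer(n_neighbors=5))]), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

models = {
    # Bagging-style ensemble; class weights compensate for the imbalance.
    "bagging": (RandomForestClassifier(class_weight="balanced", random_state=0),
                {"model__n_estimators": [50, 100], "model__max_depth": [3, None]}),
    # Boosting ensemble.
    "boosting": (GradientBoostingClassifier(random_state=0),
                 {"model__n_estimators": [50, 100], "model__learning_rate": [0.05, 0.1]}),
}

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

results = {}
for name, (estimator, grid) in models.items():
    pipe = Pipeline([("prep", preprocess), ("model", estimator)])
    # Hyperparameter tuning: 3-fold grid search scored by ROC-AUC.
    search = GridSearchCV(pipe, grid, scoring="roc_auc", cv=3)
    search.fit(X_train, y_train)
    proba = search.predict_proba(X_test)[:, 1]
    results[name] = roc_auc_score(y_test, proba)
    print(name, search.best_params_, f"ROC-AUC={results[name]:.3f}")
    print(classification_report(y_test, search.predict(X_test)))
```

Keeping imputation and encoding inside the pipeline means they are refit on each cross-validation fold, which avoids leaking test information into preprocessing. To draw the ROC curve instead of printing the score, `sklearn.metrics.RocCurveDisplay.from_estimator` can be called on the fitted search object.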
Possible future improvements include:
- Experiment with deep learning approaches
- Incorporate real-time churn prediction pipelines
- Deploy the model as an API service
- notebooks/: Exploratory data analysis and model building
- scripts/: Python scripts for preprocessing, modeling, and evaluation
- README.md: Project documentation
Contributions, issues, and feature requests are welcome!
Feel free to fork the repo and submit a pull request.
This project is licensed under the MIT License.