This project aims to predict customer churn using a Random Forest Classifier, helping businesses identify customers who are likely to leave their service. By understanding these patterns, companies can take proactive actions to retain valuable customers and reduce churn.
The dataset used includes customer demographics, account details, and usage data, such as age, gender, balance, estimated salary, and whether the customer has churned. The goal is to use this data to predict if a customer will churn (1) or stay (0).
- Data Preprocessing:
  - Unnecessary columns, such as `CustomerId`, `Surname`, and `RowNumber`, are removed.
  - Categorical features like `Gender` and `Geography` are encoded using label encoding.
  - A new feature, `Balance_salary_ratio`, is created by dividing the customer balance by their estimated salary.
  - The dataset is then split into training and testing sets to train and evaluate the model.
- Model Training:
  - A Random Forest Classifier is chosen because it handles a mix of numerical and encoded categorical features well. The classifier is trained on the processed data to learn patterns related to customer churn.
- Model Evaluation:
  - The model is evaluated on the test set using accuracy, precision, recall, and F1 score to measure how well it predicts churn.
- Prediction:
  - Once trained, the model predicts whether a customer is likely to churn, providing businesses with actionable insights on customer retention.
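The four steps above can be sketched end to end as follows. This is a minimal illustration, not the project's actual script: the data is randomly generated so the example runs without the CSV, and the column names `Age`, `Balance`, `EstimatedSalary`, and `Exited` (the churn label) are assumptions, as are the 80/20 split and the `n_estimators` value.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Synthetic stand-in for the real dataset; values are random and carry no meaning.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "RowNumber": np.arange(1, n + 1),
    "CustomerId": np.arange(1000, 1000 + n),
    "Surname": ["Doe"] * n,
    "Geography": rng.choice(["France", "Spain", "Germany"], size=n),
    "Gender": rng.choice(["Male", "Female"], size=n),
    "Age": rng.integers(18, 80, size=n),                       # assumed column name
    "Balance": rng.uniform(0, 200_000, size=n),                # assumed column name
    "EstimatedSalary": rng.uniform(10_000, 150_000, size=n),   # assumed column name
    "Exited": rng.integers(0, 2, size=n),                      # assumed label: 1 = churned, 0 = stayed
})


def preprocess(frame):
    """Drop identifier columns, label-encode categoricals, add the ratio feature."""
    out = frame.drop(columns=["CustomerId", "Surname", "RowNumber"])
    for col in ["Gender", "Geography"]:
        out[col] = LabelEncoder().fit_transform(out[col])
    out["Balance_salary_ratio"] = out["Balance"] / out["EstimatedSalary"]
    return out


data = preprocess(df)
X = data.drop(columns=["Exited"])
y = data["Exited"]

# Hold out 20% of the rows for evaluation, keeping the class balance similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Train the classifier on the processed training data.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Evaluate on the held-out test set.
y_pred = model.predict(X_test)
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred, zero_division=0),
    "recall": recall_score(y_test, y_pred, zero_division=0),
    "f1": f1_score(y_test, y_pred, zero_division=0),
}
print(metrics)

# Predict churn (1) or stay (0) for a single customer row.
single_prediction = model.predict(X_test.iloc[[0]])[0]
print(single_prediction)
```

Because the labels here are random, the printed scores say nothing about real performance; with the actual dataset, the `df` construction would be replaced by something like `pd.read_csv("customer_churn_data.csv")`.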
Libraries used:
- Pandas for data manipulation.
- Seaborn/Matplotlib for data visualization.
- Scikit-learn for machine learning model building, training, and evaluation.
1. Clone or download this repository.
2. Install the required libraries: `pip install pandas seaborn matplotlib scikit-learn`
3. Make sure you have your dataset (e.g., customer_churn_data.csv) in the correct directory.
4. Run the script: `python churn_prediction.py`
5. Check the output for model evaluation and predictions.
This project demonstrates how a Random Forest Classifier can be used to predict customer churn based on demographic and account data. By identifying customers at risk of leaving, businesses can take proactive actions to retain valuable customers.
Future enhancements could include:
- Hyperparameter tuning to optimize model performance.
- Exploring other machine learning models (e.g., Gradient Boosting, XGBoost) for better accuracy.
- Expanding the dataset to include more features and improve predictions.
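As a rough illustration of the first enhancement, hyperparameter tuning could be done with scikit-learn's `GridSearchCV`. The sketch below is not part of the project: it uses `make_classification` to generate placeholder data, and the parameter grid, the F1 scoring choice, and the 3-fold cross-validation are all assumptions chosen to keep the example small.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Placeholder binary-classification data standing in for the processed churn features.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Hypothetical grid; a real search would cover more values.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 5],
}

# Try every combination with 3-fold cross-validation, scoring by F1.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="f1",
    cv=3,
)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

After fitting, `search.best_estimator_` holds a model refit on all the data with the best parameter combination, so it can be used for prediction in the same way as the original classifier.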
This project provides a foundational approach to customer retention and predictive analytics using machine learning.