
Risky Business

Credit Risk

Background

To help mitigate credit risk, I built and evaluated several machine-learning models that predict loan risk using free data from LendingClub. I used the imbalanced-learn and scikit-learn libraries to build and evaluate models with the following two techniques:

  1. Resampling
  2. Ensemble Learning

Resampling

For this approach, I used the imbalanced-learn library to resample the LendingClub data, then built and evaluated logistic regression classifiers on the resampled data.
Refer to: Resampling Notebook

Conclusion

  1. Which model had the best balanced accuracy score?

    SMOTEENN had the best balanced accuracy score (0.7975), versus
    0.7752 for Cluster Centroids, 0.7967 for SMOTE, and 0.7856 for
    Random Oversampler.

  2. Which model had the best recall score?

    SMOTE had the best recall score: 0.88.

  3. Which model had the best geometric mean score?

    SMOTEENN had the best geometric mean score: 0.79.


Ensemble Learning

For this method, I trained and compared two ensemble classifiers for predicting loan risk, then evaluated each model: the Balanced Random Forest Classifier and the Easy Ensemble Classifier, each with 100 estimators (n_estimators=100).
Refer to: Ensemble Notebook

Conclusion

  1. Which model had the best balanced accuracy score?

    Easy Ensemble Classifier had the best balanced accuracy score (0.9316),
    versus 0.7855 for Balanced Random Forest Classifier.

  2. Which model had the best recall score?

    Easy Ensemble Classifier had the best recall score: 0.94 versus
    0.90 for Balanced Random Forest Classifier.

  3. Which model had the best geometric mean score?

    Easy Ensemble Classifier had the best geometric mean score: 0.93 versus
    0.78 for Balanced Random Forest Classifier.

  4. What are the top three features?

    The top three features are total_rec_prncp (importance 0.0918),
    total_pymnt_inv (0.0641), and total_pymnt (0.0576).

About

This project applies machine-learning algorithms to train, build, and evaluate logistic regression and ensemble classifiers on imbalanced data.
