Dmvinedata/Capstone

Finding Homes for the Pet Homies

Predicting Animal Adoptions with Binary Classification

Data Scientist: Deztany Jackson

I am partnering with Malaysia's Ministry of Tourism to refine the country's pet adoption process. The driving question is "Will a stray be adopted or not?" A gradient-boosted decision tree model (XGBoost) serves as the binary classifier. It reaches ~81% precision when predicting whether a Malaysian stray (cat or dog) will be adopted.

Real-world data was imported from PetFinder.my (via Kaggle). The training dataset had over 11,000 entries with about 19 attributes describing each animal. The model is intended to help protect public health in Malaysian states (for locals and tourists) while making the adoption process more cost- and labor-efficient.

Image: Wall Street Journal / Getty Images, Cat & Dog

Business Understanding

Malaysian Strays, 2021
Political Animals, 2021
VetFuturist, Unknown

Malaysia's Ministry of Tourism is partnering with local pet adoption agencies to minimize the number of stray animals in the country. They spend about RM10.3 million ($2.3 million) on managing pounds and on euthanasia costs.

The ministry cares most about correctly predicting which animals are adoptable. Adoptable animals will be prioritized in a shelter or pound. What will happen to the other animals has not been determined. The Ministry of Tourism plans another phase to find the best solutions that utilize those animals and minimize euthanasia.

Ministry of Tourism (Main Stakeholder)

  • Wants to understand which animals are the most adoptable. (Phase 1: Current Model)
  • Aims to improve safety and the attractiveness of the area, and to soothe political upheaval over the strays.

Adoption Agency (Secondary Stakeholder)

  • Wants to minimize euthanasia and hold animals for as long as possible.

Data Understanding

PetFinder supplied data for about 19,000 adoption entries for dogs and cats across Malaysia's states. Kaggle PetFinder, 2018

  • Initial Target Class Distribution:
    • One Day Adoption: 410
    • One Week Adoption: 3090
    • One Month Adoption: 3259
    • Two/Three Month Adoption: 4037
    • No Adoption: 4197
  • Modified Target Class Distribution:
    • Adopted: 10796
    • Not Adopted: 4197
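
The modified target can be reproduced by collapsing the five adoption-speed classes into two. The following is a minimal sketch that uses only the counts listed above; the integer codes 0 to 4 and the helper name `to_binary` are assumptions made for illustration, since this README does not state how the classes are encoded.

```python
# Counts per original class, copied from the list above.
# The integer codes 0-4 are an assumed encoding, not confirmed by this README.
initial_counts = {
    0: 410,   # One Day Adoption
    1: 3090,  # One Week Adoption
    2: 3259,  # One Month Adoption
    3: 4037,  # Two/Three Month Adoption
    4: 4197,  # No Adoption
}

NO_ADOPTION = 4  # assumed code for the "No Adoption" class


def to_binary(speed_class: int) -> int:
    """Return 1 (Adopted) for any adoption class, 0 (Not Adopted) otherwise."""
    return 0 if speed_class == NO_ADOPTION else 1


# Aggregate the five-class counts into the two-class target.
binary_counts = {0: 0, 1: 0}
for speed_class, count in initial_counts.items():
    binary_counts[to_binary(speed_class)] += count

total = sum(binary_counts.values())
print(binary_counts)
print(f"Adopted share: {binary_counts[1] / total:.1%}")
```

The resulting split is imbalanced toward the Adopted class, which is why the metrics below look beyond plain accuracy.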

Chosen Metrics:

  • Primary: Precision, to maximize TP and minimize FP among predicted adoptees.
  • Secondary: Recall and F1, to minimize FN and to account for the class imbalance.
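
For reference, the three metrics can be written directly in terms of confusion-matrix counts. This is a small sketch of the standard definitions; the counts in the usage lines are hypothetical and are not results from this project.

```python
def precision(tp: int, fp: int) -> float:
    """Share of predicted adoptions that were real adoptions: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Share of real adoptions that the model found: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts, for illustration only.
tp, fp, fn = 80, 20, 10
print(f"precision={precision(tp, fp):.3f}")
print(f"recall={recall(tp, fn):.3f}")
print(f"f1={f1(tp, fp, fn):.3f}")
```

Precision falls only when false positives rise, so it directly tracks the cost of prioritizing an animal that ends up not being adopted.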

F1 Score Metric, Joos Korstanje, 2021

Classification Distribution


Modeling & Evaluation

Model Iteration

  • Dummy Model:
  • Baseline:
    • Logistic Regression
    • KNN
    • XGBoost
    • Random Forest
    • Neural Net
  • Pipeline (scaled and SMOTE-resampled) with GridSearchCV
    • Logistic Regression
    • KNN
    • XGBoost
    • Random Forest
    • Neural Net

Best Estimator

XGBoost with GridSearchCV Tuning

  • StandardScaler()
  • SMOTE(random_state=42, sampling_strategy='minority')
  • XGBClassifier(criterion='entropy', max_depth=6, learning_rate=0.01, n_estimators=120, gamma=3, random_state=42)
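
The three steps above could be assembled roughly as follows. This is a sketch under assumptions, not the notebook's actual code: the names `XGB_PARAMS` and `build_pipeline` and the step labels are invented for illustration, and it assumes `scikit-learn`, `imbalanced-learn`, and `xgboost` are installed. The `criterion='entropy'` setting is left out because, as far as I know, `XGBClassifier` does not define a `criterion` parameter (it belongs to scikit-learn's tree models), so it would likely have no effect.

```python
# Hyperparameters copied from the list above, minus `criterion`
# (assumed not to be used by XGBoost).
XGB_PARAMS = {
    "max_depth": 6,
    "learning_rate": 0.01,
    "n_estimators": 120,
    "gamma": 3,
    "random_state": 42,
}


def build_pipeline():
    """Build a scale -> SMOTE -> XGBoost pipeline (hypothetical helper)."""
    # Imported lazily so the parameter dictionary can be inspected
    # without the modeling libraries installed.
    from imblearn.over_sampling import SMOTE
    from imblearn.pipeline import Pipeline  # imblearn's Pipeline supports samplers
    from sklearn.preprocessing import StandardScaler
    from xgboost import XGBClassifier

    return Pipeline(
        steps=[
            ("scaler", StandardScaler()),
            ("smote", SMOTE(random_state=42, sampling_strategy="minority")),
            ("xgb", XGBClassifier(**XGB_PARAMS)),
        ]
    )
```

Using `imblearn.pipeline.Pipeline` rather than scikit-learn's own `Pipeline` matters here: the sampler is applied only while fitting, so during cross-validation the synthetic samples stay out of the validation folds. To tune for the primary metric, the pipeline can be passed to `GridSearchCV` with `scoring="precision"` and grid keys prefixed by the step name, for example `xgb__max_depth`.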

Confusion Matrix


Plot Matrix


Results Summary

  • Both have an average FP rate of ~13% according to the confusion matrix, a decrease from the baseline.
  • Test precision is ~81%.
  • The model scores higher (87%) than the baseline Dummy Classifier (~72%).
  • It is the best model based on overall performance and on FP and FN percentages, slightly ahead of the Random Forest model.
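
The ~72% Dummy Classifier figure is close to what a majority-class predictor would score on the class distribution listed under Data Understanding. The sketch below assumes a "most frequent" dummy strategy and uses the full labeled counts rather than the actual train/test split, neither of which this README specifies.

```python
# Binary class counts from the Data Understanding section.
adopted, not_adopted = 10796, 4197
total = adopted + not_adopted

# A majority-class predictor labels every animal "Adopted".
# Its accuracy is the share of the majority class.
majority_accuracy = max(adopted, not_adopted) / total

# Its precision for the "Adopted" class is the same share:
# every prediction is positive, so TP = adopted and FP = not_adopted.
majority_precision = adopted / (adopted + not_adopted)

print(f"majority-class accuracy:  {majority_accuracy:.2f}")
print(f"majority-class precision: {majority_precision:.2f}")
```

Any candidate model has to clear this floor before its precision gain says anything about the features.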

Best Features

The top 5 features were:

  • Age
  • Breed1
  • Color1
  • PhotoAmt
  • Gender

  • The top two features match the results of the initial correlation analysis.
  • PhotoAmt and Color were initially only weakly correlated with Adoption Speed.
  • Breed helps characterize the animal even when Type is not explicitly known.
  • PhotoAmt suggests that animals with photos, and possibly with multiple photos, may have an easier time being adopted.

Conclusion

Limitations

  • Deeper explanation of attributes needed
  • Target Class imbalance
  • Synthetic Data used
  • Hardware resources: hyperparameter tuning is resource intensive
  • Limited time for further hyperparameter tuning
  • Analysis on numerical tabular attributes only

Recommendations

  • Use model with an experienced rescuer
  • Prioritize adoption form attributes
    • The feature importances show which factors drive whether an animal will be adopted
  • Plan cost with ~16% margin
  • Acquire photos of all rescues

Future Next Steps

  • Photo Analysis (Image Classification)
  • Textual Analysis (Natural Language Processing)
  • Recommend specific attributes to rescue first

Repository Navigation

Repository Organization

  • .gitignore
  • License
  • data
  • images
  • reproducibility
  • notebook_capstone.ipynb
  • notebook.capston.pdf
  • presentation_capston.pdf

Presentation Link

Jupyter Notebook Link

Reproducibility Instructions

About

Animal Adoption Classification Problem Capstone Phase 5 Project
