The goal of this project is to build a classifier that identifies fraudulent credit card transactions.
The dataset used to train the model contains a list of transactions, each labeled as fraudulent or genuine, and is available on Kaggle at https://www.kaggle.com/datasets/dhanushnarayananr/credit-card-fraud
The dataset has 1,000,000 rows and 8 columns (`dataframe.shape: (1000000, 8)`), all of type `float64` and with no missing values:

| # | Column | Non-Null Count | Dtype |
|---|--------|----------------|-------|
| 0 | distance_from_home | 1000000 non-null | float64 |
| 1 | distance_from_last_transaction | 1000000 non-null | float64 |
| 2 | ratio_to_median_purchase_price | 1000000 non-null | float64 |
| 3 | repeat_retailer | 1000000 non-null | float64 |
| 4 | used_chip | 1000000 non-null | float64 |
| 5 | used_pin_number | 1000000 non-null | float64 |
| 6 | online_order | 1000000 non-null | float64 |
| 7 | fraud | 1000000 non-null | float64 |
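A first EDA check on a fraud dataset is usually the class balance of the `fraud` label. The `dataframe.shape` output above suggests pandas was used; the sketch below instead uses only the Python standard library so it has no dependencies. It assumes the CSV header contains exactly the column names listed in the table; `load_rows` and `class_balance` are hypothetical helper names, not functions from the notebook.

```python
import csv
from collections import Counter
from typing import Iterable

# Column names taken from the schema table; "fraud" is the 0/1 target label.
FEATURES = [
    "distance_from_home",
    "distance_from_last_transaction",
    "ratio_to_median_purchase_price",
    "repeat_retailer",
    "used_chip",
    "used_pin_number",
    "online_order",
]
TARGET = "fraud"


def load_rows(lines: Iterable[str]) -> list[dict[str, float]]:
    """Parse CSV lines into dicts of floats, failing fast if a column is missing."""
    reader = csv.DictReader(lines)
    expected = FEATURES + [TARGET]
    missing = set(expected) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return [{col: float(row[col]) for col in expected} for row in reader]


def class_balance(rows: list[dict[str, float]]) -> dict[int, float]:
    """Return the share of genuine (0) and fraudulent (1) transactions."""
    counts = Counter(int(row[TARGET]) for row in rows)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no rows to summarize")
    return {label: counts[label] / total for label in (0, 1)}
```

Usage, with a hypothetical local file name: `with open("card_transdata.csv", newline="") as f: print(class_balance(load_rows(f)))`. For a million rows, `pandas.read_csv` followed by `value_counts(normalize=True)` on the `fraud` column is the more practical route; the stdlib version only avoids the dependency.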
The project is organized in the following steps:
- EDA (Exploratory Data Analysis)
- Data Pre-Processing
- Model Selection
  - Logistic Regression
  - K-Nearest Neighbors (KNN) classifier
  - Neural Network (Multi-Layer Perceptron, MLP)
- Model Assessment
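For the model assessment step, accuracy alone can be misleading when fraudulent transactions are a minority class (the EDA step can confirm the actual proportion): a model that always predicts "genuine" still scores high accuracy while catching no fraud. The sketch below computes precision, recall and F1 for the fraud class (label 1) in plain Python. It is an illustration of these standard metrics, not necessarily the metrics the notebook reports; `confusion_counts` and `fraud_metrics` are hypothetical names.

```python
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn), treating label 1 (fraud) as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        elif t == 0 and p == 0:
            tn += 1
        else:
            raise ValueError(f"labels must be 0 or 1, got {t!r}, {p!r}")
    return tp, fp, fn, tn


def fraud_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Accuracy plus precision, recall and F1 for the fraud class."""
    if not y_true:
        raise ValueError("empty input")
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(y_true)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

For example, on 10 transactions with one fraud, an "always genuine" predictor gets `fraud_metrics([0] * 9 + [1], [0] * 10)` with accuracy 0.9 but recall 0.0. In practice, scikit-learn's `sklearn.metrics.classification_report` reports the same per-class precision, recall and F1.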
All details and results of the project are described in the Jupyter notebook `credit_card_fraud_final_project.ipynb`.