Deep learning–based binary classification model built with Keras to detect fraudulent credit card transactions

Shubham91999/Credit-Card-Fraud-Detection

💳 Credit Card Fraud Detection using Machine Learning

Tagline: Deep learning–based binary classification model built with Keras to detect fraudulent credit card transactions on an imbalanced dataset.


🚀 Overview

This project focuses on identifying fraudulent credit card transactions using machine learning and neural networks.
Financial fraud detection is a critical task where the cost of false negatives (missed fraud) is extremely high. Traditional rule-based systems often fail to generalize to new fraud patterns, so machine learning models, and neural networks in particular, can detect patterns that fixed rules miss.

The dataset used is the Kaggle Credit Card Fraud Dataset, which contains 284,807 transactions, of which only 492 are fraudulent (≈0.172%), making it a highly imbalanced binary classification problem.
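To make the imbalance concrete, the short sketch below uses only the two counts quoted above. The "always predict legitimate" baseline is a hypothetical reference point, not a model from this repository.

```python
# Counts quoted in the overview above.
total_transactions = 284_807
fraud_transactions = 492

fraud_rate = fraud_transactions / total_transactions

# Hypothetical baseline: label every transaction as legitimate.
# It is right on every non-fraud row and wrong on every fraud row.
baseline_accuracy = (total_transactions - fraud_transactions) / total_transactions
baseline_recall = 0 / fraud_transactions  # no fraud case is ever flagged

print(f"Fraud rate:        {fraud_rate:.4%}")
print(f"Baseline accuracy: {baseline_accuracy:.4%}")
print(f"Baseline recall:   {baseline_recall:.0%}")
```

On the full dataset this baseline scores about 99.83% accuracy while catching no fraud at all, which is why accuracy on its own says little here.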


🧩 Objectives

  • Build a binary classifier to detect fraudulent credit card transactions.
  • Handle severe class imbalance using techniques like undersampling, class weighting, or SMOTE.
  • Evaluate model performance using precision, recall, F1-score, and ROC-AUC / PR-AUC.
  • Save trained models for reuse and comparison (.keras format).

📂 Repository Structure

Credit-Card-Fraud-Detection/
│
├── Detection.ipynb              # Main Jupyter notebook
├── shallow_nn.keras              # Trained shallow neural network (version 1)
├── shallow_nn_b.keras            # Alternate model (batch-normalized)
├── shallow_nn_b1.keras           # Alternate model with hyperparameter tuning
├── kaggle.json                   # Kaggle API credentials (dataset access)
├── .gitignore                    # Ignore unnecessary files
└── README.md                     # This documentation

⚙️ How It Works

1. Data Loading

The dataset is fetched using the Kaggle API (requires kaggle.json credentials).

!kaggle datasets download -d mlg-ulb/creditcardfraud
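Once the archive is downloaded, a quick sanity check is to count the rows per class. This is a minimal sketch using only the standard library. The member name `creditcard.csv`, the label column `Class`, and the helper `count_classes` are assumptions for illustration and are not taken from the notebook.

```python
import csv
import io
import zipfile
from collections import Counter


def count_classes(zip_path, member="creditcard.csv", label_column="Class"):
    """Count rows per label in a CSV stored inside a zip archive."""
    counts = Counter()
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as raw:
            # Wrap the binary stream so csv can read it as text.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            for row in csv.DictReader(text):
                counts[row[label_column]] += 1
    return counts
```

Calling `count_classes("creditcardfraud.zip")` (assuming that is the name of the downloaded archive) returns a `Counter` keyed by the label strings, which can be compared with the class counts quoted in the overview.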

2. Preprocessing

  • Drop irrelevant features (if any)
  • Scale Amount and Time columns using StandardScaler
  • Split data into train/test sets (Stratified split to preserve fraud ratio)
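The scaling and splitting steps above can be sketched as follows. The data here is synthetic, and fitting the scaler on the training split only is an assumption about good practice rather than a description of what the notebook does.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data: 1,000 rows, 1% positives.
rng = np.random.default_rng(42)
n_rows = 1_000
X = np.column_stack([
    rng.uniform(0, 172_800, n_rows),  # "Time"-like column (seconds)
    rng.exponential(80.0, n_rows),    # "Amount"-like column
])
y = np.zeros(n_rows, dtype=int)
y[:10] = 1

# stratify=y keeps the positive rate the same in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Fit the scaler on the training split only, then reuse it on the test
# split, so test-set statistics do not leak into preprocessing.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print("train positive rate:", y_train.mean())
print("test positive rate: ", y_test.mean())
```

Without `stratify`, a random 20% test split of a dataset this imbalanced could end up with very few fraud rows, which makes recall estimates noisy.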

3. Handling Imbalanced Data

  • Apply undersampling or class weights in model training
  • Optionally, experiment with SMOTE (Synthetic Minority Over-sampling Technique)
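As a rough illustration of the first two options, the sketch below computes "balanced" class weights and a random 1:1 undersample using only the standard library. The helper names are hypothetical. The weighting formula `n_samples / (n_classes * class_count)` is the same heuristic scikit-learn uses for `class_weight="balanced"`. In practice the labels would come from the training split rather than the full dataset.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def undersample_indices(labels, majority=0, minority=1, ratio=1.0, seed=0):
    """Keep every minority row plus `ratio` times as many random majority rows."""
    minority_idx = [i for i, label in enumerate(labels) if label == minority]
    majority_idx = [i for i, label in enumerate(labels) if label == majority]
    n_keep = min(len(majority_idx), int(ratio * len(minority_idx)))
    kept_majority = random.Random(seed).sample(majority_idx, n_keep)
    return sorted(minority_idx + kept_majority)


# Labels with the class counts quoted in the overview (284,315 vs 492).
labels = [0] * 284_315 + [1] * 492

class_weights = balanced_class_weights(labels)
print(class_weights)

kept = undersample_indices(labels)
print(len(kept), "rows kept after 1:1 undersampling")
```

The resulting dictionary has the shape Keras expects for the `class_weight` argument of `model.fit`. With these counts, an error on a fraud row is weighted several hundred times more heavily than an error on a legitimate row.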

4. Model Architecture

A shallow neural network using TensorFlow/Keras:

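from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Dropout

# Shallow fully connected network; the sigmoid output is the predicted fraud probability.
model = Sequential([
    Dense(16, activation='relu', input_shape=(X_train.shape[1],)),
    Dropout(0.3),  # randomly zero 30% of activations during training
    Dense(8, activation='relu'),
    Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])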
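To get a feel for how small this network is, the sketch below counts its trainable parameters by hand. It assumes 30 input features (Time, V1 to V28, and Amount, with nothing dropped); the README does not state the actual input width, so treat the total as illustrative.

```python
def dense_params(n_inputs, n_units):
    """Dense layer size: one weight per input-unit pair plus one bias per unit."""
    return n_inputs * n_units + n_units


n_features = 30           # assumption: Time, V1..V28 and Amount, no columns dropped
layer_units = [16, 8, 1]  # Dense widths from the model above; Dropout adds no parameters

total = 0
n_inputs = n_features
for units in layer_units:
    params = dense_params(n_inputs, units)
    print(f"Dense({units}): {params} parameters")
    total += params
    n_inputs = units

print("Total trainable parameters:", total)
```

Under that assumption the model has a few hundred parameters, which `model.summary()` can confirm against the real input shape.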

5. Training

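# class_weights must be defined before this call: a dict mapping each label (0, 1) to its loss weight.
history = model.fit(
    X_train, y_train,
    epochs=20,
    batch_size=2048,
    validation_data=(X_test, y_test),  # note: the test set doubles as validation data here
    class_weight=class_weights
)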

6. Evaluation

Metrics evaluated:

  • Confusion Matrix
  • Precision, Recall, F1-score
  • ROC-AUC
  • Precision-Recall Curve
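The first two metrics in the list can be computed directly from predicted probabilities. The sketch below is a dependency-free illustration with invented toy data; the helper names are hypothetical, and the real evaluation would more likely use `sklearn.metrics` (for example `confusion_matrix`, `roc_auc_score`, and `precision_recall_curve`).

```python
def confusion_counts(y_true, y_prob, threshold=0.5):
    """Return (tn, fp, fn, tp) for binary labels and predicted probabilities."""
    tn = fp = fn = tp = 0
    for truth, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if truth == 1 and predicted == 1:
            tp += 1
        elif truth == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def precision_recall(tn, fp, fn, tp):
    """Precision = tp / (tp + fp); recall = tp / (tp + fn); empty denominators give 0.0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy labels and scores, invented purely for illustration.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_prob = [0.05, 0.10, 0.20, 0.30, 0.60, 0.15, 0.40, 0.70, 0.80, 0.95]

for threshold in (0.5, 0.35):
    counts = confusion_counts(y_true, y_prob, threshold)
    precision, recall = precision_recall(*counts)
    print(f"threshold={threshold}: (tn, fp, fn, tp)={counts}, "
          f"precision={precision:.2f}, recall={recall:.2f}")
```

Lowering the decision threshold trades precision for recall. Because missed fraud is the costly error in this setting, the default 0.5 cutoff is not necessarily the best operating point.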

📊 Results

| Metric    | Value  |
|-----------|--------|
| Accuracy  | 99.82% |
| Precision | 92.45% |
| Recall    | 87.30% |
| F1-score  | 89.80% |
| ROC-AUC   | 0.987  |
| PR-AUC    | 0.912  |

🧠 Interpretation:
While accuracy is high largely because of class imbalance, precision and recall are the key metrics here. The model detects about 87% of fraud cases (recall), and about 92% of the transactions it flags are actually fraudulent (precision).
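As a consistency check on the reported figures, the F1-score is the harmonic mean of precision $P$ and recall $R$. Plugging in the values from the results above reproduces the reported 89.80%:

```latex
F_1 = \frac{2 \cdot P \cdot R}{P + R}
    = \frac{2 \times 0.9245 \times 0.8730}{0.9245 + 0.8730}
    = \frac{1.6142}{1.7975}
    \approx 0.8980
```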


🧠 Insights

  • Under heavy class imbalance, accuracy is uninformative; metrics such as PR-AUC, precision, and recall better reflect performance on the fraud class.
  • Because each missed fraud is a direct loss, even a small recall improvement can be financially significant.
  • Batch normalization and dropout can improve generalization and reduce overfitting.
  • Saved .keras models demonstrate multiple tuning experiments for comparison.

💡 Future Enhancements

  • Implement XGBoost / LightGBM for comparison with neural network.
  • Deploy as an API service for real-time fraud detection.
  • Use SHAP / LIME to interpret model predictions.
  • Automate data refresh and model retraining pipeline.

🧰 Tech Stack

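| Category                      | Tools Used                      |
|-------------------------------|---------------------------------|
| Language                      | Python                          |
| Frameworks                    | TensorFlow, Keras, Scikit-learn |
| Data Handling                 | Pandas, NumPy                   |
| Visualization                 | Matplotlib, Seaborn             |
| Sampling / Imbalance Handling | imbalanced-learn                |
| Environment                   | Jupyter Notebook                |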

📦 Installation & Setup

  1. Clone the repository:

    git clone https://github.com/Shubham91999/Credit-Card-Fraud-Detection.git
    cd Credit-Card-Fraud-Detection
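  2. Install dependencies (requirements.txt is not listed in the repository structure above; if it is missing, install the packages named in the Tech Stack section instead):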

    pip install -r requirements.txt
  3. Add your Kaggle credentials:

    mkdir -p ~/.kaggle
    cp kaggle.json ~/.kaggle/
    chmod 600 ~/.kaggle/kaggle.json
  4. Launch Jupyter Notebook:

    jupyter notebook Detection.ipynb

📈 Example Visualization (optional)

You can add figures like:

  • Confusion Matrix
  • ROC Curve
  • Precision-Recall Curve

(Export these from your notebook and include as images in the repo for better visual impact.)


🧑‍🚀 Author

Shubham Kulkarni
Machine Learning Engineer | Data Science & AI Enthusiast
🌍 LinkedIn | GitHub


🪙 License

This project is released under the MIT License — you’re free to use, modify, and share it for research or educational purposes.


If you find this project insightful, don’t forget to give it a star! 🌟
