This project demonstrates a complete, end-to-end real-time fraud detection system built with a modern streaming data stack. It simulates a stream of financial transactions, processes them using a stateful machine learning model in Apache Flink, and displays detected fraudulent activity on a live dashboard.
The primary goal is to showcase a realistic, distributed architecture for stateful stream processing, anomaly detection, and real-time alerting.
Flink Job Overview: Displays the active Flink job titled "Stateful Fraud Detection with Redis Alerts" in a stable RUNNING state, indicating continuous stream processing.
Flink Job Details (Dataflow): Visualizes the internal operators and data transformation steps within the Flink pipeline, including source, keyed processing, model inference, and sink stages.
Fraud Detection Dashboard: The web interface for real-time monitoring of fraud alerts. It connects to the backend via WebSocket and displays incoming fraud events as they are detected.
- Flink UI (Top): Shows the "Stateful Fraud Detection with Redis Alerts" job in a stable RUNNING state, processing data as it arrives.
- Fraud Dashboard (Bottom): The end-user interface displays alerts in real time as they are detected.
The system is composed of several microservices orchestrated by Docker Compose. The data flows through the pipeline as follows:
- Payment API: A Python FastAPI application that acts as a simple payment gateway, receiving transaction data via a POST request.
- Redpanda: A high-performance, Kafka-compatible streaming data platform that receives transactions from the API and stores them in the transactions topic.
- Apache Flink Application: A stateful PyFlink job that consumes transactions from Redpanda. For each user, it maintains historical features (e.g., average transaction amount) and uses a pre-trained Isolation Forest model to classify each new transaction as fraudulent or legitimate.
- Redis: A fast, in-memory data store, used here as a lightweight message queue. When Flink detects a fraudulent transaction, it pushes a JSON alert message onto a Redis list.
- Alert Monitor: A second Python FastAPI application that serves a web dashboard. It listens for new alerts on the Redis queue and pushes them to the user's browser over a WebSocket in real time.
- Technology: Python, FastAPI, Uvicorn, kafka-python.
- Role: Provides a /transaction endpoint to ingest data into the system. It validates the incoming data and produces it as a JSON message to the transactions topic in Redpanda.
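The validate-then-produce step described above can be sketched without any web or Kafka dependencies. This is a minimal illustration, not the project's actual code: the field names come from the sample requests later in this README, while the `Transaction` class, the helper names, and the validation rules (all fields present, positive amount) are assumptions.

```python
import json
from dataclasses import asdict, dataclass

# Field names follow the sample payloads in this README; the validation
# rules below are assumptions made for illustration.
REQUIRED_FIELDS = (
    "transaction_id", "user_id", "card_number",
    "amount", "timestamp", "merchant_id", "location",
)


@dataclass(frozen=True)
class Transaction:
    transaction_id: str
    user_id: str
    card_number: str
    amount: float
    timestamp: str
    merchant_id: str
    location: str


def validate_transaction(payload: dict) -> Transaction:
    """Reject payloads with missing fields or a non-positive amount."""
    missing = [name for name in REQUIRED_FIELDS if name not in payload]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    amount = float(payload["amount"])
    if amount <= 0:
        raise ValueError("amount must be positive")
    fields = {name: payload[name] for name in REQUIRED_FIELDS}
    fields["amount"] = amount
    return Transaction(**fields)


def to_message(txn: Transaction) -> bytes:
    """Serialize a transaction to the UTF-8 JSON bytes a producer would send."""
    return json.dumps(asdict(txn)).encode("utf-8")
```

In the running service, the bytes returned by `to_message` would presumably be handed to a kafka-python producer targeting the transactions topic; that wiring is left out here so the sketch runs on its own.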
- Technology: PyFlink, scikit-learn, joblib.
- Role: The core processing engine. The fraud_detector.py script consumes from Kafka, keys the stream by user_id, maintains user state, and applies the trained ML model to detect anomalies.
- Model: Contains the serialized isolation_forest.joblib and scaler.joblib files required for inference.
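To make "maintains user state" concrete, here is a plain-Python stand-in for per-key state. It is only a sketch: the real job would keep this in Flink keyed state, and the exact features it computes are not documented here, so `UserStats`, `UserFeatureState`, and the feature names are hypothetical. The average transaction amount is the one feature the architecture description does mention.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class UserStats:
    count: int = 0
    mean_amount: float = 0.0


class UserFeatureState:
    """In-memory stand-in for Flink keyed state, one entry per user_id."""

    def __init__(self) -> None:
        self._stats = defaultdict(UserStats)

    def features_then_update(self, user_id: str, amount: float) -> dict:
        stats = self._stats[user_id]
        # Compare against the user's history *before* folding in the new amount.
        ratio = amount / stats.mean_amount if stats.mean_amount > 0 else 1.0
        features = {
            "amount": amount,
            "history_count": stats.count,
            "avg_amount": stats.mean_amount,
            "amount_to_avg_ratio": ratio,
        }
        # Incremental mean update, so past transactions need not be stored.
        stats.count += 1
        stats.mean_amount += (amount - stats.mean_amount) / stats.count
        return features
```

A first transaction for a user yields a neutral ratio of 1.0 because there is no history yet, which is consistent with the baseline-then-anomaly test flow later in this README. The feature dictionary would then be scaled and passed to the Isolation Forest for scoring.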
- Technology: Python, FastAPI, Uvicorn, WebSockets, redis-py.
- Role: Provides the user interface. Serves an HTML dashboard and establishes a WebSocket connection with the user's browser. A background task listens to the Redis queue and broadcasts any new alerts to all connected clients.
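The listen-and-broadcast pattern described above can be illustrated with asyncio alone. In this sketch an `asyncio.Queue` stands in for the Redis list and each connected client is modeled as its own queue rather than a WebSocket; the `AlertBroadcaster` class and the `None` shutdown sentinel are assumptions for the example, not the project's code.

```python
import asyncio
import json


class AlertBroadcaster:
    """Fans each alert out to every connected client.

    Each client is modeled as an asyncio.Queue; in the real service these
    would be WebSocket connections, and alerts would come from Redis.
    """

    def __init__(self) -> None:
        self._clients: set[asyncio.Queue] = set()

    def connect(self) -> asyncio.Queue:
        client: asyncio.Queue = asyncio.Queue()
        self._clients.add(client)
        return client

    def disconnect(self, client: asyncio.Queue) -> None:
        self._clients.discard(client)

    async def run(self, source: asyncio.Queue) -> None:
        """Background task: forward alerts until a None sentinel arrives."""
        while True:
            raw = await source.get()
            if raw is None:
                break
            alert = json.loads(raw)
            # Iterate over a copy so clients may disconnect mid-broadcast.
            for client in list(self._clients):
                client.put_nowait(alert)


async def demo() -> list:
    source: asyncio.Queue = asyncio.Queue()
    broadcaster = AlertBroadcaster()
    first, second = broadcaster.connect(), broadcaster.connect()
    task = asyncio.create_task(broadcaster.run(source))
    await source.put(json.dumps({"transaction_id": "txn_fraud_002"}))
    await source.put(None)
    await task
    return [first.get_nowait(), second.get_nowait()]
```

Running `asyncio.run(demo())` delivers the same alert to both simulated clients. Keeping the set of clients separate from the alert source is what lets a single background listener serve any number of open dashboards.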
- Technology: Jupyter, Pandas, Scikit-learn.
- Role: The 1-data_generation_and_training.ipynb notebook is used for the offline portion: generating a synthetic dataset of transactions and training the Isolation Forest model. Crucially, this is where the model files used by the Flink app are created.
- Stream Processing: Apache Flink 1.18.1
- Message Broker: Redpanda (Kafka-compatible)
- Alerting Queue: Redis
- Web APIs/Services: Python 3.10, FastAPI, Uvicorn
- Machine Learning: Scikit-learn, Joblib, Pandas, Numpy
- Orchestration: Docker & Docker Compose
Follow these steps to build and run the entire system on your local machine.
- Docker
- Docker Compose
This is the most critical step to ensure the ML model is compatible with the Flink environment.
- Navigate to the notebooks directory:
cd notebooks
- Create and activate a Python virtual environment:
python3 -m venv venv
source venv/bin/activate
- Install pinned dependencies: This installs the exact library versions used in the Flink cluster.
pip install -r requirements.txt
- Run the Jupyter Notebook: Start Jupyter Lab/Notebook and open 1-data_generation_and_training.ipynb. Run all cells from top to bottom. This will create/overwrite two essential files with the correct versions:
  - ../flink_app/model/isolation_forest.joblib
  - ../flink_app/model/scaler.joblib
- Navigate back to the project root directory.
- Stop any old containers and remove old volumes to ensure a clean start.
docker compose down --volumes
- Build all the service images:
docker compose up -d --build
This will take a few minutes the first time. The command will wait until the redis service is healthy before starting the alert-monitor.
- Wait about 30 seconds for all services to fully initialize.
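Instead of waiting a fixed interval, you could poll the published ports until they accept connections. The helper below is a rough sketch: `wait_for_port` and `check_services` are hypothetical names, and the port numbers are taken from the URLs used elsewhere in this README. An open port only shows that a service is listening, not that it has finished initializing, so treat this as a lower bound rather than a guarantee.

```python
import socket
import time

# Ports taken from the URLs used elsewhere in this README.
SERVICES = {"Payment API": 8000, "Alert Dashboard": 8080, "Flink UI": 8081}


def wait_for_port(host: str, port: int, timeout: float = 60.0) -> bool:
    """Return True once a TCP connection succeeds, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Not listening yet; pause briefly before retrying.
            time.sleep(0.5)
    return False


def check_services(host: str = "localhost", timeout: float = 60.0) -> dict:
    """Map each service name to whether its port became reachable in time."""
    return {
        name: wait_for_port(host, port, timeout)
        for name, port in SERVICES.items()
    }
```

Calling `check_services()` after `docker compose up -d --build` returns a dictionary such as one entry per service with a boolean value, which makes it easy to see which container is still starting.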
- Execute the flink run command to submit the Python script to the cluster:
docker compose exec flink-jobmanager flink run -py /opt/flink/usrlib/fraud_detector.py
- You will see a success message:
Job has been submitted with JobID ...
Your real-time fraud detection system is now running!
- Open the Dashboards:
  - Flink UI: Navigate to http://localhost:8081. Click on "Running Jobs" to see your "Stateful Fraud Detection with Redis Alerts" job in a stable, green RUNNING state.
  - Alert Dashboard: Navigate to http://localhost:8080. The page should load and the status should say "Connected to fraud alert system".
- Send a Normal Transaction: This first transaction establishes a baseline history for a new user. It should not trigger an alert.
curl -X POST "http://localhost:8000/transaction" \
  -H "Content-Type: application/json" \
  -d '{
    "transaction_id": "txn_normal_001",
    "user_id": "user-test-123",
    "card_number": "4242-4242-4242-4242",
    "amount": 50.00,
    "timestamp": "2025-07-15T10:00:00Z",
    "merchant_id": "merchant_a",
    "location": "New York"
  }'
- Send a Fraudulent Transaction: This second transaction for the same user has a very high amount and will be flagged as an anomaly by the model.
curl -X POST "http://localhost:8000/transaction" \
  -H "Content-Type: application/json" \
  -d '{
    "transaction_id": "txn_fraud_002",
    "user_id": "user-test-123",
    "card_number": "4242-4242-4242-4242",
    "amount": 5000.00,
    "timestamp": "2025-07-15T10:01:00Z",
    "merchant_id": "merchant_b",
    "location": "Online"
  }'
- Observe the Result: Look at the dashboard at http://localhost:8080. A red "FRAUD ALERT" card for transaction txn_fraud_002 should appear within moments.
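The same two test requests can also be sent from Python using only the standard library, which is convenient if you want to script repeated runs. The payloads mirror the curl examples above; the helper names `build_request` and `send` are illustrative and not part of the project.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/transaction"

# Fields shared by both sample transactions from the curl examples.
BASE = {"user_id": "user-test-123", "card_number": "4242-4242-4242-4242"}

NORMAL_TXN = {
    **BASE,
    "transaction_id": "txn_normal_001",
    "amount": 50.00,
    "timestamp": "2025-07-15T10:00:00Z",
    "merchant_id": "merchant_a",
    "location": "New York",
}

FRAUD_TXN = {
    **BASE,
    "transaction_id": "txn_fraud_002",
    "amount": 5000.00,
    "timestamp": "2025-07-15T10:01:00Z",
    "merchant_id": "merchant_b",
    "location": "Online",
}


def build_request(transaction: dict) -> urllib.request.Request:
    """Build a JSON POST request for the Payment API without sending it."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(transaction).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(transaction: dict) -> int:
    """POST the transaction and return the HTTP status code."""
    with urllib.request.urlopen(build_request(transaction), timeout=10) as response:
        return response.status
```

With the stack running, call `send(NORMAL_TXN)` first and `send(FRAUD_TXN)` second so the user has a baseline before the high-value transaction arrives.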
To stop all services and remove all containers, networks, and data volumes, run:
docker compose down --volumes

