FastAPI Celery Redis PostgreSQL TensorFlow OpenCV Docker Langchain

Real-Time Deepfake Detection and Explanation Using Lightweight ML and LLMs

Deepfake content is an increasing threat to digital media authenticity. This project introduces a real-time, explainable deepfake detection pipeline that processes video in memory, minimizing latency while providing interpretable outputs. The system leverages frame reduction techniques, pretrained visual models (e.g., FakeShield), and vision-capable LLMs for generating human-understandable explanations — all without writing any media to disk.

Objectives

  • Perform real-time deepfake detection without persistent storage.
  • Apply lightweight motion-based heuristics (e.g., optical flow, scene detection) to reduce the number of frames analyzed.
  • Use pretrained models like FakeShield for forgery detection and localization.
  • Generate coherent natural-language explanations via fine-tuned vision LLMs.
  • Serve results with low latency through a FastAPI-based backend, optionally containerized with Docker.

System Modules

1. Video Input & Streaming

  • Supports both video file upload and live frame streaming via WebSocket.
  • Entire pipeline runs in-memory, avoiding any disk I/O for performance.
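
In-memory ingestion can be sketched as below, assuming clients push JPEG-encoded frames over the WebSocket; the route name and downstream hooks are illustrative placeholders rather than the project's actual API.

import cv2
import numpy as np
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()

@app.websocket("/ws/stream")  # illustrative route name
async def stream_frames(websocket: WebSocket):
    await websocket.accept()
    try:
        while True:
            data = await websocket.receive_bytes()       # JPEG bytes from the client
            buf = np.frombuffer(data, dtype=np.uint8)
            frame = cv2.imdecode(buf, cv2.IMREAD_COLOR)  # decoded in memory; nothing is written to disk
            if frame is None:
                continue
            # hand the decoded frame to the frame selection and detection stages described below
    except WebSocketDisconnect:
        pass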

2. Frame Selection Pipeline

  • Optical Flow: Tracks pixel-wise motion (e.g., Farneback or RAFT).
  • Scene Change Detection: Uses histogram delta or tools like PySceneDetect.
  • Optional: Background subtraction to eliminate static regions.

Goal: Reduce frame count by 90% while maintaining semantic fidelity.
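
A minimal sketch of this stage, combining Farneback optical flow with a histogram delta; the thresholds are illustrative and would need tuning per video source.

import cv2
import numpy as np

def select_frames(frames, flow_thresh=1.5, hist_thresh=0.4):
    # Keep the indices of frames showing significant motion or a likely scene change.
    selected = [0]                                   # always keep the first frame
    prev_gray = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
    prev_hist = cv2.calcHist([prev_gray], [0], None, [64], [0, 256])
    cv2.normalize(prev_hist, prev_hist)
    for i in range(1, len(frames)):
        gray = cv2.cvtColor(frames[i], cv2.COLOR_BGR2GRAY)
        # Farneback dense optical flow: mean magnitude approximates inter-frame motion
        flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        motion = float(np.linalg.norm(flow, axis=2).mean())
        # Histogram delta (Bhattacharyya distance) approximates scene changes
        hist = cv2.calcHist([gray], [0], None, [64], [0, 256])
        cv2.normalize(hist, hist)
        delta = cv2.compareHist(prev_hist, hist, cv2.HISTCMP_BHATTACHARYYA)
        if motion > flow_thresh or delta > hist_thresh:
            selected.append(i)
            prev_gray, prev_hist = gray, hist        # compare against the last kept frame
    return selected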

3. Deepfake Detection Engine

Model: FakeShield v1-22b
A multimodal vision-language framework for explainable deepfake detection and localization.

Input:

  • Frame as a tensor (from OpenCV → NumPy → PyTorch)
  • Modules used: DTE-FDM and MFLM

Output (per frame):

  • verdict: real or fake
  • confidence_score: e.g., 0.87
  • forgery_mask: binary/grayscale image mask (H, W) as numpy.ndarray or torch.Tensor
  • attention_map (optional): model attention visualization to highlight decision focus

These outputs are passed to the LLM for final explanation generation.
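
The per-frame contract can be sketched as follows. The DetectionResult fields mirror the output list above, and frame_to_tensor shows the OpenCV → NumPy → PyTorch conversion; the actual FakeShield inference call (DTE-FDM + MFLM) follows that project's own interface and is not reproduced here.

import cv2
import numpy as np
import torch
from dataclasses import dataclass
from typing import Optional

@dataclass
class DetectionResult:
    verdict: str                                   # "real" or "fake"
    confidence_score: float                        # e.g., 0.87
    forgery_mask: torch.Tensor                     # (H, W) binary/grayscale mask
    attention_map: Optional[torch.Tensor] = None   # optional attention visualization

def frame_to_tensor(frame_bgr: np.ndarray, device: str = "cpu") -> torch.Tensor:
    # OpenCV BGR frame -> (1, 3, H, W) float tensor in [0, 1]; normalization details depend on the model
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    tensor = torch.from_numpy(rgb).permute(2, 0, 1).float() / 255.0
    return tensor.unsqueeze(0).to(device)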

4. Explanation Engine

Model: saakshigupta/deepfake-explainer-new
A LLaVA-based adapter fine-tuned to generate deepfake analysis across multiple images.

Inputs:

  • Original frame
  • Forgery mask
  • Optional: Attention map or overlay
  • Prompt: "Explain if this frame shows signs of tampering."

Output:

  • A detailed natural language explanation, highlighting potential manipulation and justifying the verdict.

Example:

“Regions around the mouth and cheek show boundary noise and abnormal motion artifacts, indicating synthetic manipulation.”
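
A sketch of how the explainer might be invoked, assuming the adapter loads onto a LLaVA-style base model via Hugging Face transformers and PEFT; the base checkpoint, prompt template, and mask-overlay scheme below are assumptions, so the adapter's model card remains the authority.

import numpy as np
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration
from peft import PeftModel

BASE_ID = "llava-hf/llava-1.5-7b-hf"               # assumed base; verify against the adapter's model card
ADAPTER_ID = "saakshigupta/deepfake-explainer-new"

processor = AutoProcessor.from_pretrained(BASE_ID)
base = LlavaForConditionalGeneration.from_pretrained(BASE_ID, torch_dtype=torch.float16, device_map="auto")
model = PeftModel.from_pretrained(base, ADAPTER_ID)

def explain(frame_rgb: np.ndarray, forgery_mask: np.ndarray) -> str:
    # Overlay the forgery mask on the frame so a single image carries both inputs
    overlay = frame_rgb.copy()
    overlay[forgery_mask > 0] = (255, 0, 0)
    image = Image.fromarray(overlay)
    prompt = "USER: <image>\nExplain if this frame shows signs of tampering. ASSISTANT:"
    inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=200)
    return processor.decode(output[0], skip_special_tokens=True)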

5. Response API

Served via FastAPI as JSON:

{
  "frame_index": 45,
  "verdict": "fake",
  "confidence": 0.87,
  "explanation": "Facial boundary irregularities suggest deepfake generation.",
  "forgery_mask": "<base64-image>",
  "attention_map": "<base64-image>"
}
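
One way to type this payload on the FastAPI side is a Pydantic response model plus an in-memory base64 encoder; the field names follow the JSON above, and encode_mask assumes a uint8 mask.

import base64
from typing import Optional

import cv2
import numpy as np
from pydantic import BaseModel

class FrameAnalysis(BaseModel):
    frame_index: int
    verdict: str                         # "real" or "fake"
    confidence: float
    explanation: str
    forgery_mask: Optional[str] = None   # base64-encoded PNG
    attention_map: Optional[str] = None  # base64-encoded PNG

def encode_mask(mask: np.ndarray) -> str:
    # Encode a (H, W) uint8 mask as a base64 PNG string, entirely in memory
    ok, png = cv2.imencode(".png", mask)
    return base64.b64encode(png.tobytes()).decode("ascii")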


Tech Stack

Layer                 Stack
API Backend           FastAPI
Frame Processing      OpenCV, FFmpeg
Motion Detection      Optical Flow, PySceneDetect
Deepfake Detection    FakeShield, PyTorch
LLM Explanation       Hugging Face, LangChain
Deployment            Docker (optional)
