Deepfake content is an increasing threat to digital media authenticity. This project introduces a real-time, explainable deepfake detection pipeline that processes video in memory, minimizing latency while providing interpretable outputs. The system leverages frame reduction techniques, pretrained visual models (e.g., FakeShield), and vision-capable LLMs for generating human-understandable explanations — all without writing any media to disk.
- Perform real-time deepfake detection without persistent storage.
- Apply lightweight motion-based heuristics (e.g., optical flow, scene detection) to reduce the number of frames analyzed.
- Use pretrained models like FakeShield for forgery detection and localization.
- Generate coherent natural-language explanations via fine-tuned vision LLMs.
- Serve results instantly through a FastAPI-based backend, optionally containerized with Docker.
- Supports both video file upload and live frame streaming via WebSocket.
- The entire pipeline runs in memory, avoiding disk I/O for performance (see the streaming sketch below).
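One way to wire the live-streaming path without touching disk is a FastAPI WebSocket endpoint that decodes each incoming JPEG-encoded frame directly from bytes. This is a minimal sketch: the route name, per-message format, and the downstream `process_frame` hook are illustrative assumptions, not the project's actual interface.

```python
import cv2
import numpy as np
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()

@app.websocket("/ws/frames")
async def stream_frames(ws: WebSocket):
    await ws.accept()
    try:
        while True:
            data = await ws.receive_bytes()              # one JPEG-encoded frame per message (assumed protocol)
            buf = np.frombuffer(data, dtype=np.uint8)
            frame = cv2.imdecode(buf, cv2.IMREAD_COLOR)  # decoded in memory, never written to disk
            if frame is None:
                await ws.send_json({"error": "frame could not be decoded"})
                continue
            # Hypothetical downstream call: motion filtering + detection + explanation
            # result = process_frame(frame)
            await ws.send_json({"frame_shape": list(frame.shape)})
    except WebSocketDisconnect:
        pass
```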
- Optical Flow: Tracks pixel-wise motion (e.g., Farneback or RAFT).
- Scene Change Detection: Uses histogram delta or tools like PySceneDetect.
- Optional: Background subtraction to eliminate static regions.
Goal: Reduce the analyzed frame count by 90% while maintaining semantic fidelity (a selection sketch combining these heuristics follows below).
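A minimal selection sketch combining two of the heuristics above, Farneback optical flow magnitude and an HSV histogram delta. The thresholds are illustrative placeholders, not tuned values from this project.

```python
import cv2
import numpy as np

def select_frames(frames, flow_thresh=1.5, hist_thresh=0.4):
    """Keep a frame only when motion or scene content changes noticeably vs. the previous frame."""
    kept = []
    prev_gray, prev_hist = None, None
    for idx, frame in enumerate(frames):
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0], None, [50], [0, 180])          # hue histogram
        hist = cv2.normalize(hist, hist, alpha=0, beta=1, norm_type=cv2.NORM_MINMAX).flatten()
        if prev_gray is None:
            kept.append((idx, frame))                                  # always keep the first frame
        else:
            flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                                0.5, 3, 15, 3, 5, 1.2, 0)
            motion = float(np.mean(np.linalg.norm(flow, axis=2)))      # mean pixel displacement
            scene_delta = cv2.compareHist(prev_hist, hist, cv2.HISTCMP_BHATTACHARYYA)
            if motion > flow_thresh or scene_delta > hist_thresh:
                kept.append((idx, frame))
        prev_gray, prev_hist = gray, hist
    return kept
```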
Model: FakeShield v1-22b
A multimodal vision-language framework for explainable deepfake detection and localization.
Input:
- Frame as a tensor (from OpenCV → NumPy → PyTorch)
- Modules used: DTE-FDM and MFLM
Output (per frame):
- verdict: real or fake
- confidence_score: e.g., 0.87
- forgery_mask: binary/grayscale image mask (H, W) as numpy.ndarray or torch.Tensor
- attention_map (optional): model attention visualization to highlight decision focus
These outputs are passed to the LLM for final explanation generation.
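The OpenCV → NumPy → PyTorch conversion and the per-frame output contract above can be sketched as follows. The `model(...)` call is a hypothetical placeholder, not FakeShield's actual inference API.

```python
from dataclasses import dataclass
from typing import Optional

import cv2
import numpy as np
import torch

@dataclass
class FrameResult:
    verdict: str                              # "real" or "fake"
    confidence_score: float                   # e.g., 0.87
    forgery_mask: np.ndarray                  # (H, W) binary/grayscale mask
    attention_map: Optional[np.ndarray] = None

def frame_to_tensor(frame_bgr: np.ndarray) -> torch.Tensor:
    """OpenCV BGR frame -> normalized float tensor of shape (1, 3, H, W)."""
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    tensor = torch.from_numpy(rgb).permute(2, 0, 1).float() / 255.0
    return tensor.unsqueeze(0)

def detect(frame_bgr: np.ndarray, model) -> FrameResult:
    x = frame_to_tensor(frame_bgr)
    # Hypothetical call: the real FakeShield pipeline wraps DTE-FDM and MFLM internally.
    verdict, confidence, mask, attn = model(x)
    return FrameResult(verdict, float(confidence), mask, attn)
```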
Model: saakshigupta/deepfake-explainer-new
A LLaVA-based adapter fine-tuned to generate deepfake analysis across multiple images.
Inputs:
- Original frame
- Forgery mask
- Optional: Attention map or overlay
- Prompt: "Explain if this frame shows signs of tampering."
Output:
- A detailed natural language explanation, highlighting potential manipulation and justifying the verdict.
Example:
“Regions around the mouth and cheek show boundary noise and abnormal motion artifacts, indicating synthetic manipulation.”
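A minimal generation sketch, assuming the adapter is PEFT-compatible and sits on a LLaVA-1.5 base checkpoint; the base model ID and chat template below are assumptions, not values taken from the adapter card.

```python
import torch
from peft import PeftModel
from transformers import AutoProcessor, LlavaForConditionalGeneration

BASE = "llava-hf/llava-1.5-7b-hf"                  # assumed base checkpoint
ADAPTER = "saakshigupta/deepfake-explainer-new"

processor = AutoProcessor.from_pretrained(BASE)
model = LlavaForConditionalGeneration.from_pretrained(
    BASE, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(model, ADAPTER)  # assumption: adapter loads via PEFT

def explain(frame_pil, prompt="Explain if this frame shows signs of tampering."):
    # LLaVA-1.5 chat format; swap in the adapter's own template if it differs.
    chat = f"USER: <image>\n{prompt} ASSISTANT:"
    inputs = processor(text=chat, images=frame_pil, return_tensors="pt").to(model.device, torch.float16)
    out = model.generate(**inputs, max_new_tokens=200)
    return processor.decode(out[0], skip_special_tokens=True)
```

In the full pipeline, the forgery mask or a mask-on-frame overlay from the previous stage would accompany (or be blended into) the frame before prompting.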
Served via FastAPI as JSON:
{
"frame_index": 45,
"verdict": "fake",
"confidence": 0.87,
"explanation": "Facial boundary irregularities suggest deepfake generation.",
"forgery_mask": "<base64-image>",
"attention_map": "<base64-image>"
}

| Layer | Stack |
|---|---|
| API Backend | FastAPI |
| Frame Processing | OpenCV, FFmpeg |
| Motion Detection | Optical Flow, PySceneDetect |
| Deepfake Detection | FakeShield, PyTorch |
| LLM Explanation | Hugging Face, LangChain |
| Deployment | Docker (optional) |
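For the upload path, a sketch of the endpoint that assembles the JSON response above might look like this; `select_frames_from_bytes` and `analyze_frame` are hypothetical stand-ins for the frame-filtering and detection/explanation stages.

```python
import base64

import cv2
import numpy as np
from fastapi import FastAPI, UploadFile

app = FastAPI()

def to_b64_png(image: np.ndarray) -> str:
    """Encode a mask or attention map as a base64 PNG entirely in memory."""
    ok, png = cv2.imencode(".png", image)
    return base64.b64encode(png.tobytes()).decode("ascii")

@app.post("/analyze")
async def analyze(video: UploadFile):
    data = await video.read()                                # raw video bytes, never written to disk
    results = []
    for idx, frame in select_frames_from_bytes(data):        # hypothetical in-memory decoder + motion filter
        verdict, confidence, mask, attn, explanation = analyze_frame(frame)  # hypothetical detection + LLM stage
        results.append({
            "frame_index": idx,
            "verdict": verdict,
            "confidence": confidence,
            "explanation": explanation,
            "forgery_mask": to_b64_png(mask),
            "attention_map": to_b64_png(attn) if attn is not None else None,
        })
    return {"results": results}
```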
