Video Silence Detection


Detect silence segments in video/audio files for ad break insertion, content segmentation, and video editing automation. Uses FFmpeg for fast detection and optionally Whisper Large v2 for speech-aware analysis.

Use Cases

  • Ad Break Insertion: Find optimal points to insert advertisements
  • Content Segmentation: Automatically split content at natural breaks
  • Video Editing Automation: Identify cut points for automated editing
  • Broadcast Compliance: Detect and log silence for broadcast standards

Features

  • Multiple Detection Methods: FFmpeg (fast), Whisper (accurate), Hybrid (best of both)
  • Configurable Parameters: Threshold, duration, and sensitivity settings
  • Export Formats: JSON and SRT (subtitle) output
  • REST API: FastAPI server for integration
  • CLI Tool: Command-line interface for batch processing

Related Projects

For comprehensive content break detection, combine with:

  • video-black-frame-detection: companion project for detecting black frames (see "Combining with Black Frame Detection" below)

Using both together provides the most reliable ad break detection by identifying points where both audio silence AND visual black frames occur.


Quick Start

Prerequisites

  • Python 3.10+
  • FFmpeg installed and in PATH
  • (Optional) CUDA-compatible GPU for Whisper

Installation

git clone https://github.com/hasanhalacli/video-silence-detection.git
cd video-silence-detection

# Using uv (recommended)
uv sync

# Or pip
pip install -e .

Basic Usage

# Detect silence in a video
silence-detect video.mp4

# Find ad break points
silence-detect ad-breaks video.mp4 -n 4

# Export to SRT for video editing
silence-detect video.mp4 -f srt -o silences.srt
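The `ad-breaks` command turns detected silences into insertion points. One plausible selection strategy can be sketched as follows; this is an illustration of the idea, not the tool's actual algorithm:

```python
# Illustrative sketch of ad-break selection (NOT the package's real logic):
# rank silences by duration, keep the n longest, return their midpoints.
def pick_ad_breaks(segments, n=4):
    """segments: list of (start, end) tuples in seconds."""
    # Longest silences are the safest places to cut.
    ranked = sorted(segments, key=lambda s: s[1] - s[0], reverse=True)
    # Keep the n longest, then report midpoints in timeline order.
    chosen = sorted(ranked[:n], key=lambda s: s[0])
    return [(start + end) / 2 for start, end in chosen]

print(pick_ad_breaks([(10, 13), (40, 41), (90, 95), (120, 122)], n=2))
# [11.5, 92.5]
```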

Silence Duration Guide

Different silence durations are useful for different purposes:

| Duration | Use Case          | Example                           |
|----------|-------------------|-----------------------------------|
| 0.3s     | Micro-pauses      | Word boundaries, breathing        |
| 0.5s     | Speech pauses     | Sentence breaks (default)         |
| 1.0s     | Scene transitions | Paragraph/topic changes           |
| 2.0s     | Ad breaks         | Ideal for advertisement insertion |
| 3.0s+    | Chapter breaks    | Intro/outro, major sections       |

Examples by Use Case

# Detect sentence-level pauses (tight editing)
silence-detect video.mp4 -m 0.5

# Find scene transitions
silence-detect video.mp4 -m 1.0

# Optimal ad insertion points (2+ second gaps)
silence-detect video.mp4 -m 2.0

# Chapter/segment detection
silence-detect video.mp4 -m 3.0

Threshold Guide

The silence threshold (dB) controls what audio level is considered "silence":

| Threshold | Sensitivity | Use Case                                                 |
|-----------|-------------|----------------------------------------------------------|
| -20dB     | Low         | Only detect true silence (music detected as non-silence) |
| -30dB     | Medium      | Standard threshold (default)                             |
| -40dB     | High        | Detect quiet moments (some ambient noise ok)             |
| -50dB     | Very High   | Capture near-silence (background hum ok)                 |

Examples

# Strict silence (studio recordings)
silence-detect video.mp4 -t -20

# Standard detection
silence-detect video.mp4 -t -30

# Include quiet sections (noisy recordings)
silence-detect video.mp4 -t -40

Detection Methods

1. FFmpeg (Default) - Fast

Uses FFmpeg's silencedetect filter. Best for:

  • Quick processing
  • Clean audio
  • Batch operations
silence-detect video.mp4 --method ffmpeg
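FFmpeg's silencedetect filter logs `silence_start` / `silence_end` lines to stderr (a typical invocation is `ffmpeg -i video.mp4 -af silencedetect=noise=-30dB:d=0.5 -f null -`). A minimal sketch of parsing that log into segments; the sample string below is fabricated for illustration:

```python
# Parse FFmpeg silencedetect stderr output into (start, end) pairs.
import re

def parse_silencedetect(log: str):
    starts = [float(m) for m in re.findall(r"silence_start:\s*(-?[\d.]+)", log)]
    ends = [float(m) for m in re.findall(r"silence_end:\s*(-?[\d.]+)", log)]
    return list(zip(starts, ends))

# Fabricated sample in silencedetect's log format.
sample = (
    "[silencedetect @ 0x0] silence_start: 125.5\n"
    "[silencedetect @ 0x0] silence_end: 128.2 | silence_duration: 2.7\n"
)
print(parse_silencedetect(sample))  # [(125.5, 128.2)]
```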

2. Whisper - Accurate

Uses OpenAI Whisper Large v2 to detect speech, then finds gaps. Best for:

  • Content with background music
  • Noisy recordings
  • When speech detection matters
silence-detect video.mp4 --method whisper --language en
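A speech-aware detector of this kind can be sketched as: transcribe with Whisper, take the timestamped speech segments, and report gaps of at least `min_duration` between them as silence. Illustrative only; the segment times are hypothetical and the package's real implementation may differ:

```python
# Sketch of gap-finding over Whisper speech segments (illustrative).
# Each speech segment is (start, end) in seconds.
def gaps_between_speech(speech, total_duration, min_duration=2.0):
    silences = []
    cursor = 0.0
    for start, end in sorted(speech):
        # A long enough gap before this speech segment counts as silence.
        if start - cursor >= min_duration:
            silences.append((cursor, start))
        cursor = max(cursor, end)
    # Trailing silence after the last speech segment.
    if total_duration - cursor >= min_duration:
        silences.append((cursor, total_duration))
    return silences

print(gaps_between_speech([(0.0, 4.5), (8.0, 12.0)], total_duration=20.0))
# [(4.5, 8.0), (12.0, 20.0)]
```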

3. Hybrid - Best of Both

FFmpeg for speed, Whisper to verify. Best for:

  • Maximum accuracy
  • Mixed content
  • When false positives are costly
silence-detect video.mp4 --method hybrid
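The hybrid idea can be sketched as a cross-check: keep FFmpeg-detected silences only if they overlap no Whisper speech segment. This illustrates the concept, not the package's actual code:

```python
# Cross-check sketch: drop FFmpeg "silences" that actually contain speech.
def verify_silences(ffmpeg_silences, speech_segments):
    """Keep (start, end) silences that overlap no speech segment."""
    def overlaps(a, b):
        return a[0] < b[1] and a[1] > b[0]
    return [s for s in ffmpeg_silences
            if not any(overlaps(s, sp) for sp in speech_segments)]

# A quiet passage flagged by FFmpeg but overlapping speech is dropped.
print(verify_silences([(10, 12), (30, 33)], [(9, 11)]))  # [(30, 33)]
```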

Export Formats

JSON Export

Comprehensive detection data for programmatic use:

silence-detect video.mp4 -f json -o output.json

Output:

{
  "file_path": "video.mp4",
  "duration": 3600.0,
  "total_segments": 45,
  "segments": [
    {
      "start": 125.5,
      "end": 128.2,
      "duration": 2.7,
      "midpoint": 126.85,
      "start_formatted": "00:02:05.500",
      "end_formatted": "00:02:08.200"
    }
  ],
  "ad_break_suggestions": [
    {"position": 450.2, "position_formatted": "00:07:30.200"},
    {"position": 920.5, "position_formatted": "00:15:20.500"}
  ]
}
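The JSON report is plain data, so consuming it needs only the standard library. A sketch using the fields shown above:

```python
# Summarize a detection report dict (field names from the JSON output above).
def summarize(report: dict) -> str:
    lines = [f"{report['total_segments']} silences in {report['file_path']}"]
    for s in report["ad_break_suggestions"]:
        lines.append(f"ad break at {s['position_formatted']}")
    return "\n".join(lines)

# In practice: import json; report = json.load(open("output.json"))
report = {
    "file_path": "video.mp4",
    "total_segments": 45,
    "ad_break_suggestions": [
        {"position": 450.2, "position_formatted": "00:07:30.200"},
    ],
}
print(summarize(report))
```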

SRT Export

Subtitle format for video editors (Premiere Pro, DaVinci Resolve, etc.):

silence-detect video.mp4 -f srt -o silences.srt

Output:

1
00:02:05,500 --> 00:02:08,200
[SILENCE] Duration: 2.70s

2
00:07:30,000 --> 00:07:33,000
>>> AD BREAK 1 <<<

Python API

from silence_detection import SilenceDetector, JSONExporter, SRTExporter

# Initialize detector
detector = SilenceDetector()

# Basic detection
result = detector.detect(
    "video.mp4",
    method="ffmpeg",
    threshold_db=-30,
    min_duration=0.5,
)

print(f"Found {len(result.segments)} silence segments")

# Get ad break suggestions
break_points = result.get_ad_break_points(count=4)
for i, pos in enumerate(break_points, 1):
    print(f"Ad break {i}: {pos:.2f}s")

# Export to JSON
json_exporter = JSONExporter()
json_exporter.export(result, "output.json")

# Export to SRT
srt_exporter = SRTExporter(mode="both")  # silence + ad markers
srt_exporter.export(result, "output.srt")

Using Whisper for Speech-Aware Detection

# Whisper method (more accurate for content with music)
result = detector.detect(
    "video.mp4",
    method="whisper",
    min_duration=2.0,
    language="en",
)

Batch Processing

from pathlib import Path
from silence_detection import SilenceDetector, JSONExporter

detector = SilenceDetector()
exporter = JSONExporter()

videos = Path("videos").glob("*.mp4")

for video in videos:
    result = detector.detect(video, min_duration=2.0)
    output = video.with_suffix(".silence.json")
    exporter.export(result, output)
    print(f"Processed: {video.name} -> {len(result.segments)} segments")

REST API

Start Server

silence-detect serve --port 8000

Endpoints

| Method | Endpoint          | Description                     |
|--------|-------------------|---------------------------------|
| GET    | /health           | Health check                    |
| POST   | /detect           | Detect silence in uploaded file |
| POST   | /detect/ad-breaks | Find ad break points            |
| GET    | /methods          | List detection methods          |

Example Request

# Detect silence
curl -X POST http://localhost:8000/detect \
  -F "file=@video.mp4" \
  -F "method=ffmpeg" \
  -F "min_duration=1.0"

# Find ad breaks
curl -X POST http://localhost:8000/detect/ad-breaks \
  -F "file=@video.mp4" \
  -F "num_breaks=4"
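The same endpoints can be called from Python with only the standard library. A sketch; the multipart encoding below is generic, the form field names follow the curl examples above, and the response schema is an assumption:

```python
# Minimal multipart/form-data client sketch for the /detect endpoint.
import io
import urllib.request
import uuid

def build_multipart(fields, file_field, filename, file_bytes):
    """Encode plain form fields plus one file as multipart/form-data."""
    boundary = uuid.uuid4().hex
    buf = io.BytesIO()
    for name, value in fields.items():
        buf.write(
            f'--{boundary}\r\nContent-Disposition: form-data; '
            f'name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    buf.write(
        f'--{boundary}\r\nContent-Disposition: form-data; '
        f'name="{file_field}"; filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'.encode()
    )
    buf.write(file_bytes)
    buf.write(f"\r\n--{boundary}--\r\n".encode())
    return boundary, buf.getvalue()

def detect(path, method="ffmpeg", min_duration=1.0,
           base="http://localhost:8000"):
    with open(path, "rb") as f:
        data = f.read()
    boundary, body = build_multipart(
        {"method": method, "min_duration": str(min_duration)},
        "file", path, data,
    )
    req = urllib.request.Request(
        base + "/detect", data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()  # JSON bytes from the server
```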

Combining with Black Frame Detection

For the most reliable ad break detection, combine silence detection with black frame detection:

from silence_detection import SilenceDetector

# Also install: pip install video-black-frame-detection
from blackframe_detection import BlackFrameDetector

silence_detector = SilenceDetector()
blackframe_detector = BlackFrameDetector()

# Detect both
silences = silence_detector.detect("video.mp4", min_duration=2.0)
black_frames = blackframe_detector.detect("video.mp4", min_duration=0.5)

# Find overlapping segments (both silent AND black)
optimal_breaks = []
for silence in silences.segments:
    for black in black_frames.segments:
        # Check if they overlap
        if silence.start < black.end and silence.end > black.start:
            # Use the overlap midpoint
            overlap_start = max(silence.start, black.start)
            overlap_end = min(silence.end, black.end)
            optimal_breaks.append((overlap_start + overlap_end) / 2)

print(f"Found {len(optimal_breaks)} optimal ad break points")

Configuration

Config File

# config.yaml
detection:
  method: ffmpeg
  threshold_db: -30.0
  min_duration: 0.5

whisper:
  model: large-v2
  device: auto

export:
  format: json
  output_dir: ./output

Environment Variables

# .env
FFMPEG_PATH=ffmpeg
WHISPER_MODEL=large-v2
SILENCE_THRESHOLD_DB=-30
MIN_SILENCE_DURATION=0.5

Project Structure

video-silence-detection/
├── src/silence_detection/
│   ├── core/
│   │   ├── detector.py       # Main unified detector
│   │   ├── ffmpeg.py         # FFmpeg-based detection
│   │   └── whisper_detector.py # Whisper-based detection
│   ├── exporters/
│   │   ├── json_exporter.py  # JSON export
│   │   └── srt_exporter.py   # SRT subtitle export
│   ├── api/
│   │   └── app.py            # FastAPI server
│   └── cli.py                # Command-line interface
├── configs/
│   └── default.yaml
├── tests/
├── pyproject.toml
└── README.md

Requirements

  • Python 3.10+
  • FFmpeg (must be installed and in PATH)
  • (Optional) CUDA GPU for Whisper acceleration

License

MIT License - see LICENSE

Author

Hasan Halacli - Website · GitHub
