Video Silence Detection


Detect silence segments in video/audio files for ad break insertion, content segmentation, and video editing automation. Uses FFmpeg for fast detection and optionally Whisper Large v2 for speech-aware analysis.

Use Cases

  • Ad Break Insertion: Find optimal points to insert advertisements
  • Content Segmentation: Automatically split content at natural breaks
  • Video Editing Automation: Identify cut points for automated editing
  • Broadcast Compliance: Detect and log silence for broadcast standards

Features

  • Multiple Detection Methods: FFmpeg (fast), Whisper (accurate), Hybrid (best of both)
  • Configurable Parameters: Threshold, duration, and sensitivity settings
  • Export Formats: JSON and SRT (subtitle) output
  • REST API: FastAPI server for integration
  • CLI Tool: Command-line interface for batch processing

Related Projects

For comprehensive content break detection, combine with:

  • video-black-frame-detection: companion project for detecting black frames (see "Combining with Black Frame Detection" below)

Using both together provides the most reliable ad break detection by identifying points where both audio silence AND visual black frames occur.


Quick Start

Prerequisites

  • Python 3.10+
  • FFmpeg installed and in PATH
  • (Optional) CUDA-compatible GPU for Whisper

Installation

git clone https://github.com/hasanhalacli/video-silence-detection.git
cd video-silence-detection

# Using uv (recommended)
uv sync

# Or pip
pip install -e .

Basic Usage

# Detect silence in a video
silence-detect video.mp4

# Find ad break points
silence-detect ad-breaks video.mp4 -n 4

# Export to SRT for video editing
silence-detect video.mp4 -f srt -o silences.srt
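The `ad-breaks` command turns detected silences into insertion points. One plausible selection strategy can be sketched as follows; this is an illustration of the idea, not the tool's actual algorithm:

```python
# Illustrative sketch of ad-break selection (NOT the package's real logic):
# rank silences by duration, keep the n longest, return their midpoints.
def pick_ad_breaks(segments, n=4):
    """segments: list of (start, end) tuples in seconds."""
    # Longest silences are the safest places to cut.
    ranked = sorted(segments, key=lambda s: s[1] - s[0], reverse=True)
    # Keep the n longest, then report midpoints in timeline order.
    chosen = sorted(ranked[:n], key=lambda s: s[0])
    return [(start + end) / 2 for start, end in chosen]

print(pick_ad_breaks([(10, 13), (40, 41), (90, 95), (120, 122)], n=2))
# [11.5, 92.5]
```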

Silence Duration Guide

Different silence durations are useful for different purposes:

| Duration | Use Case          | Example                           |
|----------|-------------------|-----------------------------------|
| 0.3s     | Micro-pauses      | Word boundaries, breathing        |
| 0.5s     | Speech pauses     | Sentence breaks (default)         |
| 1.0s     | Scene transitions | Paragraph/topic changes           |
| 2.0s     | Ad breaks         | Ideal for advertisement insertion |
| 3.0s+    | Chapter breaks    | Intro/outro, major sections       |

Examples by Use Case

# Detect sentence-level pauses (tight editing)
silence-detect video.mp4 -m 0.5

# Find scene transitions
silence-detect video.mp4 -m 1.0

# Optimal ad insertion points (2+ second gaps)
silence-detect video.mp4 -m 2.0

# Chapter/segment detection
silence-detect video.mp4 -m 3.0

Threshold Guide

The silence threshold (dB) controls what audio level is considered "silence":

| Threshold | Sensitivity | Use Case                                                 |
|-----------|-------------|----------------------------------------------------------|
| -20dB     | Low         | Only detect true silence (music detected as non-silence) |
| -30dB     | Medium      | Standard threshold (default)                             |
| -40dB     | High        | Detect quiet moments (some ambient noise ok)             |
| -50dB     | Very High   | Capture near-silence (background hum ok)                 |

Examples

# Strict silence (studio recordings)
silence-detect video.mp4 -t -20

# Standard detection
silence-detect video.mp4 -t -30

# Include quiet sections (noisy recordings)
silence-detect video.mp4 -t -40

Detection Methods

1. FFmpeg (Default) - Fast

Uses FFmpeg's silencedetect filter. Best for:

  • Quick processing
  • Clean audio
  • Batch operations
silence-detect video.mp4 --method ffmpeg
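FFmpeg's silencedetect filter logs `silence_start` / `silence_end` lines to stderr (a typical invocation is `ffmpeg -i video.mp4 -af silencedetect=noise=-30dB:d=0.5 -f null -`). A minimal sketch of parsing that log into segments; the sample string below is fabricated for illustration:

```python
# Parse FFmpeg silencedetect stderr output into (start, end) pairs.
import re

def parse_silencedetect(log: str):
    starts = [float(m) for m in re.findall(r"silence_start:\s*(-?[\d.]+)", log)]
    ends = [float(m) for m in re.findall(r"silence_end:\s*(-?[\d.]+)", log)]
    return list(zip(starts, ends))

# Fabricated sample in silencedetect's log format.
sample = (
    "[silencedetect @ 0x0] silence_start: 125.5\n"
    "[silencedetect @ 0x0] silence_end: 128.2 | silence_duration: 2.7\n"
)
print(parse_silencedetect(sample))  # [(125.5, 128.2)]
```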

2. Whisper - Accurate

Uses OpenAI Whisper Large v2 to detect speech, then finds gaps. Best for:

  • Content with background music
  • Noisy recordings
  • When speech detection matters
silence-detect video.mp4 --method whisper --language en
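A speech-aware detector of this kind can be sketched as: transcribe with Whisper, take the timestamped speech segments, and report gaps of at least `min_duration` between them as silence. Illustrative only; the segment times are hypothetical and the package's real implementation may differ:

```python
# Sketch of gap-finding over Whisper speech segments (illustrative).
# Each speech segment is (start, end) in seconds.
def gaps_between_speech(speech, total_duration, min_duration=2.0):
    silences = []
    cursor = 0.0
    for start, end in sorted(speech):
        # A long enough gap before this speech segment counts as silence.
        if start - cursor >= min_duration:
            silences.append((cursor, start))
        cursor = max(cursor, end)
    # Trailing silence after the last speech segment.
    if total_duration - cursor >= min_duration:
        silences.append((cursor, total_duration))
    return silences

print(gaps_between_speech([(0.0, 4.5), (8.0, 12.0)], total_duration=20.0))
# [(4.5, 8.0), (12.0, 20.0)]
```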

3. Hybrid - Best of Both

FFmpeg for speed, Whisper to verify. Best for:

  • Maximum accuracy
  • Mixed content
  • When false positives are costly
silence-detect video.mp4 --method hybrid
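The hybrid idea can be sketched as a cross-check: keep FFmpeg-detected silences only if they overlap no Whisper speech segment. This illustrates the concept, not the package's actual code:

```python
# Cross-check sketch: drop FFmpeg "silences" that actually contain speech.
def verify_silences(ffmpeg_silences, speech_segments):
    """Keep (start, end) silences that overlap no speech segment."""
    def overlaps(a, b):
        return a[0] < b[1] and a[1] > b[0]
    return [s for s in ffmpeg_silences
            if not any(overlaps(s, sp) for sp in speech_segments)]

# A quiet passage flagged by FFmpeg but overlapping speech is dropped.
print(verify_silences([(10, 12), (30, 33)], [(9, 11)]))  # [(30, 33)]
```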

Export Formats

JSON Export

Comprehensive detection data for programmatic use:

silence-detect video.mp4 -f json -o output.json

Output:

{
  "file_path": "video.mp4",
  "duration": 3600.0,
  "total_segments": 45,
  "segments": [
    {
      "start": 125.5,
      "end": 128.2,
      "duration": 2.7,
      "midpoint": 126.85,
      "start_formatted": "00:02:05.500",
      "end_formatted": "00:02:08.200"
    }
  ],
  "ad_break_suggestions": [
    {"position": 450.2, "position_formatted": "00:07:30.200"},
    {"position": 920.5, "position_formatted": "00:15:20.500"}
  ]
}
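The JSON report is plain data, so consuming it needs only the standard library. A sketch using the fields shown above:

```python
# Summarize a detection report dict (field names from the JSON output above).
def summarize(report: dict) -> str:
    lines = [f"{report['total_segments']} silences in {report['file_path']}"]
    for s in report["ad_break_suggestions"]:
        lines.append(f"ad break at {s['position_formatted']}")
    return "\n".join(lines)

# In practice: import json; report = json.load(open("output.json"))
report = {
    "file_path": "video.mp4",
    "total_segments": 45,
    "ad_break_suggestions": [
        {"position": 450.2, "position_formatted": "00:07:30.200"},
    ],
}
print(summarize(report))
```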

SRT Export

Subtitle format for video editors (Premiere Pro, DaVinci Resolve, etc.):

silence-detect video.mp4 -f srt -o silences.srt

Output:

1
00:02:05,500 --> 00:02:08,200
[SILENCE] Duration: 2.70s

2
00:07:30,000 --> 00:07:33,000
>>> AD BREAK 1 <<<

Python API

from silence_detection import SilenceDetector, JSONExporter, SRTExporter

# Initialize detector
detector = SilenceDetector()

# Basic detection
result = detector.detect(
    "video.mp4",
    method="ffmpeg",
    threshold_db=-30,
    min_duration=0.5,
)

print(f"Found {len(result.segments)} silence segments")

# Get ad break suggestions
break_points = result.get_ad_break_points(count=4)
for i, pos in enumerate(break_points, 1):
    print(f"Ad break {i}: {pos:.2f}s")

# Export to JSON
json_exporter = JSONExporter()
json_exporter.export(result, "output.json")

# Export to SRT
srt_exporter = SRTExporter(mode="both")  # silence + ad markers
srt_exporter.export(result, "output.srt")

Using Whisper for Speech-Aware Detection

# Whisper method (more accurate for content with music)
result = detector.detect(
    "video.mp4",
    method="whisper",
    min_duration=2.0,
    language="en",
)

Batch Processing

from pathlib import Path
from silence_detection import SilenceDetector, JSONExporter

detector = SilenceDetector()
exporter = JSONExporter()

videos = Path("videos").glob("*.mp4")

for video in videos:
    result = detector.detect(video, min_duration=2.0)
    output = video.with_suffix(".silence.json")
    exporter.export(result, output)
    print(f"Processed: {video.name} -> {len(result.segments)} segments")

REST API

Start Server

silence-detect serve --port 8000

Endpoints

| Method | Endpoint          | Description                     |
|--------|-------------------|---------------------------------|
| GET    | /health           | Health check                    |
| POST   | /detect           | Detect silence in uploaded file |
| POST   | /detect/ad-breaks | Find ad break points            |
| GET    | /methods          | List detection methods          |

Example Request

# Detect silence
curl -X POST http://localhost:8000/detect \
  -F "file=@video.mp4" \
  -F "method=ffmpeg" \
  -F "min_duration=1.0"

# Find ad breaks
curl -X POST http://localhost:8000/detect/ad-breaks \
  -F "file=@video.mp4" \
  -F "num_breaks=4"
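The same endpoints can be called from Python with only the standard library. A sketch; the multipart encoding below is generic, the form field names follow the curl examples above, and the response schema is an assumption:

```python
# Minimal multipart/form-data client sketch for the /detect endpoint.
import io
import urllib.request
import uuid

def build_multipart(fields, file_field, filename, file_bytes):
    """Encode plain form fields plus one file as multipart/form-data."""
    boundary = uuid.uuid4().hex
    buf = io.BytesIO()
    for name, value in fields.items():
        buf.write(
            f'--{boundary}\r\nContent-Disposition: form-data; '
            f'name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    buf.write(
        f'--{boundary}\r\nContent-Disposition: form-data; '
        f'name="{file_field}"; filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'.encode()
    )
    buf.write(file_bytes)
    buf.write(f"\r\n--{boundary}--\r\n".encode())
    return boundary, buf.getvalue()

def detect(path, method="ffmpeg", min_duration=1.0,
           base="http://localhost:8000"):
    with open(path, "rb") as f:
        data = f.read()
    boundary, body = build_multipart(
        {"method": method, "min_duration": str(min_duration)},
        "file", path, data,
    )
    req = urllib.request.Request(
        base + "/detect", data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()  # JSON bytes from the server
```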

Combining with Black Frame Detection

For the most reliable ad break detection, combine silence detection with black frame detection:

from silence_detection import SilenceDetector

# Also install: pip install video-black-frame-detection
from blackframe_detection import BlackFrameDetector

silence_detector = SilenceDetector()
blackframe_detector = BlackFrameDetector()

# Detect both
silences = silence_detector.detect("video.mp4", min_duration=2.0)
black_frames = blackframe_detector.detect("video.mp4", min_duration=0.5)

# Find overlapping segments (both silent AND black)
optimal_breaks = []
for silence in silences.segments:
    for black in black_frames.segments:
        # Check if they overlap
        if silence.start < black.end and silence.end > black.start:
            # Use the overlap midpoint
            overlap_start = max(silence.start, black.start)
            overlap_end = min(silence.end, black.end)
            optimal_breaks.append((overlap_start + overlap_end) / 2)

print(f"Found {len(optimal_breaks)} optimal ad break points")

Configuration

Config File

# config.yaml
detection:
  method: ffmpeg
  threshold_db: -30.0
  min_duration: 0.5

whisper:
  model: large-v2
  device: auto

export:
  format: json
  output_dir: ./output

Environment Variables

# .env
FFMPEG_PATH=ffmpeg
WHISPER_MODEL=large-v2
SILENCE_THRESHOLD_DB=-30
MIN_SILENCE_DURATION=0.5

Project Structure

video-silence-detection/
├── src/silence_detection/
│   ├── core/
│   │   ├── detector.py       # Main unified detector
│   │   ├── ffmpeg.py         # FFmpeg-based detection
│   │   └── whisper_detector.py # Whisper-based detection
│   ├── exporters/
│   │   ├── json_exporter.py  # JSON export
│   │   └── srt_exporter.py   # SRT subtitle export
│   ├── api/
│   │   └── app.py            # FastAPI server
│   └── cli.py                # Command-line interface
├── configs/
│   └── default.yaml
├── tests/
├── pyproject.toml
└── README.md

Requirements

  • Python 3.10+
  • FFmpeg (must be installed and in PATH)
  • (Optional) CUDA GPU for Whisper acceleration

License

MIT License - see LICENSE

Author

Hasan Halacli - Website · GitHub
