Detect silence segments in video/audio files for ad break insertion, content segmentation, and video editing automation. Uses FFmpeg for fast detection and optionally Whisper Large v2 for speech-aware analysis.
- Ad Break Insertion: Find optimal points to insert advertisements
- Content Segmentation: Automatically split content at natural breaks
- Video Editing Automation: Identify cut points for automated editing
- Broadcast Compliance: Detect and log silence for broadcast standards
- Multiple Detection Methods: FFmpeg (fast), Whisper (accurate), Hybrid (best of both)
- Configurable Parameters: Threshold, duration, and sensitivity settings
- Export Formats: JSON and SRT (subtitle) output
- REST API: FastAPI server for integration
- CLI Tool: Command-line interface for batch processing
For comprehensive content break detection, combine with:
- video-black-frame-detection: Detect black frames and scene transitions
Using both together provides the most reliable ad break detection by identifying points where both audio silence AND visual black frames occur.
- Python 3.10+
- FFmpeg installed and in PATH
- (Optional) CUDA-compatible GPU for Whisper
git clone https://github.com/hasanhalacli/video-silence-detection.git
cd video-silence-detection
# Using uv (recommended)
uv sync
# Or pip
pip install -e .

# Detect silence in a video
silence-detect video.mp4
# Find ad break points
silence-detect ad-breaks video.mp4 -n 4
# Export to SRT for video editing
silence-detect video.mp4 -f srt -o silences.srt

Different silence durations are useful for different purposes:
| Duration | Use Case | Example |
|---|---|---|
| 0.3s | Micro-pauses | Word boundaries, breathing |
| 0.5s | Speech pauses | Sentence breaks (default) |
| 1.0s | Scene transitions | Paragraph/topic changes |
| 2.0s | Ad breaks | Ideal for advertisement insertion |
| 3.0s+ | Chapter breaks | Intro/outro, major sections |
# Detect sentence-level pauses (tight editing)
silence-detect video.mp4 -m 0.5
# Find scene transitions
silence-detect video.mp4 -m 1.0
# Optimal ad insertion points (2+ second gaps)
silence-detect video.mp4 -m 2.0
# Chapter/segment detection
silence-detect video.mp4 -m 3.0

The silence threshold (dB) controls what audio level is considered "silence":
| Threshold | Sensitivity | Use Case |
|---|---|---|
| -20dB | Low | Only detect true silence (music detected as non-silence) |
| -30dB | Medium | Standard threshold (default) |
| -40dB | High | Detect quiet moments (some ambient noise ok) |
| -50dB | Very High | Capture near-silence (background hum ok) |
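As a rule of thumb, dB values map to linear amplitude ratios via 20·log10. A quick standalone sketch of that relationship (plain math, independent of this package):

```python
import math

def db_to_amplitude(db: float) -> float:
    """Convert a dB level to a linear amplitude ratio (1.0 = full scale)."""
    return 10 ** (db / 20)

def amplitude_to_db(amplitude: float) -> float:
    """Convert a linear amplitude ratio back to dB."""
    return 20 * math.log10(amplitude)

# The default -30 dB threshold is roughly 3.2% of full-scale amplitude
print(round(db_to_amplitude(-30), 4))      # 0.0316
print(round(amplitude_to_db(0.1), 1))      # -20.0
```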
# Strict silence (studio recordings)
silence-detect video.mp4 -t -20
# Standard detection
silence-detect video.mp4 -t -30
# Include quiet sections (noisy recordings)
silence-detect video.mp4 -t -40

Uses FFmpeg's silencedetect filter. Best for:
- Quick processing
- Clean audio
- Batch operations
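Under the hood, FFmpeg's silencedetect filter reports timestamps on stderr. A minimal sketch of what that output looks like and how it can be parsed (illustrative only, not this package's internal parser):

```python
import re

# Example stderr lines produced by:
#   ffmpeg -i video.mp4 -af silencedetect=noise=-30dB:d=0.5 -f null -
stderr = """\
[silencedetect @ 0x55] silence_start: 125.5
[silencedetect @ 0x55] silence_end: 128.2 | silence_duration: 2.7
"""

starts = [float(m) for m in re.findall(r"silence_start: ([\d.]+)", stderr)]
ends = [float(m) for m in re.findall(r"silence_end: ([\d.]+)", stderr)]
segments = list(zip(starts, ends))
print(segments)  # [(125.5, 128.2)]
```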
silence-detect video.mp4 --method ffmpeg

Uses OpenAI Whisper Large v2 to detect speech, then finds gaps. Best for:
- Content with background music
- Noisy recordings
- When speech detection matters
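Conceptually, the speech-aware pass inverts the transcriber's speech segments into silence candidates. A minimal sketch of that gap-finding step (a hypothetical helper, not this package's implementation):

```python
def gaps_between_speech(speech_spans, total_duration, min_duration=2.0):
    """Return (start, end) stretches not covered by any speech span."""
    gaps, cursor = [], 0.0
    for start, end in sorted(speech_spans):
        if start - cursor >= min_duration:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    # Trailing gap after the last speech span
    if total_duration - cursor >= min_duration:
        gaps.append((cursor, total_duration))
    return gaps

# Speech from 0-10s and 13-25s in a 30s file leaves two usable gaps
print(gaps_between_speech([(0.0, 10.0), (13.0, 25.0)], 30.0))
# [(10.0, 13.0), (25.0, 30.0)]
```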
silence-detect video.mp4 --method whisper --language en

Runs FFmpeg for speed, then uses Whisper to verify. Best for:
- Maximum accuracy
- Mixed content
- When false positives are costly
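Conceptually, the hybrid pass keeps only the FFmpeg candidates that Whisper agrees contain no speech. A minimal sketch of that verification step (hypothetical helper, not this package's API):

```python
def verify_with_speech(candidates, speech_spans):
    """Keep only candidate silences that overlap no detected speech span.

    Both arguments are lists of (start, end) tuples in seconds.
    """
    verified = []
    for c_start, c_end in candidates:
        overlaps_speech = any(
            c_start < s_end and c_end > s_start
            for s_start, s_end in speech_spans
        )
        if not overlaps_speech:
            verified.append((c_start, c_end))
    return verified

# FFmpeg flagged three gaps; Whisper heard speech during the second one
candidates = [(10.0, 12.5), (30.0, 31.0), (60.0, 63.0)]
speech = [(29.5, 30.5)]
print(verify_with_speech(candidates, speech))  # [(10.0, 12.5), (60.0, 63.0)]
```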
silence-detect video.mp4 --method hybrid

Comprehensive detection data for programmatic use:
silence-detect video.mp4 -f json -o output.json

Output:
{
"file_path": "video.mp4",
"duration": 3600.0,
"total_segments": 45,
"segments": [
{
"start": 125.5,
"end": 128.2,
"duration": 2.7,
"midpoint": 126.85,
"start_formatted": "00:02:05.500",
"end_formatted": "00:02:08.200"
}
],
"ad_break_suggestions": [
{"position": 450.2, "position_formatted": "00:07:30.200"},
{"position": 920.5, "position_formatted": "00:15:20.500"}
]
}

Subtitle format for video editors (Premiere, DaVinci, etc.):
silence-detect video.mp4 -f srt -o silences.srt

Output:
1
00:02:05,500 --> 00:02:08,200
[SILENCE] Duration: 2.70s
2
00:07:30,000 --> 00:07:33,000
>>> AD BREAK 1 <<<

from silence_detection import SilenceDetector, JSONExporter, SRTExporter
# Initialize detector
detector = SilenceDetector()
# Basic detection
result = detector.detect(
"video.mp4",
method="ffmpeg",
threshold_db=-30,
min_duration=0.5,
)
print(f"Found {len(result.segments)} silence segments")
# Get ad break suggestions
break_points = result.get_ad_break_points(count=4)
for i, pos in enumerate(break_points, 1):
print(f"Ad break {i}: {pos:.2f}s")
# Export to JSON
json_exporter = JSONExporter()
json_exporter.export(result, "output.json")
# Export to SRT
srt_exporter = SRTExporter(mode="both") # silence + ad markers
srt_exporter.export(result, "output.srt")

# Whisper method (more accurate for content with music)
result = detector.detect(
"video.mp4",
method="whisper",
min_duration=2.0,
language="en",
)

from pathlib import Path
from silence_detection import SilenceDetector, JSONExporter
detector = SilenceDetector()
exporter = JSONExporter()
videos = Path("videos").glob("*.mp4")
for video in videos:
result = detector.detect(video, min_duration=2.0)
output = video.with_suffix(".silence.json")
exporter.export(result, output)
print(f"Processed: {video.name} -> {len(result.segments)} segments")

silence-detect serve --port 8000

| Method | Endpoint | Description |
|---|---|---|
| GET | /health | Health check |
| POST | /detect | Detect silence in uploaded file |
| POST | /detect/ad-breaks | Find ad break points |
| GET | /methods | List detection methods |
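If you need to call these endpoints from Python without extra dependencies, a multipart upload can be hand-built with the standard library. A sketch against the endpoints above (field names assumed from the curl examples):

```python
import io
import urllib.request
import uuid

def build_multipart(fields: dict, file_field: str, filename: str, payload: bytes):
    """Hand-roll a multipart/form-data body using only the standard library."""
    boundary = uuid.uuid4().hex
    buf = io.BytesIO()
    for name, value in fields.items():
        buf.write(
            f'--{boundary}\r\nContent-Disposition: form-data; '
            f'name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    buf.write(
        f'--{boundary}\r\nContent-Disposition: form-data; '
        f'name="{file_field}"; filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'.encode()
    )
    buf.write(payload)
    buf.write(f'\r\n--{boundary}--\r\n'.encode())
    return buf.getvalue(), f'multipart/form-data; boundary={boundary}'

body, content_type = build_multipart(
    {"method": "ffmpeg", "min_duration": "1.0"}, "file", "video.mp4", b"<bytes>"
)
request = urllib.request.Request(
    "http://localhost:8000/detect", data=body, headers={"Content-Type": content_type}
)
# urllib.request.urlopen(request) would perform the upload; omitted here.
```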
# Detect silence
curl -X POST http://localhost:8000/detect \
-F "file=@video.mp4" \
-F "method=ffmpeg" \
-F "min_duration=1.0"
# Find ad breaks
curl -X POST http://localhost:8000/detect/ad-breaks \
-F "file=@video.mp4" \
-F "num_breaks=4"

For the most reliable ad break detection, combine silence detection with black frame detection:
from silence_detection import SilenceDetector
# Also install: pip install video-black-frame-detection
from blackframe_detection import BlackFrameDetector
silence_detector = SilenceDetector()
blackframe_detector = BlackFrameDetector()
# Detect both
silences = silence_detector.detect("video.mp4", min_duration=2.0)
black_frames = blackframe_detector.detect("video.mp4", min_duration=0.5)
# Find overlapping segments (both silent AND black)
optimal_breaks = []
for silence in silences.segments:
for black in black_frames.segments:
# Check if they overlap
if silence.start < black.end and silence.end > black.start:
# Use the overlap midpoint
overlap_start = max(silence.start, black.start)
overlap_end = min(silence.end, black.end)
optimal_breaks.append((overlap_start + overlap_end) / 2)
print(f"Found {len(optimal_breaks)} optimal ad break points")

# config.yaml
detection:
method: ffmpeg
threshold_db: -30.0
min_duration: 0.5
whisper:
model: large-v2
device: auto
export:
format: json
output_dir: ./output

# .env
FFMPEG_PATH=ffmpeg
WHISPER_MODEL=large-v2
SILENCE_THRESHOLD_DB=-30
MIN_SILENCE_DURATION=0.5

video-silence-detection/
├── src/silence_detection/
│ ├── core/
│ │ ├── detector.py # Main unified detector
│ │ ├── ffmpeg.py # FFmpeg-based detection
│ │ └── whisper_detector.py # Whisper-based detection
│ ├── exporters/
│ │ ├── json_exporter.py # JSON export
│ │ └── srt_exporter.py # SRT subtitle export
│ ├── api/
│ │ └── app.py # FastAPI server
│ └── cli.py # Command-line interface
├── configs/
│ └── default.yaml
├── tests/
├── pyproject.toml
└── README.md
MIT License - see LICENSE