Speaker Diarization Toolkit

A comprehensive toolkit for speaker identification and diarization in audio transcripts. Maps generic speaker labels (S1, S2, ...) to known speaker profiles using embeddings, voice characteristics, and LLM-based analysis.

Overview

┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│   Audio     │───>│   catalog   │───>│   assign    │───>│   review    │
│   Input     │    │  (track)    │    │   (map)     │    │  (verify)   │
└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘
                         │                   │                  │
                         v                   v                  v
                   ┌───────────┐       ┌───────────┐     ┌───────────┐
                   │ STT/Trans │       │ Profiles  │     │  Report   │
                   │  Backend  │       │ Embeddings│     │  Status   │
                   └───────────┘       └───────────┘     └───────────┘

Tools

The toolkit consists of several specialized tools:

Tool	Purpose
`speaker-catalog`	Recording inventory and processing state management
`speaker-assign`	Map diarization labels to known speaker profiles
`speaker-review`	Interactive review workflow for assignments
`speaker-llm`	LLM-based speaker name detection from transcripts
`speaker-process`	Batch processing queue management
`speaker-report`	Pipeline status and health reporting
`speaker_detection`	Core speaker profile management
`speaker_samples`	Sample extraction and review
`speaker_segments`	Transcript segment extraction

Quick Start

# 1. Add a speaker profile
./speaker_detection add alice --name "Alice Smith" --tag team

# 2. Catalog a recording
./speaker-catalog add meeting.mp3 --context team-standup

# 3. Register transcript (from your STT backend)
./speaker-catalog register-transcript meeting.mp3 \
    --backend speechmatics --transcript meeting.json

# 4. Assign speakers
./speaker-assign meeting.mp3 --transcript meeting.json

# 5. Review assignments
./speaker-review

# 6. Check pipeline status
./speaker-report status

Installation

No installation required. Tools use uv run for automatic dependency management.

Requirements:

Python 3.11+
ffprobe (usually bundled with ffmpeg)
b3sum (optional, falls back to SHA256)
jq (for query commands)

Configuration

Set the data directory:

export SPEAKERS_EMBEDDINGS_DIR="$HOME/.config/speakers_embeddings"

Testing

Run the test suite:

# Run all tests
./run_speaker_diarization_tests.sh

# Run unit tests only (fast, no API)
./run_speaker_diarization_tests.sh unit

# Run specific collection
./run_speaker_diarization_tests.sh catalog

# View test documentation
./run_speaker_diarization_tests.sh --doc catalog

# Run in Docker
./run_speaker_diarization_tests.sh docker

See evals/TESTING.md for testing methodology.

Documentation

Each tool has detailed documentation:

speaker-catalog.README.md - Recording catalog management
speaker-assign.README.md - Speaker label assignment
speaker-review.README.md - Interactive review workflow
speaker-llm.README.md - LLM-based name detection
speaker-process.README.md - Batch processing
speaker-report.README.md - Status reporting
speaker_detection.README.md - Core profile management
speaker_samples.README.md - Sample extraction
speaker_segments.README.md - Segment extraction

Development notes are in *.DEV_NOTES.md files.

Architecture

Data Storage

All data stored in $SPEAKERS_EMBEDDINGS_DIR:

$SPEAKERS_EMBEDDINGS_DIR/
├── speakers/           # Speaker profiles (YAML)
├── embeddings/         # Voice embeddings
├── samples/            # Audio samples per speaker
├── catalog/            # Recording catalog entries
├── assignments/        # Speaker assignments
└── cache/              # LLM response cache

Backend Support

Currently supports:

Speechmatics - Primary STT backend with diarization
AssemblyAI - Alternative transcript format support

Related Projects

stt-in-batch - Batch speech-to-text pipeline that integrates with speaker-diarization-toolkit for end-to-end transcription with speaker identification

License

MIT License - See LICENSE file for details.

Contributing

Contributions welcome! Please see the development notes in *.DEV_NOTES.md files for implementation details.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Speaker Diarization Toolkit

Overview

Tools

Quick Start

Installation

Configuration

Testing

Documentation

Architecture

Data Storage

Backend Support

Related Projects

License

Contributing

About

Uh oh!

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
docs		docs
evals		evals
ramblings		ramblings
speaker_detection_backends		speaker_detection_backends
.gitignore		.gitignore
CURRENT_WORK.md		CURRENT_WORK.md
README.md		README.md
pyproject.toml		pyproject.toml
run_speaker_diarization_tests.sh		run_speaker_diarization_tests.sh
run_speaker_diarization_tests.sh.README.md		run_speaker_diarization_tests.sh.README.md
speaker-assign		speaker-assign
speaker-assign.DEV_NOTES.md		speaker-assign.DEV_NOTES.md
speaker-assign.README.md		speaker-assign.README.md
speaker-catalog		speaker-catalog
speaker-catalog.DEV_NOTES.md		speaker-catalog.DEV_NOTES.md
speaker-catalog.README.md		speaker-catalog.README.md
speaker-llm		speaker-llm
speaker-llm.DEV_NOTES.md		speaker-llm.DEV_NOTES.md
speaker-llm.README.md		speaker-llm.README.md
speaker-process		speaker-process
speaker-process.DEV_NOTES.md		speaker-process.DEV_NOTES.md
speaker-process.README.md		speaker-process.README.md
speaker-report		speaker-report
speaker-report.DEV_NOTES.md		speaker-report.DEV_NOTES.md
speaker-report.README.md		speaker-report.README.md
speaker-review		speaker-review
speaker-review.DEV_NOTES.md		speaker-review.DEV_NOTES.md
speaker-review.README.md		speaker-review.README.md
speaker_detection		speaker_detection
speaker_detection.README.md		speaker_detection.README.md
speaker_samples		speaker_samples
speaker_samples.README.md		speaker_samples.README.md
speaker_segments		speaker_segments
speaker_segments.README.md		speaker_segments.README.md
test_stt_assemblyai_speaker_mapper.py		test_stt_assemblyai_speaker_mapper.py
uv.lock		uv.lock

CLIAI/speaker-diarization-toolkit

Folders and files

Latest commit

History

Repository files navigation

Speaker Diarization Toolkit

Overview

Tools

Quick Start

Installation

Configuration

Testing

Documentation

Architecture

Data Storage

Backend Support

Related Projects

License

Contributing

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages