A comprehensive toolkit for speaker identification and diarization in audio transcripts. Maps generic speaker labels (S1, S2, ...) to known speaker profiles using embeddings, voice characteristics, and LLM-based analysis.
┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│   Audio     │───>│   catalog   │───>│   assign    │───>│   review    │
│   Input     │    │   (track)   │    │    (map)    │    │  (verify)   │
└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘
                          │                  │                  │
                          v                  v                  v
                    ┌───────────┐      ┌───────────┐      ┌───────────┐
                    │ STT/Trans │      │ Profiles  │      │  Report   │
                    │  Backend  │      │ Embeddings│      │  Status   │
                    └───────────┘      └───────────┘      └───────────┘
The toolkit consists of several specialized tools:
| Tool | Purpose |
|---|---|
| `speaker-catalog` | Recording inventory and processing state management |
| `speaker-assign` | Map diarization labels to known speaker profiles |
| `speaker-review` | Interactive review workflow for assignments |
| `speaker-llm` | LLM-based speaker name detection from transcripts |
| `speaker-process` | Batch processing queue management |
| `speaker-report` | Pipeline status and health reporting |
| `speaker_detection` | Core speaker profile management |
| `speaker_samples` | Sample extraction and review |
| `speaker_segments` | Transcript segment extraction |
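
To make the label-mapping idea concrete, here is a minimal sketch of how diarization labels could be matched to known profiles by embedding similarity. It is not the code used by `speaker-assign`: the function names, the toy three-dimensional vectors, and the 0.75 threshold are all assumptions chosen for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def assign_labels(label_embeddings, profile_embeddings, threshold=0.75):
    """Map each diarization label to the most similar profile ID.

    Labels whose best score is below `threshold` map to None so that
    they can be left for manual review. The threshold is hypothetical.
    """
    assignments = {}
    for label, embedding in label_embeddings.items():
        best_id, best_score = None, threshold
        for profile_id, profile_embedding in profile_embeddings.items():
            score = cosine_similarity(embedding, profile_embedding)
            if score >= best_score:
                best_id, best_score = profile_id, score
        assignments[label] = best_id
    return assignments


# Toy data: real voice embeddings have hundreds of dimensions.
profiles = {"alice": [0.9, 0.1, 0.0], "bob": [0.0, 0.2, 0.9]}
labels = {"S1": [0.8, 0.2, 0.1], "S2": [0.1, 0.1, 0.95], "S3": [0.5, 0.5, 0.5]}
result = assign_labels(labels, profiles)
print(result)
```

In this sketch `S1` and `S2` resolve to `alice` and `bob`, while `S3` stays unassigned because neither profile is similar enough. Nothing prevents two labels from mapping to the same profile, so a real workflow would still want a review step.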
# 1. Add a speaker profile
./speaker_detection add alice --name "Alice Smith" --tag team
# 2. Catalog a recording
./speaker-catalog add meeting.mp3 --context team-standup
# 3. Register transcript (from your STT backend)
./speaker-catalog register-transcript meeting.mp3 \
--backend speechmatics --transcript meeting.json
# 4. Assign speakers
./speaker-assign meeting.mp3 --transcript meeting.json
# 5. Review assignments
./speaker-review
# 6. Check pipeline status
./speaker-report status

No installation required. Tools use `uv run` for automatic dependency management.
Requirements:
- Python 3.11+
- ffprobe (usually bundled with ffmpeg)
- b3sum (optional, falls back to SHA256)
- jq (for query commands)
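
The optional `b3sum` dependency with a SHA256 fallback could look roughly like the sketch below. This is an assumption about the general pattern, not the toolkit's actual code: the `file_digest` helper is hypothetical, and the only external behavior relied on is that `b3sum <file>` prints the hex digest as the first field of its output.

```python
import hashlib
import shutil
import subprocess
from pathlib import Path


def file_digest(path):
    """Return (algorithm, hex_digest) for a file.

    Uses the b3sum binary when it is on PATH, otherwise falls back to
    SHA256 from the standard library.
    """
    path = Path(path)
    b3sum = shutil.which("b3sum")
    if b3sum is not None:
        completed = subprocess.run(
            [b3sum, str(path)], capture_output=True, text=True, check=True
        )
        # Output looks like "<hex digest>  <file name>"; keep the digest.
        return "blake3", completed.stdout.split()[0]

    sha = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in 1 MiB chunks so large recordings are not loaded at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            sha.update(chunk)
    return "sha256", sha.hexdigest()
```

Returning the algorithm name alongside the digest matters in a setup like this: a catalog entry hashed with BLAKE3 on one machine cannot be compared directly with a SHA256 digest computed on another.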
Set the data directory:
export SPEAKERS_EMBEDDINGS_DIR="$HOME/.config/speakers_embeddings"

Run the test suite:
# Run all tests
./run_speaker_diarization_tests.sh
# Run unit tests only (fast, no API)
./run_speaker_diarization_tests.sh unit
# Run specific collection
./run_speaker_diarization_tests.sh catalog
# View test documentation
./run_speaker_diarization_tests.sh --doc catalog
# Run in Docker
./run_speaker_diarization_tests.sh docker

See evals/TESTING.md for testing methodology.
Each tool has detailed documentation:
- `speaker-catalog.README.md` - Recording catalog management
- `speaker-assign.README.md` - Speaker label assignment
- `speaker-review.README.md` - Interactive review workflow
- `speaker-llm.README.md` - LLM-based name detection
- `speaker-process.README.md` - Batch processing
- `speaker-report.README.md` - Status reporting
- `speaker_detection.README.md` - Core profile management
- `speaker_samples.README.md` - Sample extraction
- `speaker_segments.README.md` - Segment extraction
Development notes are in `*.DEV_NOTES.md` files.
All data is stored in `$SPEAKERS_EMBEDDINGS_DIR`:
$SPEAKERS_EMBEDDINGS_DIR/
├── speakers/ # Speaker profiles (YAML)
├── embeddings/ # Voice embeddings
├── samples/ # Audio samples per speaker
├── catalog/ # Recording catalog entries
├── assignments/ # Speaker assignments
└── cache/ # LLM response cache
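
A small helper can resolve the data directory and report which parts of the layout above are missing. This is a sketch under assumptions: `data_dir` and `missing_subdirs` are hypothetical names, and falling back to `~/.config/speakers_embeddings` when the variable is unset mirrors the export example rather than any confirmed default of the tools.

```python
import os
from pathlib import Path

# Subdirectory names taken from the layout listed above.
SUBDIRS = ("speakers", "embeddings", "samples", "catalog", "assignments", "cache")


def data_dir(env=None):
    """Resolve the data directory from SPEAKERS_EMBEDDINGS_DIR.

    The fallback path is an assumption for this sketch.
    """
    env = os.environ if env is None else env
    raw = env.get("SPEAKERS_EMBEDDINGS_DIR")
    if raw:
        return Path(raw).expanduser()
    return Path.home() / ".config" / "speakers_embeddings"


def missing_subdirs(root):
    """Return the expected subdirectories that do not exist under root."""
    return [name for name in SUBDIRS if not (Path(root) / name).is_dir()]


if __name__ == "__main__":
    root = data_dir()
    print(f"data directory: {root}")
    for name in missing_subdirs(root):
        print(f"missing: {root / name}")
```

Passing `env` explicitly keeps the helper easy to test without touching the real environment.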
Currently supports:
- Speechmatics - Primary STT backend with diarization
- AssemblyAI - Alternative transcript format support
- stt-in-batch - Batch speech-to-text pipeline that integrates with speaker-diarization-toolkit for end-to-end transcription with speaker identification
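
As an illustration of the kind of information a diarized transcript provides, the sketch below totals speaking time per generic label. It assumes a Speechmatics-style JSON layout (a `results` list whose word items carry `start_time`, `end_time`, and an `alternatives` entry with a `speaker` field). Treat these field names as assumptions and check them against your backend's output, since other backends such as AssemblyAI use a different structure.

```python
from collections import defaultdict


def speaking_time_by_label(transcript):
    """Sum word durations (seconds) per diarization label.

    Field names follow an assumed Speechmatics-style layout; items that
    are not words (for example punctuation) are skipped.
    """
    totals = defaultdict(float)
    for item in transcript.get("results", []):
        if item.get("type") != "word":
            continue
        alternatives = item.get("alternatives") or []
        if not alternatives:
            continue
        label = alternatives[0].get("speaker", "unknown")
        totals[label] += float(item["end_time"]) - float(item["start_time"])
    return dict(totals)


# Hand-written sample data, not real backend output.
sample = {
    "results": [
        {"type": "word", "start_time": 0.0, "end_time": 0.5,
         "alternatives": [{"content": "hello", "speaker": "S1"}]},
        {"type": "word", "start_time": 0.5, "end_time": 1.0,
         "alternatives": [{"content": "everyone", "speaker": "S1"}]},
        {"type": "punctuation", "start_time": 1.0, "end_time": 1.0,
         "alternatives": [{"content": ".", "speaker": "S1"}]},
        {"type": "word", "start_time": 1.5, "end_time": 2.25,
         "alternatives": [{"content": "hi", "speaker": "S2"}]},
    ]
}
totals = speaking_time_by_label(sample)
print(totals)
```

Per-label totals like these are one way to decide which labels have enough speech to be worth matching against stored profiles.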
MIT License - See LICENSE file for details.
Contributions welcome! Please see the development notes in *.DEV_NOTES.md files for implementation details.