Automated video analysis toolkit for human interaction research - Extract comprehensive behavioral annotations from videos using AI pipelines, with an intuitive web interface for visualization and analysis.
VideoAnnotator automatically analyzes videos of human interactions and extracts rich behavioral data including:
- 👥 Person tracking - Multi-person detection and pose estimation with persistent IDs
- 😊 Facial analysis - Emotions, expressions, gaze direction, and action units
- 🎬 Scene detection - Environment classification and temporal segmentation
- 🎤 Audio analysis - Speech recognition, speaker identification, and emotion detection
Perfect for researchers studying parent-child interactions, social behavior, developmental psychology, and human-computer interaction.
VideoAnnotator provides both automated processing and interactive visualization:
VideoAnnotator - AI-powered video processing pipeline:
- Processes videos to extract behavioral annotations
- REST API for integration with research workflows
- Supports batch processing and custom configurations
- Outputs standardized JSON data
Video Annotation Viewer - Interactive web-based visualization tool:
- Load and visualize VideoAnnotator results
- Synchronized video playback with annotation overlays
- Timeline scrubbing with pose, face, and audio data
- Export tools for further analysis
Complete workflow: Your Videos → [VideoAnnotator Processing] → Annotation Data → [Video Annotation Viewer] → Interactive Analysis
Recommended: Run VideoAnnotator in Docker for the most reliable experience (consistent dependencies, easier GPU support, fewer host setup issues).
CPU (works anywhere):
```bash
docker compose up --build
```

GPU (faster processing; requires NVIDIA Container Toolkit):

```bash
docker compose --profile gpu up --build videoannotator-gpu
```

Then open the interactive API docs at http://localhost:18011/docs.
If you want to initialize the database and create an admin API key explicitly:
```bash
docker compose exec videoannotator setupdb --admin-email you@example.com --admin-username you
```

Alternatively, install locally with uv (advanced):

```bash
# Install modern Python package manager
curl -LsSf https://astral.sh/uv/install.sh | sh              # Linux/Mac
# powershell -c "irm https://astral.sh/uv/install.ps1 | iex" # Windows
# Clone and install
git clone https://github.com/InfantLab/VideoAnnotator.git
cd VideoAnnotator
uv sync # Fast dependency installation (30 seconds)
# Initialize the local database (creates tables + admin user/token)
uv run videoannotator setup-db --admin-email you@example.com --admin-username you
```

If you are using the provided Docker/devcontainer images, a few convenience commands are available on PATH.
These are optional shortcuts; the canonical CLI remains `uv run videoannotator ...`.
If you are running via Docker Compose, you can use these shortcuts without "shelling in" manually:
```bash
docker compose exec videoannotator setupdb --admin-email you@example.com --admin-username you
docker compose exec videoannotator server --host 0.0.0.0 --port 18011

# If you launched the GPU service instead:
docker compose exec videoannotator-gpu setupdb --admin-email you@example.com --admin-username you
docker compose exec videoannotator-gpu server --host 0.0.0.0 --port 18011
```

| Action | Shortcut | Equivalent |
|---|---|---|
| Initialize the database + create an admin token | `setupdb --admin-email you@example.com --admin-username you` | `uv run videoannotator setup-db --admin-email you@example.com --admin-username you` |
| Run the VideoAnnotator CLI (any subcommand) | `va ...` | `uv run videoannotator ...` |
| Start the API server (recommended defaults) | `va` | `uv run videoannotator` |
| Start the API server (explicit subcommand) | `server ...` | `uv run videoannotator server ...` |
| Generate a new API token | `newtoken ...` | `uv run videoannotator generate-token ...` |
| Run all tests (quick/quiet) | `vatest` | `uv run pytest -q` |
| Run a subset of tests (quick/quiet) | `vatest tests/unit/` | `uv run pytest -q tests/unit/` |
For more copy-pasteable CLI workflows, see docs/usage/demo_commands.md.
```bash
# Start the API server
uv run videoannotator   # Local install (advanced). In Docker: `docker compose up`

# Use the API key printed by `setup-db` (or the server's first-start output)
```

```bash
# Process your first video (in another terminal)
curl -X POST "http://localhost:18011/api/v1/jobs/" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "video=@your_video.mp4" \
  -F "selected_pipelines=person,face,scene,audio"

# Check results at http://localhost:18011/docs
```
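If you prefer Python to curl, the same job submission can be scripted with the `requests` library. This is a minimal sketch mirroring the curl call above; the endpoint, auth header, and form fields come from that example, while the file name and key are placeholders:

```python
# Submit a video processing job to a locally running VideoAnnotator server.
import requests

API_URL = "http://localhost:18011/api/v1/jobs/"
API_KEY = "YOUR_API_KEY"  # printed by `setup-db` or the server's first start

with open("your_video.mp4", "rb") as video:  # placeholder file name
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"video": video},
        data={"selected_pipelines": "person,face,scene,audio"},
    )

response.raise_for_status()
print(response.json())  # job metadata; see http://localhost:18011/docs for the full schema
```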
```bash
# Install the companion web viewer
git clone https://github.com/InfantLab/video-annotation-viewer.git
cd video-annotation-viewer
npm install
npm run dev

# Open http://localhost:3000 and load your VideoAnnotator results
```

Note: Ensure Node.js and npm are installed. On macOS with Homebrew: `brew install node`.

🎉 That's it! You now have both automated video processing and interactive visualization.
Authoritative pipeline metadata (names, tasks, modalities, capabilities) is generated from the registry:
- Pipeline specification table: `docs/pipelines_spec.md` (auto-generated; do not edit by hand)
- Emotion output format spec: `docs/specs/emotion_output_format.md`
Additional Specs:
- Output Naming Conventions: `docs/specs/output_naming_conventions.md` (stable patterns for downstream tooling)
- Emotion Validator Utility: `src/validation/emotion_validator.py` (programmatic validation of `.emotion.json` files)
- CLI Validation: `videoannotator validate-emotion path/to/file.emotion.json` returns a non-zero exit code on failure

Client tools (e.g. the Video Annotation Viewer) should rely on those sources or the `/api/v1/pipelines` endpoint rather than hard-coding pipeline assumptions.
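For example, a client can discover the available pipelines at runtime rather than hard-coding them. The sketch below assumes a locally running server on port 18011 and a valid API key; the exact response schema is documented in the interactive API docs:

```python
# List the registered pipelines from a running VideoAnnotator server.
import requests

API_KEY = "YOUR_API_KEY"
response = requests.get(
    "http://localhost:18011/api/v1/pipelines",
    headers={"Authorization": f"Bearer {API_KEY}"},
)
response.raise_for_status()

for pipeline in response.json():  # assumed: a JSON array of pipeline descriptors
    print(pipeline)
```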
Person tracking:
- Technology: YOLO11 + ByteTrack multi-object tracking
- Outputs: Bounding boxes, pose keypoints, persistent person IDs
- Use cases: Movement analysis, social interaction tracking, activity recognition
Face analysis:
- Technology: OpenFace 3.0, LAION Face, OpenCV backends
- Outputs: 68-point landmarks, emotions, action units, gaze direction, head pose
- Use cases: Emotional analysis, attention tracking, facial expression studies
Scene detection:
- Technology: PySceneDetect + CLIP environment classification
- Outputs: Scene boundaries, environment labels, temporal segmentation
- Use cases: Context analysis, setting classification, behavioral context
Audio analysis:
- Technology: OpenAI Whisper + pyannote speaker diarization
- Outputs: Speech transcripts, speaker identification, voice emotions
- Use cases: Conversation analysis, language development, vocal behavior
- No coding required - Web interface and REST API
- Standardized outputs - JSON formats compatible with analysis tools
- Reproducible results - Version-controlled processing pipelines
- Batch processing - Handle multiple videos efficiently
- State-of-the-art models - YOLO11, OpenFace 3.0, Whisper
- Validated pipelines - Tested on developmental psychology datasets
- Comprehensive metrics - Confidence scores, validation tools
- Flexible configuration - Adjust parameters for your research needs
- Fast processing - GPU acceleration, optimized pipelines
- Scalable architecture - Docker containers, API-first design
- Cross-platform - Windows, macOS, Linux support
- Enterprise features - Authentication, logging, monitoring
- 100% Local Processing - All analysis runs on your hardware, no cloud dependencies
- No Data Transmission - Videos and results never leave your infrastructure
- GDPR Compliant - Full control over sensitive research data
- Foundation Model Free - No external API calls to commercial AI services
- Research Ethics Ready - Designed for studies requiring strict data confidentiality
VideoAnnotator generates rich, structured data like this:
```json
{
"person_tracking": [
{
"timestamp": 12.34,
"person_id": 1,
"bbox": [0.2, 0.3, 0.4, 0.5],
"pose_keypoints": [...],
"confidence": 0.87
}
],
"face_analysis": [
{
"timestamp": 12.34,
"person_id": 1,
"emotion": "happy",
"confidence": 0.91,
"facial_landmarks": [...],
"gaze_direction": [0.1, -0.2]
}
],
"scene_detection": [
{
"start_time": 0.0,
"end_time": 45.6,
"scene_type": "living_room",
"confidence": 0.95
}
],
"audio_analysis": [
{
"start_time": 1.2,
"end_time": 3.8,
"speaker": "adult",
"transcript": "Look at this toy!",
"emotion": "excited"
}
]
}
```

- Python: Import JSON data into pandas, matplotlib, seaborn (see the example after this list)
- R: Load data with jsonlite, analyze with tidyverse
- MATLAB: Process JSON with built-in functions
- CVAT: Computer Vision Annotation Tool integration
- LabelStudio: Machine learning annotation platform
- ELAN: Linguistic annotation software compatibility
- Video Annotation Viewer: Interactive web-based analysis (recommended)
- Custom dashboards: Build with our REST API
- Jupyter notebooks: Examples included in repository
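For instance, the person-tracking records shown above can be loaded straight into a pandas DataFrame. This is a minimal sketch: `results.json` is a placeholder path, and the real file layout follows `docs/specs/output_naming_conventions.md`:

```python
# Load VideoAnnotator output (structured as in the example above) and summarize tracking.
import json
import pandas as pd

with open("results.json") as f:  # placeholder path; see the output naming conventions spec
    results = json.load(f)

tracking = pd.DataFrame(results["person_tracking"])
print(f"{tracking['person_id'].nunique()} unique people across {len(tracking)} detections")
print(tracking.groupby("person_id")["confidence"].mean())
```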
```bash
# Modern Python environment
curl -LsSf https://astral.sh/uv/install.sh | sh
git clone https://github.com/InfantLab/VideoAnnotator.git
cd VideoAnnotator
uv sync
# Start processing
uv run videoannotator
```

```bash
# CPU version (lightweight)
docker build -f Dockerfile.cpu -t videoannotator:cpu .
docker run -p 18011:18011 videoannotator:cpu
# GPU version (faster processing)
docker build -f Dockerfile.gpu -t videoannotator:gpu .
docker run -p 18011:18011 --gpus all videoannotator:gpu
# Development version (pre-cached models)
docker build -f Dockerfile.dev -t videoannotator:dev .
docker run -p 18011:18011 --gpus all videoannotator:dev
```

```python
# Python API for custom workflows
from videoannotator import VideoAnnotator
annotator = VideoAnnotator()
results = annotator.process("video.mp4", pipelines=["person", "face"])
# Analyze results
import pandas as pd
df = pd.DataFrame(results['person_tracking'])
print(f"Detected {df['person_id'].nunique()} unique people")| Resource | Description |
|---|---|
| 📚 Interactive Docs | Complete documentation with examples |
| 🎮 Live API Testing | Interactive API when the server is running |
| 🚀 Getting Started Guide | Step-by-step setup and first video |
| 🔧 Installation Guide | Detailed installation instructions |
| ⚙️ Pipeline Specifications | Technical pipeline documentation |
| 🎯 Demo Commands | Example commands and workflows |
- Parent-child interaction studies with synchronized behavioral coding
- Social development research with multi-person tracking
- Language acquisition studies with audio-visual alignment
- Autism spectrum behavioral analysis with facial expression tracking
- Therapy session analysis with emotion and engagement metrics
- Developmental assessment with standardized behavioral measures
- User experience research with attention and emotion tracking
- Interface evaluation with gaze direction and facial feedback
- Accessibility studies with comprehensive behavioral data
- FastAPI - High-performance REST API with automatic documentation
- YOLO11 - State-of-the-art object detection and pose estimation
- OpenFace 3.0 - Comprehensive facial behavior analysis
- Whisper - Robust speech recognition and transcription
- PyTorch - GPU-accelerated machine learning inference
- Processing speed: ~2-4x real-time with GPU acceleration
- Memory usage: 4-8GB RAM for typical videos
- Storage: ~100MB output per hour of video
- Accuracy: 90%+ for person detection, 85%+ for emotion recognition
- Batch processing: Handle multiple videos simultaneously
- Container deployment: Docker support for cloud platforms
- Distributed processing: API-first design for microservices
- Resource optimization: CPU and GPU variants available
- 🐛 Report issues: GitHub Issues
- 💬 Discussions: GitHub Discussions
- 📧 Contact: Caspar Addyman at infantologist@gmail.com
- 🔬 Collaborations: Open to research partnerships
- Code quality: 83% test coverage, modern Python practices
- Documentation: Comprehensive guides and API documentation
- CI/CD: Automated testing and deployment pipelines
- Standards: Following research software engineering best practices
If you use VideoAnnotator in your research, please cite:
Addyman, C. (2025). VideoAnnotator: Automated video analysis toolkit for human interaction research.
Zenodo. https://doi.org/10.5281/zenodo.16961751
MIT License - Full terms in LICENSE
- The Global Parenting Initiative (Funded by The LEGO Foundation)
- Caspar Addyman (infantologist@gmail.com) - Lead Developer & Research Director
Built with and grateful to:
- YOLO & Ultralytics - Object detection and tracking
- OpenFace 3.0 - Facial behavior analysis
- OpenAI Whisper - Speech recognition
- FastAPI - Modern web framework
- PyTorch - Machine learning infrastructure
Development was greatly helped by:
- Visual Studio Code - Primary development environment
- GitHub Copilot - AI pair programming assistance
- Claude Code - Architecture design and documentation
- GPT-4 & Claude Models - Code generation and debugging help
This project demonstrates how AI-assisted development can accelerate research software creation while maintaining code quality and comprehensive testing.
🎥 Ready to start analyzing videos? Follow the quick start above!