A modern distributed version control system designed for monorepos, massive binaries, and AI model versioning.
- Content-Addressable Storage: BLAKE3-based hashing with chunk-level deduplication
- CRDT-Based Merging: Automatic, conflict-free merge resolution powered by Automerge
- Partial Clones: Smart filtering with semantic queries (e.g., "only final models", "checkpoints > 90% accuracy")
- AI Model Versioning: Native support for tracking experiments, hyperparameters, metrics, and model lineage
- Large Binary Support: Efficient chunking and compression for multi-GB files
- Semantic History: Query models by metrics, experiments, and lineage
- AI/ML Companies: Version control for models, datasets, and experiments
- Game Studios: Manage large binary assets (textures, models, audio)
- Infrastructure Teams: Handle monorepos with mixed content types
Build and install from source:

```bash
git clone https://github.com/bhaskarvilles/dvc.git
cd dvc
cargo build --release
cargo install --path .
```

Initialize a repository and make your first commit:

```bash
nexus init my-project
cd my-project
nexus add .
nexus commit -m "Initial commit"
nexus branch feature/new-model
```

Clone only specific files or models:
```bash
# Clone only Python files
nexus clone --partial --filter "*.py" https://example.com/repo.git

# Clone only final models with accuracy > 0.9
nexus clone --partial --filter "semantic:metric_threshold:accuracy:0.9" https://example.com/repo.git
```

Track model metadata in commits:
```bash
# Commit with model metadata
nexus commit -m "Trained ResNet50" \
  --metadata model_name=resnet50 \
  --metadata accuracy=0.95 \
  --metadata framework=pytorch
```

All objects are stored using BLAKE3 hashing, ensuring:
- Deduplication: Identical content stored once
- Integrity: Content verified on retrieval
- Efficiency: Chunk-level deduplication for large files
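The chunk-level deduplication idea can be sketched as a toy content-addressable store. This is an illustrative sketch only, not Nexus's implementation: it substitutes the standard library's `DefaultHasher` and fixed-size chunks for BLAKE3 and real content-defined chunking, and all type names are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

const CHUNK_SIZE: usize = 4; // tiny for demonstration; real chunks are KB-sized

/// Toy content-addressable store: chunks are keyed by their hash,
/// so identical chunks are physically stored exactly once.
struct ChunkStore {
    chunks: HashMap<u64, Vec<u8>>,
}

impl ChunkStore {
    fn new() -> Self {
        ChunkStore { chunks: HashMap::new() }
    }

    /// Split data into fixed-size chunks, store each unique chunk once,
    /// and return the list of chunk ids that reconstitutes the file.
    fn put(&mut self, data: &[u8]) -> Vec<u64> {
        data.chunks(CHUNK_SIZE)
            .map(|chunk| {
                let mut h = DefaultHasher::new(); // stand-in for BLAKE3
                chunk.hash(&mut h);
                let id = h.finish();
                self.chunks.entry(id).or_insert_with(|| chunk.to_vec());
                id
            })
            .collect()
    }

    /// Reassemble a file from its chunk ids; returns None if a chunk is missing.
    fn get(&self, ids: &[u64]) -> Option<Vec<u8>> {
        let mut out = Vec::new();
        for id in ids {
            out.extend_from_slice(self.chunks.get(id)?);
        }
        Some(out)
    }

    fn unique_chunks(&self) -> usize {
        self.chunks.len()
    }
}

fn main() {
    let mut store = ChunkStore::new();
    let a = store.put(b"aaaabbbbcccc");
    let b = store.put(b"aaaabbbbdddd"); // shares two chunks with the first file
    assert_eq!(store.get(&a).unwrap(), b"aaaabbbbcccc".to_vec());
    assert_eq!(store.get(&b).unwrap(), b"aaaabbbbdddd".to_vec());
    // 24 bytes written across two files, but only 4 unique chunks stored
    println!("unique chunks: {}", store.unique_chunks());
}
```

Because chunk ids double as integrity checks in a real CAS, a retrieved chunk can be re-hashed and compared against its id to detect corruption.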
Nexus uses Conflict-free Replicated Data Types (CRDTs) for automatic merge resolution:
- Text files: Operational transformation
- JSON files: Map-based CRDTs
- Binary files: Three-way merge fallback
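To illustrate why CRDT merges never conflict, here is a minimal last-writer-wins map, one of the simplest CRDTs and a rough analogue of the map-based merging used for JSON files. This is a hand-rolled sketch for exposition; Nexus's actual merge engine is built on Automerge, and real CRDTs also tie-break equal timestamps by replica id.

```rust
use std::collections::HashMap;

/// Toy last-writer-wins (LWW) map CRDT: each key carries a logical timestamp,
/// and merging two replicas keeps the newer write for every key.
#[derive(Clone, Debug, Default)]
struct LwwMap {
    entries: HashMap<String, (u64, String)>, // key -> (timestamp, value)
}

impl LwwMap {
    fn set(&mut self, key: &str, value: &str, ts: u64) {
        let e = self
            .entries
            .entry(key.to_string())
            .or_insert((0, String::new()));
        if ts >= e.0 {
            *e = (ts, value.to_string());
        }
    }

    fn get(&self, key: &str) -> Option<&str> {
        self.entries.get(key).map(|(_, v)| v.as_str())
    }

    /// Merge is commutative, associative, and idempotent, so replicas
    /// converge to the same state regardless of merge order -- there is
    /// no conflict for a human to resolve.
    fn merge(&mut self, other: &LwwMap) {
        for (k, (ts, v)) in &other.entries {
            self.set(k, v, *ts);
        }
    }
}

fn main() {
    let mut a = LwwMap::default();
    let mut b = a.clone();
    a.set("lr", "0.01", 1);  // replica A updates the learning rate
    b.set("lr", "0.001", 2); // replica B updates it later
    b.set("epochs", "10", 2);
    a.merge(&b);
    assert_eq!(a.get("lr"), Some("0.001")); // the newer write wins
    assert_eq!(a.get("epochs"), Some("10"));
}
```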
Track AI models with rich metadata:
- Hyperparameters
- Training metrics
- Dataset information
- Model lineage (fine-tuning chains)
- Experiment grouping
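A semantic query over such metadata might look like the following sketch, which filters a commit history by a numeric metric threshold (as in the `--metadata accuracy=0.95` example above). The `ModelCommit` shape and `find_by_metric` helper are hypothetical, not Nexus's real commit format.

```rust
use std::collections::HashMap;

/// Hypothetical commit record carrying model metadata as key/value strings,
/// mirroring the `--metadata key=value` flags shown earlier.
#[derive(Debug)]
struct ModelCommit {
    message: String,
    metadata: HashMap<String, String>,
}

/// Return commits whose named metric parses as a number >= `min`,
/// e.g. "all commits with accuracy >= 0.9".
fn find_by_metric<'a>(
    history: &'a [ModelCommit],
    metric: &str,
    min: f64,
) -> Vec<&'a ModelCommit> {
    history
        .iter()
        .filter(|c| {
            c.metadata
                .get(metric)
                .and_then(|v| v.parse::<f64>().ok())
                .map_or(false, |v| v >= min)
        })
        .collect()
}

fn main() {
    let history = vec![
        ModelCommit {
            message: "Trained ResNet50".into(),
            metadata: HashMap::from([
                ("model_name".to_string(), "resnet50".to_string()),
                ("accuracy".to_string(), "0.95".to_string()),
            ]),
        },
        ModelCommit {
            message: "Early checkpoint".into(),
            metadata: HashMap::from([("accuracy".to_string(), "0.71".to_string())]),
        },
    ];
    let good = find_by_metric(&history, "accuracy", 0.9);
    assert_eq!(good.len(), 1);
    println!("{}", good[0].message); // prints "Trained ResNet50"
}
```

The same predicate shape would serve the partial-clone filter `semantic:metric_threshold:accuracy:0.9` shown earlier.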
```bash
# Repository management
nexus init [path]             # Initialize repository
nexus status                  # Show working directory status
nexus log [-n count]          # Show commit history

# Version control
nexus add <files>             # Stage files
nexus commit -m "message"     # Create commit
nexus branch [name]           # Create/list branches
nexus merge <branch>          # Merge branches

# Remote operations
nexus clone <url> [path]      # Clone repository
nexus push [remote] [branch]  # Push changes
nexus pull [remote] [branch]  # Pull changes
```

```
┌───────────────────────────────────────────┐
│               CLI Interface               │
└─────────────────────┬─────────────────────┘
                      │
┌─────────────────────▼─────────────────────┐
│            Repository Manager             │
│                                           │
│   ┌──────────────────────┐                │
│   │ Content-Addressable  │                │
│   │    Storage (CAS)     │                │
│   ├──────────────────────┤                │
│   │ • BLAKE3 Hashing     │                │
│   │ • Compression        │                │
│   │ • Deduplication      │                │
│   └──────────────────────┘                │
│                                           │
│   ┌──────────────────────┐                │
│   │  CRDT Merge Engine   │                │
│   ├──────────────────────┤                │
│   │ • Text Merging       │                │
│   │ • JSON Merging       │                │
│   │ • Binary Fallback    │                │
│   └──────────────────────┘                │
│                                           │
│   ┌──────────────────────┐                │
│   │    Partial Clone     │                │
│   ├──────────────────────┤                │
│   │ • Path Filters       │                │
│   │ • Size Filters       │                │
│   │ • Semantic Filters   │                │
│   └──────────────────────┘                │
│                                           │
│   ┌──────────────────────┐                │
│   │   Semantic History   │                │
│   ├──────────────────────┤                │
│   │ • Model Metadata     │                │
│   │ • Experiment Track   │                │
│   │ • Lineage Graph      │                │
│   └──────────────────────┘                │
└───────────────────────────────────────────┘
```
Build, test, and benchmark with the standard Cargo workflow:

```bash
cargo build                 # Compile
cargo test                  # Run the test suite
cargo bench                 # Run benchmarks
cargo tarpaulin --out Html  # Generate a coverage report
```

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
Licensed under either of:
- Apache License, Version 2.0 (LICENSE-APACHE)
- MIT license (LICENSE-MIT)
at your option.
Completed:
- Core version control primitives
- Content-addressable storage
- CRDT-based merging
- Partial clone support
- Semantic history for AI models

Planned:
- Real-time collaboration (Phase 2)
- WebSocket-based sync
- Distributed garbage collection
- Performance optimizations for 100GB+ repos
For questions or support, please open an issue on GitHub.