Nexus - Next-Generation Distributed Version Control System

A modern distributed version control system designed for monorepos, massive binaries, and AI model versioning.

🚀 Features

  • Content-Addressable Storage: BLAKE3-based hashing with chunk-level deduplication
  • CRDT-Based Merging: Automatic merge resolution using Automerge CRDTs, with a three-way merge fallback for binary files
  • Partial Clones: Smart filtering with semantic queries (e.g., "only final models", "checkpoints > 90% accuracy")
  • AI Model Versioning: Native support for tracking experiments, hyperparameters, metrics, and model lineage
  • Large Binary Support: Efficient chunking and compression for multi-GB files
  • Semantic History: Query models by metrics, experiments, and lineage

🎯 Target Users

  • AI/ML Companies: Version control for models, datasets, and experiments
  • Game Studios: Manage large binary assets (textures, models, audio)
  • Infrastructure Teams: Handle monorepos with mixed content types

📦 Installation

From Source

git clone https://github.com/bhaskarvilles/dvc.git
cd dvc
cargo build --release
cargo install --path .

🔧 Quick Start

Initialize a Repository

nexus init my-project
cd my-project

Add and Commit Files

nexus add .
nexus commit -m "Initial commit"

Create Branches

nexus branch feature/new-model

Partial Clone

Clone only specific files or models:

# Clone only Python files
nexus clone --partial --filter "*.py" https://example.com/repo.git

# Clone only models whose recorded accuracy exceeds 0.9
nexus clone --partial --filter "semantic:metric_threshold:accuracy:0.9" https://example.com/repo.git
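To make the two `--filter` forms above concrete, here is a minimal Rust sketch of how such filter strings could be parsed and evaluated. It is an assumption for illustration, not Nexus's implementation: the `Filter` enum and `parse_filter` are hypothetical names, `semantic:metric_threshold:<metric>:<value>` is read as "metric strictly greater than value", and only a leading `*` wildcard is supported rather than full glob syntax.

```rust
// Hypothetical model of the `--filter` strings above; not Nexus's implementation.
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum Filter {
    /// Simplified glob: `*<suffix>` keeps paths ending in `<suffix>`.
    PathSuffix(String),
    /// Keeps objects whose `metric` is strictly greater than `threshold`.
    MetricThreshold { metric: String, threshold: f64 },
}

/// Parse a filter spec such as "*.py" or "semantic:metric_threshold:accuracy:0.9".
pub fn parse_filter(spec: &str) -> Result<Filter, String> {
    if let Some(rest) = spec.strip_prefix("semantic:metric_threshold:") {
        // Split at the last ':' so the part after it is the numeric threshold.
        let (metric, value) = rest
            .rsplit_once(':')
            .ok_or_else(|| format!("missing threshold in {:?}", spec))?;
        let threshold = value
            .parse::<f64>()
            .map_err(|e| format!("bad threshold {:?}: {}", value, e))?;
        return Ok(Filter::MetricThreshold { metric: metric.to_string(), threshold });
    }
    if let Some(suffix) = spec.strip_prefix('*') {
        return Ok(Filter::PathSuffix(suffix.to_string()));
    }
    Err(format!("unsupported filter {:?}", spec))
}

impl Filter {
    /// Decide whether an object with this path and these numeric metrics is kept.
    pub fn matches(&self, path: &str, metrics: &HashMap<String, f64>) -> bool {
        match self {
            Filter::PathSuffix(suffix) => path.ends_with(suffix.as_str()),
            Filter::MetricThreshold { metric, threshold } => {
                // Objects that do not record the metric are excluded.
                metrics.get(metric).map_or(false, |v| v > threshold)
            }
        }
    }
}
```

A real partial clone would evaluate filters against the remote's object metadata before transferring anything; this sketch only shows the include/exclude decision.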

AI Model Versioning

Track model metadata in commits:

# Commit with model metadata
nexus commit -m "Trained ResNet50" \
  --metadata model_name=resnet50 \
  --metadata accuracy=0.95 \
  --metadata framework=pytorch
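Each `--metadata key=value` flag above adds one entry to the commit's metadata. The following is a minimal sketch of how such flags might be collected, assuming (hypothetically) that finite numeric values are stored as numbers so metric filters can compare them later; `MetaValue` and `parse_metadata` are illustrative names, not part of the Nexus CLI.

```rust
// Hypothetical collection of repeated `--metadata key=value` flags.
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq)]
pub enum MetaValue {
    Number(f64),
    Text(String),
}

pub fn parse_metadata<'a, I>(args: I) -> Result<BTreeMap<String, MetaValue>, String>
where
    I: IntoIterator<Item = &'a str>,
{
    let mut out = BTreeMap::new();
    for arg in args {
        // Split on the first '=' only, so values may themselves contain '='.
        let (key, raw) = arg
            .split_once('=')
            .ok_or_else(|| format!("expected key=value, got {:?}", arg))?;
        if key.is_empty() {
            return Err(format!("empty key in {:?}", arg));
        }
        // Finite numbers become MetaValue::Number; everything else
        // (including "nan" and "inf") is kept as text.
        let value = match raw.parse::<f64>() {
            Ok(n) if n.is_finite() => MetaValue::Number(n),
            _ => MetaValue::Text(raw.to_string()),
        };
        // If a key is repeated, the last value wins.
        out.insert(key.to_string(), value);
    }
    Ok(out)
}
```

With the three flags from the example commit, `accuracy` becomes a number while `model_name` and `framework` stay as text.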

📚 Documentation

Core Concepts

Content-Addressable Storage

All objects are stored under their BLAKE3 content hash, which provides:

  • Deduplication: Identical content stored once
  • Integrity: Content verified on retrieval
  • Efficiency: Chunk-level deduplication for large files
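The three properties above can be illustrated with a small in-memory chunk store. This is a sketch under stated assumptions: it uses the standard library's `DefaultHasher` (a 64-bit, non-cryptographic hash) as a stand-in for BLAKE3 so the example needs no external crates, whereas a real store would use something like `blake3::hash` for 256-bit, collision-resistant ids. It also uses fixed-size chunks, while many systems prefer content-defined chunking so that an insertion does not shift every later chunk boundary. `ChunkStore`, `put`, and `get` are hypothetical names.

```rust
// Minimal in-memory content-addressable chunk store (illustrative only).
// DefaultHasher stands in for BLAKE3 here; it is NOT collision-resistant.
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

pub type ObjectId = u64;

/// Derive an object's id from its bytes (stand-in for a BLAKE3 digest).
fn content_id(bytes: &[u8]) -> ObjectId {
    let mut h = DefaultHasher::new();
    bytes.hash(&mut h);
    h.finish()
}

pub struct ChunkStore {
    chunk_size: usize,
    chunks: HashMap<ObjectId, Vec<u8>>,
}

impl ChunkStore {
    pub fn new(chunk_size: usize) -> Self {
        assert!(chunk_size > 0, "chunk size must be positive");
        ChunkStore { chunk_size, chunks: HashMap::new() }
    }

    /// Split `data` into fixed-size chunks, store each distinct chunk once,
    /// and return the ordered chunk ids (the file's manifest).
    pub fn put(&mut self, data: &[u8]) -> Vec<ObjectId> {
        data.chunks(self.chunk_size)
            .map(|chunk| {
                let id = content_id(chunk);
                // Deduplication: an already-known id is not stored again.
                self.chunks.entry(id).or_insert_with(|| chunk.to_vec());
                id
            })
            .collect()
    }

    /// Reassemble a file from its manifest, re-hashing every chunk to verify integrity.
    pub fn get(&self, manifest: &[ObjectId]) -> Result<Vec<u8>, String> {
        let mut out = Vec::new();
        for id in manifest {
            let chunk = self
                .chunks
                .get(id)
                .ok_or_else(|| format!("missing chunk {:x}", id))?;
            if content_id(chunk) != *id {
                return Err(format!("corrupt chunk {:x}", id));
            }
            out.extend_from_slice(chunk);
        }
        Ok(out)
    }

    /// Number of distinct chunks actually held in storage.
    pub fn stored_chunks(&self) -> usize {
        self.chunks.len()
    }
}
```

Storing `abcdabcdabcd` with a 4-byte chunk size yields a three-entry manifest but only one stored chunk, which is the chunk-level deduplication the list describes.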

CRDT Merges

Nexus uses Conflict-free Replicated Data Types (CRDTs) for automatic merge resolution:

  • Text files: Sequence-CRDT text merging (Automerge text)
  • JSON files: Map-based CRDTs
  • Binary files: Three-way merge fallback
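The binary fallback and the map-based approach can be sketched as follows. `merge_binary` is the standard three-way rule for opaque files: take one side only if the other side left the base unchanged, otherwise report a conflict. `merge_lww` is a deliberately simple last-writer-wins map CRDT swapped in for illustration; it is not Automerge's algorithm, and the `(timestamp, replica id, value)` entry layout is an assumption.

```rust
// Illustrative merge strategies; not Nexus's or Automerge's actual code.
use std::collections::BTreeMap;

#[derive(Debug, PartialEq)]
pub enum BinaryMerge<'a> {
    Clean(&'a [u8]),
    Conflict,
}

/// Three-way fallback for opaque binaries.
pub fn merge_binary<'a>(base: &'a [u8], ours: &'a [u8], theirs: &'a [u8]) -> BinaryMerge<'a> {
    if ours == theirs || theirs == base {
        BinaryMerge::Clean(ours) // identical edits, or only our side changed
    } else if ours == base {
        BinaryMerge::Clean(theirs) // only their side changed
    } else {
        BinaryMerge::Conflict // both sides changed the file differently
    }
}

/// Last-writer-wins map: each key holds (timestamp, replica id, value).
pub type LwwMap = BTreeMap<String, (u64, u32, String)>;

/// Merge two replicas key by key, keeping the larger entry under the tuple's
/// total order, so the result does not depend on merge order.
pub fn merge_lww(a: &LwwMap, b: &LwwMap) -> LwwMap {
    let mut out = a.clone();
    for (key, entry) in b {
        let keep_existing = out.get(key).map_or(false, |existing| existing >= entry);
        if !keep_existing {
            out.insert(key.clone(), entry.clone());
        }
    }
    out
}
```

Because `merge_lww` keeps the per-key maximum under a total order, merging is commutative, associative, and idempotent, which is the property that lets replicas converge without coordination.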

Semantic History

Track AI models with rich metadata:

  • Hyperparameters
  • Training metrics
  • Dataset information
  • Model lineage (fine-tuning chains)
  • Experiment grouping
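The metadata categories above map naturally onto a per-commit record. The sketch below assumes a hypothetical `ModelRecord` schema in which lineage is stored as an optional parent commit, and shows two of the queries described here: walking a fine-tuning chain and grouping by experiment to find the best model by a metric. Field and function names are illustrative, not Nexus's API.

```rust
// Hypothetical semantic-history record; field names are illustrative, not Nexus's schema.
use std::collections::{BTreeMap, HashMap, HashSet};

#[derive(Debug, Clone)]
pub struct ModelRecord {
    pub commit: String,
    pub experiment: String,
    pub dataset: String,
    pub hyperparameters: BTreeMap<String, String>,
    pub metrics: BTreeMap<String, f64>,
    /// Commit of the model this one was fine-tuned from, if any.
    pub parent: Option<String>,
}

impl ModelRecord {
    pub fn new(commit: &str, experiment: &str, parent: Option<&str>) -> Self {
        ModelRecord {
            commit: commit.to_string(),
            experiment: experiment.to_string(),
            dataset: String::new(),
            hyperparameters: BTreeMap::new(),
            metrics: BTreeMap::new(),
            parent: parent.map(String::from),
        }
    }
}

/// Follow parent links from `commit` back to the root model (newest first).
/// Stops at a missing parent and guards against accidental cycles.
pub fn lineage<'a>(records: &'a HashMap<String, ModelRecord>, commit: &str) -> Vec<&'a str> {
    let mut chain = Vec::new();
    let mut seen = HashSet::new();
    let mut current = records.get(commit);
    while let Some(rec) = current {
        if !seen.insert(rec.commit.as_str()) {
            break; // a cycle would otherwise loop forever
        }
        chain.push(rec.commit.as_str());
        current = rec.parent.as_deref().and_then(|p| records.get(p));
    }
    chain
}

/// For each experiment, pick the record with the highest value of `metric`.
/// Records that lack the metric are skipped.
pub fn best_per_experiment<'a>(
    records: &'a HashMap<String, ModelRecord>,
    metric: &str,
) -> BTreeMap<&'a str, &'a ModelRecord> {
    let mut best: BTreeMap<&'a str, &'a ModelRecord> = BTreeMap::new();
    for rec in records.values() {
        if let Some(&score) = rec.metrics.get(metric) {
            let better = match best.get(rec.experiment.as_str()) {
                Some(current) => score > current.metrics[metric],
                None => true,
            };
            if better {
                best.insert(rec.experiment.as_str(), rec);
            }
        }
    }
    best
}
```

Storing only a parent pointer keeps each record small; the full fine-tuning chain is recovered on demand by following those pointers.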

Commands

# Repository management
nexus init [path]              # Initialize repository
nexus status                   # Show working directory status
nexus log [-n count]           # Show commit history

# Version control
nexus add <files>              # Stage files
nexus commit -m "message"      # Create commit
nexus branch [name]            # Create/list branches
nexus merge <branch>           # Merge branches

# Remote operations
nexus clone <url> [path]       # Clone repository
nexus push [remote] [branch]   # Push changes
nexus pull [remote] [branch]   # Pull changes

🏗️ Architecture

┌─────────────────────────────────────────┐
│           CLI Interface                 │
└─────────────┬───────────────────────────┘
              │
┌─────────────▼───────────────────────────┐
│       Repository Manager                │
├─────────────┬───────────────────────────┤
│  ┌──────────▼──────────┐                │
│  │ Content-Addressable │                │
│  │     Storage (CAS)   │                │
│  ├─────────────────────┤                │
│  │  • BLAKE3 Hashing   │                │
│  │  • Compression      │                │
│  │  • Deduplication    │                │
│  └─────────────────────┘                │
│                                         │
│  ┌──────────────────────┐               │
│  │   CRDT Merge Engine  │               │
│  ├──────────────────────┤               │
│  │  • Text Merging      │               │
│  │  • JSON Merging      │               │
│  │  • Binary Fallback   │               │
│  └──────────────────────┘               │
│                                         │
│  ┌──────────────────────┐               │
│  │  Partial Clone       │               │
│  ├──────────────────────┤               │
│  │  • Path Filters      │               │
│  │  • Size Filters      │               │
│  │  • Semantic Filters  │               │
│  └──────────────────────┘               │
│                                         │
│  ┌──────────────────────┐               │
│  │  Semantic History    │               │
│  ├──────────────────────┤               │
│  │  • Model Metadata    │               │
│  │  • Experiment Track  │               │
│  │  • Lineage Graph     │               │
│  └──────────────────────┘               │
└─────────────────────────────────────────┘

🧪 Development

Build

cargo build

Run Tests

cargo test

Run Benchmarks

cargo bench

Code Coverage

cargo install cargo-tarpaulin   # One-time install of the coverage tool
cargo tarpaulin --out Html

🤝 Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

📄 License

Licensed under either of:

at your option.

🙏 Acknowledgments

  • Automerge for CRDT implementation
  • BLAKE3 for fast hashing
  • Git for inspiration and design patterns

🗺️ Roadmap

  • Core version control primitives
  • Content-addressable storage
  • CRDT-based merging
  • Partial clone support
  • Semantic history for AI models
  • Real-time collaboration (Phase 2)
  • WebSocket-based sync
  • Distributed garbage collection
  • Performance optimizations for 100GB+ repos

📧 Contact

For questions or support, please open an issue on GitHub.
