Rewrite personal-graph in Rust with custom storage engine#56

Open
sutyum wants to merge 2 commits into main from claude/refactor-rust-platform-U0a3Y
@sutyum sutyum commented Feb 19, 2026

Summary

This is a major architectural rewrite of personal-graph from Python to Rust. The entire codebase has been restructured around a custom storage engine and vector index implementation, replacing the previous SQLite/TursoDB-based approach.

Key Changes

  • Complete language migration: Converted from Python to Rust for better performance and type safety

  • Custom storage engine: Implemented a from-scratch page-based storage engine (pg-storage) with:

    • Fixed-size page layout with header management
    • Buffer pool manager with LRU eviction policy
    • Write-Ahead Log (WAL) for crash recovery
    • B+Tree index for NodeId → PageId mapping
  • Vector indexing: Built an HNSW (Hierarchical Navigable Small World) vector index (pg-vector) from scratch with L2 (Euclidean), cosine, and inner-product distance metrics

  • Core type system (pg-core): Defined fundamental data structures:

    • Node and Edge types with property maps
    • NodeId and other ID types
    • Error handling and trait definitions
  • Graph engine (pg-graph): High-level API combining storage and vector index with:

    • Graph traversal algorithms
    • Engine interface for graph operations
  • HTTP API server (pg-server): REST API handlers for graph operations using Axum web framework

  • Workspace structure: Organized as a Rust workspace with modular crates for separation of concerns

  • Removed Python artifacts: Deleted all Python source files, FHIR ontology files, examples, tests, and documentation that were specific to the old implementation
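
To make "fixed-size page layout with header management" concrete, here is a minimal sketch of a page header serialized into the first bytes of a fixed-size page. The 4096-byte size matches the 4KB pages mentioned in the commit message. The field set (`page_id`, `slot_count`, `free_offset`), the 12-byte header size, and the little-endian encoding are assumptions for illustration, not the actual pg-storage layout.

```rust
/// Page size in bytes (4KB, as stated in the commit message).
pub const PAGE_SIZE: usize = 4096;
/// Hypothetical header size: page_id (8) + slot_count (2) + free_offset (2).
pub const HEADER_SIZE: usize = 12;

/// Hypothetical header fields; the real pg-storage header may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PageHeader {
    pub page_id: u64,
    pub slot_count: u16,
    pub free_offset: u16,
}

impl PageHeader {
    /// Serialize the header into the first HEADER_SIZE bytes (little-endian).
    pub fn write_to(&self, page: &mut [u8; PAGE_SIZE]) {
        page[0..8].copy_from_slice(&self.page_id.to_le_bytes());
        page[8..10].copy_from_slice(&self.slot_count.to_le_bytes());
        page[10..12].copy_from_slice(&self.free_offset.to_le_bytes());
    }

    /// Deserialize the header from the first HEADER_SIZE bytes.
    pub fn read_from(page: &[u8; PAGE_SIZE]) -> Self {
        let mut id = [0u8; 8];
        id.copy_from_slice(&page[0..8]);
        let mut slots = [0u8; 2];
        slots.copy_from_slice(&page[8..10]);
        let mut free = [0u8; 2];
        free.copy_from_slice(&page[10..12]);
        PageHeader {
            page_id: u64::from_le_bytes(id),
            slot_count: u16::from_le_bytes(slots),
            free_offset: u16::from_le_bytes(free),
        }
    }
}
```

Keeping the header at a fixed offset means any page read from disk can be interpreted without consulting other state. In a slotted design, the bytes after `HEADER_SIZE` would hold the slot directory and cell data.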

Notable Implementation Details

  • Storage engine uses fixed-size pages with header metadata for efficient I/O
  • Buffer pool implements LRU eviction to manage memory efficiently
  • WAL ensures durability and crash recovery
  • HNSW implementation provides efficient approximate nearest neighbor search for vector queries
  • Property system uses ordered floats and serialization for flexible attribute storage
  • Modular crate design allows independent testing and reuse of components
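
As an illustration of the LRU eviction policy described above, the following is a minimal sketch of a fixed-capacity page cache. The names `LruPool` and `PageId`, and the `HashMap` plus `VecDeque` representation, are hypothetical. The real buffer pool in pg-storage presumably also tracks pin counts and dirty pages, which this sketch leaves out.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical page identifier type.
pub type PageId = u64;

/// A tiny LRU page cache: not the pg-storage implementation.
pub struct LruPool {
    capacity: usize,
    frames: HashMap<PageId, Vec<u8>>,
    /// Front = least recently used, back = most recently used.
    order: VecDeque<PageId>,
}

impl LruPool {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        LruPool { capacity, frames: HashMap::new(), order: VecDeque::new() }
    }

    /// Move `id` to the most-recently-used position.
    fn touch(&mut self, id: PageId) {
        if let Some(pos) = self.order.iter().position(|&p| p == id) {
            self.order.remove(pos);
        }
        self.order.push_back(id);
    }

    /// Insert or overwrite a page. Returns the evicted page id, if any.
    pub fn insert(&mut self, id: PageId, data: Vec<u8>) -> Option<PageId> {
        let mut evicted = None;
        if !self.frames.contains_key(&id) && self.frames.len() == self.capacity {
            // Pool is full: drop the least recently used page.
            if let Some(victim) = self.order.pop_front() {
                self.frames.remove(&victim);
                evicted = Some(victim);
            }
        }
        self.frames.insert(id, data);
        self.touch(id);
        evicted
    }

    /// Look up a page and mark it as recently used.
    pub fn get(&mut self, id: PageId) -> Option<&[u8]> {
        if self.frames.contains_key(&id) {
            self.touch(id);
        }
        self.frames.get(&id).map(|v| v.as_slice())
    }
}
```

The linear scan in `touch` keeps the sketch short. A production pool would more likely use an intrusive linked list or a clock-style approximation to make each access O(1), and would write dirty victims back to disk before dropping them.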

https://claude.ai/code/session_01BL34LfrRWc1ciedT3asAbu

Replace the entire Python codebase with a ground-up Rust implementation.
No database wrappers — custom storage engine with page-oriented layout,
buffer pool, WAL, B+tree indexes, HNSW vector index, and graph traversal.
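
One way a WAL can support crash recovery is to checksum each record so that a torn or corrupted tail is detected on replay. The crate list describes the WAL as CRC32-based. The sketch below uses the standard reflected CRC-32 (polynomial 0xEDB88320) and a hypothetical record framing of `[len: u32][crc: u32][payload]`; the actual on-disk format in pg-storage is not shown in this PR description and may differ.

```rust
/// Standard reflected CRC-32 (IEEE), computed bit by bit.
pub fn crc32(data: &[u8]) -> u32 {
    let mut crc: u32 = 0xFFFF_FFFF;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // All ones if the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Hypothetical framing: 4-byte length, 4-byte CRC, then the payload
/// (all integers little-endian).
pub fn encode_record(payload: &[u8]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(8 + payload.len());
    buf.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    buf.extend_from_slice(&crc32(payload).to_le_bytes());
    buf.extend_from_slice(payload);
    buf
}

/// Returns the payload if the record is complete and its checksum matches.
/// A `None` during replay would mark the end of the valid log.
pub fn decode_record(buf: &[u8]) -> Option<&[u8]> {
    if buf.len() < 8 {
        return None;
    }
    let mut word = [0u8; 4];
    word.copy_from_slice(&buf[0..4]);
    let len = u32::from_le_bytes(word) as usize;
    word.copy_from_slice(&buf[4..8]);
    let stored = u32::from_le_bytes(word);
    let end = 8usize.checked_add(len)?;
    let payload = buf.get(8..end)?;
    if crc32(payload) == stored {
        Some(payload)
    } else {
        None
    }
}
```

During recovery, a reader would call `decode_record` repeatedly from the start of the log and stop at the first `None`, treating everything after it as an incomplete write.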

Architecture (5 crates):
- pg-core: Node/Edge types, PropertyMap, GraphStorage/VectorIndex traits
- pg-storage: 4KB slotted pages, LRU buffer pool, CRC32 WAL, B+tree indexes
- pg-vector: HNSW approximate nearest neighbor search (L2/cosine/inner product)
- pg-graph: PersonalGraph engine with BFS/DFS, shortest path, similarity search
- pg-server: axum HTTP API with full CRUD, batch ops, traversal, vector search
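
For reference, the three metrics listed for pg-vector can be sketched as plain functions over `f32` slices. The function names and the choice to return cosine as a distance (1 minus similarity) are assumptions; the crate's real signatures are not shown in this PR description.

```rust
/// Inner (dot) product of two equal-length vectors.
pub fn inner_product(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Euclidean (L2) distance.
pub fn l2(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f32>()
        .sqrt()
}

/// Cosine distance, defined here as 1 - cosine similarity.
/// Returns 1.0 when either vector has zero norm (an arbitrary convention).
pub fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let denom = inner_product(a, a).sqrt() * inner_product(b, b).sqrt();
    if denom == 0.0 {
        return 1.0;
    }
    1.0 - inner_product(a, b) / denom
}
```

Note that L2 and cosine are distances (smaller is closer), while a larger inner product means more similar, so an index that ranks by inner product typically negates it or flips its comparison.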

62 tests, zero warnings, all passing.

https://claude.ai/code/session_01BL34LfrRWc1ciedT3asAbu
…her engine

- pg-storage: MVCC transaction manager (TxManager, Transaction RAII guard)
- pg-storage: on-disk B+tree (disk_btree.rs) with WAL-backed writes
- pg-storage: compaction module — reclaims dead MVCC cells from pages
- pg-storage: GraphStore gains tx_manager field and compact() method
- pg-vector: HNSW index binary persistence (save/load) with magic header
- pg-cypher: full new crate — recursive-descent Cypher parser + executor
  - Supports MATCH, CREATE, MERGE, SET, DELETE, RETURN, WITH, UNWIND
  - Node/edge patterns, variable-length hops, property filters, ORDER BY
  - Executor evaluates against pg-graph's GraphBackend trait
- pg-server: POST /query endpoint for OpenCypher queries
- Fix lexer: 2..5 range token is no longer misread as Float(2.0)
- Fix parser: Token::Dash replaced with Token::Minus (lexer never emits Dash)
- Fix parser: node variable detection for bare (n) patterns
- All 101 tests passing across workspace
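
The lexer fix for `2..5` comes down to one character of lookahead: a `.` after digits starts a fractional part only when another digit follows it. The sketch below shows one way to do that. It is not the pg-cypher lexer; `Token::Int`, `Token::DotDot`, and `lex_numbers` are hypothetical names, and everything other than numbers and `..` is simply skipped.

```rust
/// Hypothetical token subset for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Int(i64),
    Float(f64),
    DotDot,
}

/// Lex integers, floats, and `..`; all other bytes are skipped.
/// Returns None if an integer literal does not fit in i64.
pub fn lex_numbers(src: &str) -> Option<Vec<Token>> {
    let bytes = src.as_bytes();
    let mut i = 0;
    let mut out = Vec::new();
    while i < bytes.len() {
        let c = bytes[i];
        if c.is_ascii_digit() {
            let start = i;
            while i < bytes.len() && bytes[i].is_ascii_digit() {
                i += 1;
            }
            // A '.' is a decimal point only if a digit follows it,
            // so "2..5" stops after "2" instead of consuming "2.".
            let is_float = i + 1 < bytes.len()
                && bytes[i] == b'.'
                && bytes[i + 1].is_ascii_digit();
            if is_float {
                i += 1; // consume the '.'
                while i < bytes.len() && bytes[i].is_ascii_digit() {
                    i += 1;
                }
                out.push(Token::Float(src[start..i].parse().ok()?));
            } else {
                out.push(Token::Int(src[start..i].parse().ok()?));
            }
        } else if c == b'.' && i + 1 < bytes.len() && bytes[i + 1] == b'.' {
            out.push(Token::DotDot);
            i += 2;
        } else {
            i += 1; // not part of this sketch's token set
        }
    }
    Some(out)
}
```

With this rule, a variable-length hop such as `*1..3` yields `Int(1)`, `DotDot`, `Int(3)`, while `2.5` still lexes as a single `Float`.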

https://claude.ai/code/session_01BL34LfrWc1ciedT3asAbu