# ZGraph — Unified Master Plan (Reconciled)

This document merges the current repository state with the idealized goals and serves as the **single source of truth**. Items are marked as:
- ✅ Done
- 🚧 Partial / scaffolding present
- ⭕ Not started

---

## A. Semantics & Types
- ✅ A1. Compile-time graph semantics (`directed`, `weighted`) across storages
- ✅ A2. Self-loops & multi-edges covered by tests
- ⭕ A3. Attributes (node/edge KV with typed values)
- ✅ A4. Numeric node IDs (`u64`/`usize`) in use
- ⭕ A5. Storage-agnostic iterators (common trait iterators)

## B. Storage Backends (in-memory)
- ✅ B1. AdjacencyList: CRUD + neighbor ops + tests
- ✅ B2. AdjacencyMatrix: CRUD + present bitset + tests
- ✅ B3. IncidenceMatrix: rows, sign/weight semantics, edge_set, parallel builder, tests
- ⭕ B4. CSR / ReverseCSR indices (standalone, optional overlays)
- ⭕ B5. Attribute storage (typed, columnar, interning)

## C. Common Trait Surface
- ✅ C1. `GraphStorage(type, directed, weighted)` factory
- ✅ C2. Core ops: init/deinit/add/remove/has/getNeighbors
- ⭕ C3. Attribute API (node/edge getters/setters)
- ⭕ C4. Iteration policy & attribute filters

## D. Conversion Strategies
- ⭕ D1–D7. Strategies + API + tests (`Duplicate`, `Streamed`, `Chunked`, `CopyOnWrite`)

## E. File Formats
- 🚧 E1. `.zgraph` text spec documented; runtime reader/writer pending
- 🚧 E2. `.zgraphb` binary spec documented; runtime reader/writer pending

## F. Algorithms
- 🚧 F1. Traversal (BFS/DFS/CC) — source files exist; unify over the trait; add tests
- 🚧 F2. Shortest paths (Dijkstra/Bellman-Ford/Floyd-Warshall) — wire up and test
- 🚧 F3. Connectivity (SCC) — add Tarjan/Kosaraju; tests
- 🚧 F4. Flow (Edmonds-Karp present; add Dinic); tests
- 🚧 F5. Centrality (PageRank present; add Betweenness); tests
- 🚧 F6. Spectral (Laplacian, eigen routines present); tests & trait integration

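F3 calls for Tarjan or Kosaraju. The sketch below implements Tarjan's algorithm over a plain adjacency list (`Vec<Vec<usize>>`, dense ids `0..n`) and could serve as a reference oracle for the SCC tests. ZGraph is a Zig codebase, and this Rust code is for illustration only. `tarjan_scc` is a hypothetical name, not ZGraph's API.

```rust
/// Tarjan's strongly connected components over an adjacency list.
/// Recursive for clarity; very deep graphs would need an explicit stack.
/// Components come out in reverse topological order of the condensation.
pub fn tarjan_scc(adj: &[Vec<usize>]) -> Vec<Vec<usize>> {
    struct State<'a> {
        adj: &'a [Vec<usize>],
        index: Vec<Option<usize>>, // discovery order, None = unvisited
        low: Vec<usize>,           // smallest index reachable from the DFS subtree
        on_stack: Vec<bool>,
        stack: Vec<usize>,
        next: usize,
        sccs: Vec<Vec<usize>>,
    }

    fn visit(s: &mut State<'_>, v: usize) {
        s.index[v] = Some(s.next);
        s.low[v] = s.next;
        s.next += 1;
        s.stack.push(v);
        s.on_stack[v] = true;

        let adj = s.adj; // copy the shared reference so `s` can be reborrowed mutably
        for &w in &adj[v] {
            let iw = s.index[w];
            match iw {
                None => {
                    visit(s, w); // tree edge: recurse, then pull up w's low-link
                    s.low[v] = s.low[v].min(s.low[w]);
                }
                Some(iw) if s.on_stack[w] => {
                    s.low[v] = s.low[v].min(iw); // edge back into the current SCC
                }
                _ => {} // edge into an SCC that is already finished
            }
        }

        // v is the root of an SCC: pop the stack down to and including v.
        if s.index[v] == Some(s.low[v]) {
            let mut comp = Vec::new();
            loop {
                let w = s.stack.pop().expect("v is still on the stack");
                s.on_stack[w] = false;
                comp.push(w);
                if w == v {
                    break;
                }
            }
            s.sccs.push(comp);
        }
    }

    let n = adj.len();
    let mut s = State {
        adj,
        index: vec![None; n],
        low: vec![0; n],
        on_stack: vec![false; n],
        stack: Vec::new(),
        next: 0,
        sccs: Vec::new(),
    };
    for v in 0..n {
        if s.index[v].is_none() {
            visit(&mut s, v);
        }
    }
    s.sccs
}
```

A self-loop needs no special handling: it is just an edge to a node that is already on the stack.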
## G. Indices & Acceleration
- ⭕ G1. CSR build/use
- ⭕ G2. Reverse CSR (directed)
- ⭕ G3. Degree table
- ⭕ G4. Index persistence in `.zgraphb` optional blocks
- ⭕ G5. Rebuild-on-demand hooks

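G1 through G3 can share one construction. The sketch below builds a CSR from an edge list in three steps: a counting pass, an exclusive prefix sum, and a scatter pass. The reverse CSR is the same build with the endpoints swapped, and the degree table falls out of adjacent offsets. The `Csr` type and the `(src, dst, weight)` input shape are assumptions, and the code is Rust for illustration rather than ZGraph's Zig.

```rust
/// Compressed Sparse Row index: the neighbors of u are
/// targets[offsets[u]..offsets[u + 1]], with weights in the same slots.
pub struct Csr {
    pub offsets: Vec<usize>, // length node_count + 1
    pub targets: Vec<usize>, // length edge_count
    pub weights: Vec<f64>,   // parallel to `targets`
}

impl Csr {
    /// Counting pass, prefix sum, scatter pass. Multi-edges and self-loops are kept.
    pub fn build(node_count: usize, edges: &[(usize, usize, f64)]) -> Csr {
        let mut offsets = vec![0usize; node_count + 1];
        for &(u, _, _) in edges {
            offsets[u + 1] += 1; // out-degree of u, stored one slot to the right
        }
        for i in 0..node_count {
            offsets[i + 1] += offsets[i]; // exclusive prefix sum
        }
        let mut cursor = offsets.clone(); // next free slot per node
        let mut targets = vec![0usize; edges.len()];
        let mut weights = vec![0.0f64; edges.len()];
        for &(u, v, w) in edges {
            let slot = cursor[u];
            targets[slot] = v;
            weights[slot] = w;
            cursor[u] += 1;
        }
        Csr { offsets, targets, weights }
    }

    /// Reverse CSR (in-edges) for directed graphs: the same build on swapped endpoints.
    pub fn build_reverse(node_count: usize, edges: &[(usize, usize, f64)]) -> Csr {
        let swapped: Vec<(usize, usize, f64)> = edges.iter().map(|&(u, v, w)| (v, u, w)).collect();
        Csr::build(node_count, &swapped)
    }

    pub fn neighbors(&self, u: usize) -> &[usize] {
        &self.targets[self.offsets[u]..self.offsets[u + 1]]
    }

    /// Degree table: out-degrees for a forward CSR, in-degrees for a reverse one.
    pub fn degrees(&self) -> Vec<usize> {
        self.offsets.windows(2).map(|w| w[1] - w[0]).collect()
    }
}
```

The scatter pass walks edges in input order, so each node's neighbors keep their insertion order. That property matters if the iterator policy promises a stable order.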
## H. Attributes
- ⭕ H1. Node & Edge KV (`int|float|bool|string`)
- ⭕ H2. Storage-agnostic attribute map with typed accessors
- ⭕ H3. Text↔Binary column mapping (string table)
- ⭕ H4. Attribute filters in iterators
- ⭕ H5. Tests for mixed types, missing values, interning

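The sketch below covers H1 and H2. It uses a tagged value type for `int|float|bool|string`, and strings are stored as handles into a deduplicating interner, so a repeated label costs one table entry. The names (`Sym`, `Interner`, `AttrValue`, `NodeAttrs`) are hypothetical, and the layout is a plain hash map rather than the columnar layout B5 targets. The code is Rust for illustration only.

```rust
use std::collections::HashMap;

/// Handle to an interned string: an index into `Interner::strings`.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct Sym(pub u32);

/// Deduplicating string table (assumed shape, not ZGraph's).
#[derive(Default)]
pub struct Interner {
    ids: HashMap<String, Sym>,
    strings: Vec<String>,
}

impl Interner {
    /// Returns the existing handle for `s`, or assigns the next index.
    pub fn intern(&mut self, s: &str) -> Sym {
        if let Some(&id) = self.ids.get(s) {
            return id;
        }
        let id = Sym(self.strings.len() as u32);
        self.strings.push(s.to_owned());
        self.ids.insert(s.to_owned(), id);
        id
    }

    pub fn resolve(&self, id: Sym) -> &str {
        &self.strings[id.0 as usize]
    }
}

/// The four value kinds from H1; strings are stored as interned handles.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum AttrValue {
    Int(i64),
    Float(f64),
    Bool(bool),
    Str(Sym),
}

/// Node attributes keyed by (node id, interned key). A missing value is `None`.
#[derive(Default)]
pub struct NodeAttrs {
    pub interner: Interner,
    values: HashMap<(u64, Sym), AttrValue>,
}

impl NodeAttrs {
    pub fn set(&mut self, node: u64, key: &str, value: AttrValue) {
        let k = self.interner.intern(key);
        self.values.insert((node, k), value);
    }

    pub fn set_str(&mut self, node: u64, key: &str, s: &str) {
        let value = AttrValue::Str(self.interner.intern(s));
        self.set(node, key, value);
    }

    /// A key that was never interned short-circuits to `None`.
    pub fn get(&self, node: u64, key: &str) -> Option<AttrValue> {
        let k = *self.interner.ids.get(key)?;
        self.values.get(&(node, k)).copied()
    }

    /// Typed accessor: `None` if the value is missing or has a different type.
    pub fn get_int(&self, node: u64, key: &str) -> Option<i64> {
        match self.get(node, key)? {
            AttrValue::Int(i) => Some(i),
            _ => None,
        }
    }
}
```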
## I. I/O & CLI
- ⭕ I1. CLI subcommands: `convert`, `validate`, `build-index`, `stats`
- ⭕ I2. Streaming I/O (chunked) for both formats
- ⭕ I3. zstd dictionaries & auto-tuning

## J. Concurrency & Memory
- ✅ J0. IncidenceMatrix parallel builder + mutex guarding
- ⭕ J1. Threading doc & invariants
- ⭕ J2. Parallel builders for CSR & conversions where safe
- ⭕ J3. Allocator strategy knobs (GPA/Arena/Page) + docs
- ⭕ J4. Zero-copy `.zgraphb` readers (mmap)
- ⭕ J5. Stress tests (OOM behavior)

## K. Testing & Quality
- ✅ K1. Unit tests for storages
- ⭕ K2. Property tests (round-trips, cross-storage equivalence)
- ⭕ K3. Fuzzers for text/binary parsers
- ⭕ K4. Benchmarks (micro/macro) and tracked results
- ⭕ K5. CI (Win/macOS/Linux; Zig 0.16.x)
- ⭕ K6. `zig fmt` + static-check gates

## L. Documentation
- ✅ L1. README
- 🚧 L2. Detailed specs for `.zgraph`/`.zgraphb` (needs sync with runtime)
- ⭕ L3. Algorithm docs and complexity notes
- ⭕ L4. Conversion strategy doc (memory math & decision table)
- ⭕ L5. Perf tuning guide (allocators, cache, zstd params)
- ⭕ L6. Roadmap milestones mapping to this checklist

---

# Execution Order (Do This Next, Exactly)

## Phase 1 — Lock the Core Interfaces
1. **Define common trait surface (`GraphLike`)** with adapters over existing storages:
   - `neighbors(u)`, `hasEdge(u,v)`, `weight(u,v)?`, `nodeCount()`, `edgeCount()`
2. **Iterator policy**: stable order + attribute filter hooks; basic `NodeIter`, `EdgeFromIter`

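The sketch below shows one possible shape for the Phase 1 surface, written as a Rust trait for illustration. Zig has no traits, so ZGraph's real surface would more likely be a comptime interface or a vtable struct. The method names are snake_case renderings of the list above, and `AdjList` and `max_out_degree` are hypothetical. The adapter shows two details every storage has to agree on for cross-storage tests to pass. First, neighbors come back in a stable ascending order. Second, an undirected edge is counted once even though it is stored in two rows.

```rust
/// Hypothetical rendering of `GraphLike`; node ids are dense `0..node_count()`.
pub trait GraphLike {
    fn node_count(&self) -> usize;
    fn edge_count(&self) -> usize;
    /// Out-neighbors of `u`, ascending, so every storage iterates in the same order.
    fn neighbors(&self, u: usize) -> Vec<usize>;
    fn has_edge(&self, u: usize, v: usize) -> bool;
    /// `None` when no `u -> v` edge exists; with multi-edges, the first one found.
    fn weight(&self, u: usize, v: usize) -> Option<f64>;
}

/// Toy adjacency-list storage: `adj[u]` holds `(v, weight)` pairs.
pub struct AdjList {
    directed: bool,
    adj: Vec<Vec<(usize, f64)>>,
}

impl AdjList {
    pub fn new(n: usize, directed: bool) -> Self {
        AdjList { directed, adj: vec![Vec::new(); n] }
    }

    /// Undirected edges are stored in both rows; a self-loop is stored once.
    pub fn add_edge(&mut self, u: usize, v: usize, w: f64) {
        self.adj[u].push((v, w));
        if !self.directed && u != v {
            self.adj[v].push((u, w));
        }
    }
}

impl GraphLike for AdjList {
    fn node_count(&self) -> usize {
        self.adj.len()
    }

    fn edge_count(&self) -> usize {
        let stored: usize = self.adj.iter().map(Vec::len).sum();
        if self.directed {
            return stored;
        }
        let loops: usize = self
            .adj
            .iter()
            .enumerate()
            .map(|(u, row)| row.iter().filter(|&&(v, _)| v == u).count())
            .sum();
        (stored - loops) / 2 + loops // non-loop edges were stored twice
    }

    fn neighbors(&self, u: usize) -> Vec<usize> {
        let mut out: Vec<usize> = self.adj[u].iter().map(|&(v, _)| v).collect();
        out.sort_unstable();
        out
    }

    fn has_edge(&self, u: usize, v: usize) -> bool {
        self.adj[u].iter().any(|&(x, _)| x == v)
    }

    fn weight(&self, u: usize, v: usize) -> Option<f64> {
        self.adj[u].iter().find(|&&(x, _)| x == v).map(|&(_, w)| w)
    }
}

/// Algorithms written against the trait work with any adapter.
pub fn max_out_degree<G: GraphLike>(g: &G) -> usize {
    (0..g.node_count()).map(|u| g.neighbors(u).len()).max().unwrap_or(0)
}
```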
## Phase 2 — Attributes (foundation for formats)
3. **Typed attribute store** (node & edge): `int|float|bool|string` + string interning
4. **Attribute API** on the trait surface + tests
5. **Iterator filters** using attributes

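Item 5 combines the iterator policy with attributes. The sketch below assumes attributes are stored column-wise, with one sorted map per key in the spirit of B5. Under that layout, a filter hook is just a predicate applied while walking the column in ascending node order, and missing values are skipped rather than treated as errors. `Attr`, `Columns`, and `nodes_where` are hypothetical names, and the code is Rust for illustration only.

```rust
use std::collections::{BTreeMap, HashMap};

/// Stand-in attribute value (assumption, not ZGraph's type).
#[derive(Clone, Debug, PartialEq)]
pub enum Attr {
    Int(i64),
    Float(f64),
    Bool(bool),
    Str(String),
}

/// One column per attribute key; a BTreeMap keeps node ids sorted.
pub type Columns = HashMap<String, BTreeMap<u64, Attr>>;

/// Yields, in ascending order, the node ids whose value in `column` satisfies `pred`.
/// Nodes with no value for the key are skipped; a missing column yields nothing.
pub fn nodes_where<'a, P>(
    column: Option<&'a BTreeMap<u64, Attr>>,
    pred: P,
) -> impl Iterator<Item = u64> + 'a
where
    P: Fn(&Attr) -> bool + 'a,
{
    column
        .into_iter()
        .flatten()
        .filter(move |&(_, value)| pred(value))
        .map(|(&node, _)| node)
}
```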
## Phase 3 — File Formats (runtime)
6. **`.zgraph` reader/writer** (streaming, CSV-like with schemas; tolerant of unknown sections); round-trip tests
7. **`.zgraphb` reader/writer** (chunked blocks + zstd + mmap-friendly); round-trip & parity tests
8. **Text↔Binary parity**: `load(text) -> save(binary) -> load(binary) == load(text)`

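The parity check in item 8 needs a definition of `==` that does not depend on storage layout or edge order. The sketch below compares sorted edge multisets, with weights compared bit-for-bit. It assumes each loader can return a node count and a list of `(u, v, weight)` triples; attributes would be compared the same way, keyed by node or edge. The function names are hypothetical, and the code is Rust for illustration.

```rust
/// Canonical form for graph equality: the sorted multiset of edges as
/// (u, v, weight bits). Undirected edges are normalized to u <= v, and
/// multi-edges are kept, so duplicate edges must survive a round trip.
pub fn canonical_edges(directed: bool, edges: &[(u64, u64, f64)]) -> Vec<(u64, u64, u64)> {
    let mut out: Vec<(u64, u64, u64)> = edges
        .iter()
        .map(|&(u, v, w)| {
            let (a, b) = if directed || u <= v { (u, v) } else { (v, u) };
            (a, b, w.to_bits()) // exact bit equality: lossy float I/O fails the check
        })
        .collect();
    out.sort_unstable();
    out
}

/// Parity in this sketch: same node count and same canonical edge multiset.
pub fn same_graph(
    directed: bool,
    nodes_a: u64,
    edges_a: &[(u64, u64, f64)],
    nodes_b: u64,
    edges_b: &[(u64, u64, f64)],
) -> bool {
    nodes_a == nodes_b && canonical_edges(directed, edges_a) == canonical_edges(directed, edges_b)
}
```

Comparing weights bit-for-bit requires the text writer to print floats in a round-trip-safe form. Rust's default `Display` for `f64` prints a string that parses back to the same value, and the `.zgraph` writer would need an equivalent guarantee.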
## Phase 4 — Conversion Strategies (memory-bounded)
9. `convertStorage` API + **Duplicate** strategy
10. **Streamed** strategy (destroy-as-you-go option; memory ceiling tests)
11. **Chunked** strategy (node sharding; parallel)
12. **Copy-On-Write** adapter (lazy migration) + background compaction
13. **Invariants & ceilings** property tests

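The sketch below implements the Streamed strategy's destroy-as-you-go option. It assumes the source is an adjacency list and that the destination accepts edges through an insert callback. `convert_streamed` and its parameters are hypothetical. The memory bound in the comments is approximate, since it ignores allocator slack and the destination's own growth. The code is Rust for illustration.

```rust
/// Streamed conversion sketch: the source adjacency list is consumed one row at a
/// time and every edge is handed to `sink`, which inserts it into the destination.
/// With `destroy_as_you_go`, each row's buffer is freed right after it is consumed,
/// so peak memory is roughly the destination plus one source row instead of two
/// full copies. Returns the number of edges moved.
pub fn convert_streamed<F>(
    src: &mut [Vec<(usize, f64)>],
    destroy_as_you_go: bool,
    mut sink: F,
) -> usize
where
    F: FnMut(usize, usize, f64),
{
    let mut moved = 0;
    for u in 0..src.len() {
        if destroy_as_you_go {
            // Leaves an empty, unallocated Vec behind in src[u].
            let row = std::mem::take(&mut src[u]);
            for (v, w) in row {
                sink(u, v, w);
                moved += 1;
            }
            // The loop consumed `row`, so its buffer has already been released.
        } else {
            for &(v, w) in src[u].iter() {
                sink(u, v, w);
                moved += 1;
            }
        }
    }
    moved
}
```

A memory-ceiling test for item 10 could wrap the allocator and check that the high-water mark stays under this bound when the flag is set.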
## Phase 5 — Indices & Acceleration
14. **CSR/ReverseCSR** build & use as optional overlays for any storage
15. **Degree table** and **index persistence** blocks in `.zgraphb`
16. **Rebuild-on-demand** hooks

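The sketch below shows one way to implement item 16. It assumes storages can expose a mutation counter. Each overlay records the version it was built from and rebuilds lazily once that counter has moved. The example overlay is a degree table; a CSR/ReverseCSR overlay could carry the same stamp. All names are hypothetical, and the code is Rust for illustration.

```rust
/// Toy storage that bumps `version` on every mutation.
pub struct Graph {
    n: usize,
    edges: Vec<(usize, usize)>,
    version: u64,
}

impl Graph {
    pub fn new(n: usize) -> Self {
        Graph { n, edges: Vec::new(), version: 0 }
    }

    pub fn add_edge(&mut self, u: usize, v: usize) {
        self.edges.push((u, v));
        self.version += 1; // any mutation invalidates derived indices
    }
}

/// Degree-table overlay stamped with the storage version it was built from.
pub struct DegreeIndex {
    built_at: Option<u64>,
    out_degree: Vec<usize>,
    pub rebuilds: usize,
}

impl DegreeIndex {
    pub fn new() -> Self {
        DegreeIndex { built_at: None, out_degree: Vec::new(), rebuilds: 0 }
    }

    /// Rebuild-on-demand hook: recompute only if the storage changed since the last build.
    pub fn ensure_fresh(&mut self, g: &Graph) -> &[usize] {
        if self.built_at != Some(g.version) {
            self.out_degree = vec![0; g.n];
            for &(u, _) in &g.edges {
                self.out_degree[u] += 1;
            }
            self.built_at = Some(g.version);
            self.rebuilds += 1;
        }
        &self.out_degree
    }
}
```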
## Phase 6 — Algorithms (unify + verify)
17. Traversal (BFS/DFS/CC) over the trait + CSR; tests across storages
18. Shortest paths (Dijkstra/BF/FW) with layout decision table; tests
19. Connectivity/SCC (Tarjan/Kosaraju); tests
20. Flow (finish Dinic); tests
21. Centrality (PageRank + Betweenness); tests
22. Spectral (Laplacian, eigen); tests & perf

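Item 21 adds Betweenness. The sketch below implements Brandes' algorithm for unweighted graphs: one BFS per source, then dependency accumulation in reverse BFS order. It could act as a cross-storage test oracle. It assumes a plain adjacency list with dense ids; weighted graphs would replace the BFS with Dijkstra. `betweenness` is a hypothetical name, and the code is Rust for illustration.

```rust
use std::collections::VecDeque;

/// Brandes' betweenness centrality for unweighted, directed adjacency lists.
pub fn betweenness(adj: &[Vec<usize>]) -> Vec<f64> {
    let n = adj.len();
    let mut bc = vec![0.0f64; n];
    for s in 0..n {
        let mut order = Vec::with_capacity(n); // nodes in non-decreasing distance from s
        let mut preds: Vec<Vec<usize>> = vec![Vec::new(); n];
        let mut sigma = vec![0.0f64; n]; // number of shortest s -> v paths
        let mut dist = vec![usize::MAX; n];
        sigma[s] = 1.0;
        dist[s] = 0;
        let mut queue = VecDeque::from([s]);
        while let Some(v) = queue.pop_front() {
            order.push(v);
            for &w in &adj[v] {
                if dist[w] == usize::MAX {
                    dist[w] = dist[v] + 1;
                    queue.push_back(w);
                }
                if dist[w] == dist[v] + 1 {
                    sigma[w] += sigma[v];
                    preds[w].push(v);
                }
            }
        }
        // Accumulate dependencies, farthest nodes first.
        let mut delta = vec![0.0f64; n];
        while let Some(w) = order.pop() {
            for &v in &preds[w] {
                let contrib = sigma[v] / sigma[w] * (1.0 + delta[w]);
                delta[v] += contrib;
            }
            if w != s {
                bc[w] += delta[w];
            }
        }
    }
    bc
}
```

For an undirected graph stored with both directions, each unordered pair is counted twice, so the scores should be halved when the undirected convention is wanted.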
| 126 | + |
| 127 | +## Phase 7 — CLI, Streaming I/O, Docs, Quality Gates |
| 128 | +23. CLI subcommands: `convert`, `validate`, `build-index`, `stats` |
| 129 | +24. Streaming I/O for both formats + zstd dictionaries |
| 130 | +25. Documentation: conversion strategies, perf tuning, thread model; keep specs synced |
| 131 | +26. Property tests, fuzzers, benches; CI on all platforms; formatting/static checks gates |
| 132 | + |
| 133 | +--- |
| 134 | + |
| 135 | +## Acceptance Criteria (per phase, condensed) |
| 136 | +- **Formats:** Round-trips preserve counts, attributes, weights; unknown sections/blocks ignored. |
| 137 | +- **Conversion:** Peak memory measured below ceiling; equivalence of neighbor sets/weights post-convert. |
| 138 | +- **Indices:** CSR neighbors == storage neighbors; persisted indices reload correctly. |
| 139 | +- **Algorithms:** Identical results across storages (given same semantics); perf within expected bounds. |
| 140 | +- **CLI:** Non-zero exit on invalid files; stats match library queries; build-index regenerates CSR. |
| 141 | +- **Docs:** Examples compile; decision tables match implemented behavior. |