Commit cc64b75

starting to revamp ZGraph
1 parent 05c5639 commit cc64b75

47 files changed: +5677 -222 lines changed

.dpignore

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
.git
.github
.zig-cache

.github/ISSUE_TEMPLATE.md

Whitespace-only changes.

.github/workflows/bench.yml

Whitespace-only changes.

.github/workflows/ci.yml

Whitespace-only changes.

README.md

Lines changed: 216 additions & 0 deletions
@@ -0,0 +1,216 @@
# ZGraph

ZGraph is a high-performance, in-memory graph library with:

- **Pluggable storages**: adjacency list, adjacency matrix, incidence matrix.
- **Compile-time graph semantics**: `directed` / `undirected`, `weighted` / `unweighted`.
- **Unified algorithms** that operate across storages via a common trait surface.
- **Portable formats**:
  - `.zgraph` – human-readable, human-writable
  - `.zgraphb` – compressed, chunked binary for fast I/O / mmap
- **Robust conversion strategies** (duplicate, streamed, chunked, copy-on-write) to switch layouts under tight memory budgets.

ZGraphDB (paged container) builds on the same chunks and APIs; algorithms continue to live in ZGraph.

15+
## Status
16+
17+
- ✅ Core storages compile and pass tests on Zig 0.16.
18+
-`AdjacencyList`, `AdjacencyMatrix`, `IncidenceMatrix` with CRUD + neighbor ops.
19+
-`.zgraph` / `.zgraphb` specifications documented; parsers/writers in progress.
20+
- 🚧 Conversion strategies module & CLI utilities.
21+
22+
See **[ZGraph.Spec.md](./ZGraph.Spec.md)** for the canonical checklist and roadmap.
23+
## Install

```bash
zig build test
zig build run
```

Zig: 0.16.x

## Quick start

```zig
const std = @import("std");
const GS = @import("src/lib/core/graph_storage.zig").GraphStorage;
const Storage = GS(.AdjacencyList, true, true); // directed, weighted

pub fn main() !void {
    var g = Storage.init(std.heap.page_allocator);
    defer g.deinit();

    try g.addEdge(0, 1, 3.5); // edge 0 -> 1 with weight 3.5
    try g.addEdge(1, 1, 7.5); // self-loop on node 1

    if (g.getNeighbors(0)) |edges| {
        _ = edges; // ... iterate over the edges leaving node 0
    }
}
```

### Convert storage (e.g., list → matrix) with a memory budget

```zig
// Fragment: `std`, `Storage`, and `g` come from the Quick start example.
const conv = @import("src/lib/conversion/strategies.zig");
var dst = try conv.convertStorage(
    std.heap.page_allocator,
    Storage, // source type
    .AdjacencyMatrix, true, true, // dest type + semantics
    &g,
    .Streamed, // strategy
    .{ .max_bytes_in_flight = 4 * 1024 * 1024 * 1024 }, // 4 GiB
);
defer dst.deinit();
```

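Before converting a list to a dense matrix it can help to estimate whether the destination fits the budget at all. The sketch below is a rough sizing aid under stated assumptions: `denseMatrixBytes` and `fitsBudget` are hypothetical helpers, not part of ZGraph, and the budget is treated as a cap on the whole destination matrix, which may not be how `max_bytes_in_flight` is actually interpreted.

```zig
const std = @import("std");

/// Bytes needed for a dense n x n matrix with `cell_bytes` per cell.
/// Returns error.Overflow if the product does not fit in a u64.
fn denseMatrixBytes(n: u64, cell_bytes: u64) error{Overflow}!u64 {
    const cells = try std.math.mul(u64, n, n);
    return std.math.mul(u64, cells, cell_bytes);
}

/// True when the full destination matrix fits inside `max_bytes`.
/// An overflowing size is treated as "does not fit".
fn fitsBudget(n: u64, cell_bytes: u64, max_bytes: u64) bool {
    const need = denseMatrixBytes(n, cell_bytes) catch return false;
    return need <= max_bytes;
}

test "4 GiB budget against f64 weight matrices" {
    const four_gib: u64 = 4 * 1024 * 1024 * 1024; // 4_294_967_296 bytes

    // 20_000^2 cells * 8 bytes = 3_200_000_000 bytes: fits.
    try std.testing.expect(fitsBudget(20_000, @sizeOf(f64), four_gib));
    // 30_000^2 cells * 8 bytes = 7_200_000_000 bytes: does not fit.
    try std.testing.expect(!fitsBudget(30_000, @sizeOf(f64), four_gib));
}
```

The quadratic growth is the reason a matrix destination is usually the case where a memory ceiling matters: doubling the node count quadruples the footprint regardless of how sparse the source graph is.
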
### Import / Export

```bash
# text -> binary
zig build run -- convert --in graph.zgraph --out graph.zgraphb

# validate and print stats
zig build run -- stats --in graph.zgraphb
```

## Design pillars

- **Predictable performance**: zero-copy views, cache-friendly CSR, zstd block compression.
- **Compatibility**: readers ignore unknown chunks; text & binary round-trip.
- **Safety**: memory ownership is explicit; shared state is guarded by mutexes where needed.
- **Tested**: unit, property, and fuzz tests; algorithm parity across storages.

## Repo layout (idealized)

```
ZGraph/
├─ README.md
├─ ZGraph.Spec.md              # Source of truth (feature checklist)
├─ LICENSE
├─ build.zig
├─ zig.mod
├─ docs/
│  ├─ formats/
│  │  ├─ zgraph-text-spec.md
│  │  └─ zgraphb-binary-spec.md
│  ├─ algorithms/
│  │  ├─ traversal.md
│  │  ├─ shortest_paths.md
│  │  ├─ connectivity.md
│  │  ├─ flow_cut.md
│  │  ├─ centrality.md
│  │  └─ spectral.md
│  ├─ storage_design.md
│  ├─ conversion_strategies.md
│  ├─ perf_tuning.md
│  ├─ safety_and_threading.md
│  └─ roadmap.md
├─ examples/
│  ├─ tiny/
│  │  ├─ triangle.zgraph
│  │  ├─ weighted_digraph.zgraph
│  │  └─ attributes.zgraph
│  ├─ conversions/
│  │  ├─ list_to_matrix.zig
│  │  └─ streamed_live_convert.zig
│  └─ cli/
│     ├─ import_export.zig
│     └─ metrics.zig
├─ src/
│  ├─ root.zig
│  ├─ lib/
│  │  ├─ core/
│  │  │  ├─ node.zig
│  │  │  ├─ edge.zig
│  │  │  ├─ graph_storage.zig
│  │  │  ├─ iterators.zig
│  │  │  └─ attributes.zig
│  │  ├─ data_structures/
│  │  │  ├─ adjacency_list.zig
│  │  │  ├─ adjacency_matrix.zig
│  │  │  └─ incidence_matrix.zig
│  │  ├─ formats/
│  │  │  ├─ zgraph_text.zig
│  │  │  ├─ zgraphb.zig
│  │  │  └─ chunk_provider.zig
│  │  ├─ conversion/
│  │  │  ├─ strategies.zig
│  │  │  ├─ duplicate_convert.zig
│  │  │  ├─ streamed_convert.zig
│  │  │  ├─ chunked_convert.zig
│  │  │  └─ cow_convert.zig
│  │  ├─ indices/
│  │  │  ├─ csr.zig
│  │  │  └─ degree_table.zig
│  │  ├─ algorithms/
│  │  │  ├─ traversal/
│  │  │  │  ├─ bfs.zig
│  │  │  │  └─ dfs.zig
│  │  │  ├─ shortest_paths/
│  │  │  │  ├─ dijkstra.zig
│  │  │  │  ├─ bellman_ford.zig
│  │  │  │  └─ floyd_warshall.zig
│  │  │  ├─ connectivity/
│  │  │  │  ├─ union_find.zig
│  │  │  │  └─ strongly_connected.zig
│  │  │  ├─ flow_cut/
│  │  │  │  ├─ edmonds_karp.zig
│  │  │  │  └─ dinic.zig
│  │  │  ├─ centrality/
│  │  │  │  ├─ pagerank.zig
│  │  │  │  └─ betweenness.zig
│  │  │  └─ spectral/
│  │  │     ├─ laplacian.zig
│  │  │     └─ power_iteration.zig
│  │  ├─ io/
│  │  │  ├─ file.zig
│  │  │  ├─ csv.zig
│  │  │  └─ zstd.zig
│  │  ├─ mem/
│  │  │  ├─ alloc.zig
│  │  │  └─ pool.zig
│  │  └─ util/
│  │     ├─ bitset.zig
│  │     ├─ hash.zig
│  │     └─ time.zig
│  └─ cli/
│     ├─ zgraph.zig
│     └─ subcommands/
│        ├─ convert.zig
│        ├─ validate.zig
│        ├─ build_index.zig
│        └─ stats.zig
├─ tests/
│  ├─ unit/
│  │  ├─ adjacency_list_test.zig
│  │  ├─ adjacency_matrix_test.zig
│  │  ├─ incidence_matrix_test.zig
│  │  ├─ formats_text_test.zig
│  │  ├─ formats_binary_test.zig
│  │  ├─ conversion_strategies_test.zig
│  │  └─ algorithms_smoke_test.zig
│  ├─ property/
│  │  ├─ graph_idempotence_test.zig
│  │  └─ random_graph_agreement_test.zig
│  ├─ fuzz/
│  │  └─ fuzzer.zig
│  └─ data/
│     ├─ tiny_graphs/*.zgraph
│     └─ medium_graphs/*.zgraph
├─ tools/
│  ├─ scripts/
│  │  ├─ gen_random_graph.zig
│  │  └─ bench_matrix_vs_list.zig
│  └─ perf/
│     └─ flamegraph_instructions.md
└─ benches/
   ├─ micro/
   │  ├─ csr_iter_bench.zig
   │  └─ parse_zgraph_bench.zig
   └─ macro/
      ├─ bfs_vs_dijkstra_bench.zig
      └─ convert_streamed_vs_duplicate.zig
```

## License

MIT

ZGraph.MasterPlan.md

Lines changed: 141 additions & 0 deletions
@@ -0,0 +1,141 @@
# ZGraph: Unified Master Plan (Reconciled)

This document merges the current repository state with the idealized goals and becomes the **single source of truth**. Items are marked as:
- ✅ Done
- 🚧 Partial / scaffolding present
- ⭕ Not started

---

## A. Semantics & Types
- ✅ A1. Compile-time graph semantics (`directed`, `weighted`) across storages
- ✅ A2. Self-loops & multi-edges covered by tests
- ⭕ A3. Attributes (node/edge KV with typed values)
- ✅ A4. Numeric node IDs (u64/usize) in use
- ⭕ A5. Storage-agnostic iterators (common trait iterators)

## B. Storage Backends (in-memory)
- ✅ B1. AdjacencyList: CRUD + neighbor ops + tests
- ✅ B2. AdjacencyMatrix: CRUD + present bitset + tests
- ✅ B3. IncidenceMatrix: rows, sign/weight semantics, edge_set, parallel builder, tests
- ⭕ B4. CSR / ReverseCSR indices (standalone, optional overlays)
- ⭕ B5. Attribute storage (typed, columnar, interning)

## C. Common Trait Surface
- ✅ C1. `GraphStorage(type, directed, weighted)` factory
- ✅ C2. Core ops: init/deinit/add/remove/has/getNeighbors
- ⭕ C3. Attribute API (node/edge getters/setters)
- ⭕ C4. Iteration policy & attribute filters

## D. Conversion Strategies
- ⭕ D1–D7. Strategies + API + tests (`Duplicate`, `Streamed`, `Chunked`, `CopyOnWrite`)

## E. File Formats
- 🚧 E1. `.zgraph` text spec documented; runtime reader/writer pending
- 🚧 E2. `.zgraphb` binary spec documented; runtime reader/writer pending

## F. Algorithms
- 🚧 F1. Traversal (BFS/DFS/CC): source files exist; unify over trait; add tests
- 🚧 F2. Shortest paths (Dijkstra/Bellman-Ford/Floyd-Warshall): wire and test
- 🚧 F3. Connectivity (SCC): add Tarjan/Kosaraju; tests
- 🚧 F4. Flow (Edmonds-Karp present; add Dinic); tests
- 🚧 F5. Centrality (PageRank present; add Betweenness); tests
- 🚧 F6. Spectral (Laplacian, eigen routines present); tests & trait integration

## G. Indices & Acceleration
- ⭕ G1. CSR build/use
- ⭕ G2. Reverse CSR (directed)
- ⭕ G3. Degree table
- ⭕ G4. Index persistence in `.zgraphb` optional blocks
- ⭕ G5. Rebuild-on-demand hooks

## H. Attributes
- ⭕ H1. Node & Edge KV (`int|float|bool|string`)
- ⭕ H2. Storage-agnostic attribute map with typed accessors
- ⭕ H3. Text↔Binary column mapping (string table)
- ⭕ H4. Attribute filters in iterators
- ⭕ H5. Tests for mixed types, missing values, interning

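One plausible shape for the `int|float|bool|string` values in H1 and the typed accessors in H2 is a tagged union. This is a sketch only: `AttrValue`, its field names, and the accessor names are hypothetical, and the plan above does not prescribe a representation. String interning (H3, H5) is left out.

```zig
const std = @import("std");

/// Hypothetical attribute value covering the four planned types.
const AttrValue = union(enum) {
    int: i64,
    float: f64,
    boolean: bool,
    string: []const u8,

    /// Typed accessors return null on a type mismatch instead of coercing.
    pub fn asInt(self: AttrValue) ?i64 {
        return switch (self) {
            .int => |v| v,
            else => null,
        };
    }

    pub fn asFloat(self: AttrValue) ?f64 {
        return switch (self) {
            .float => |v| v,
            else => null,
        };
    }

    pub fn asBool(self: AttrValue) ?bool {
        return switch (self) {
            .boolean => |v| v,
            else => null,
        };
    }

    pub fn asString(self: AttrValue) ?[]const u8 {
        return switch (self) {
            .string => |v| v,
            else => null,
        };
    }
};

test "typed accessors reject mismatched types" {
    const degree = AttrValue{ .int = 42 };
    const label = AttrValue{ .string = "hub" };

    try std.testing.expectEqual(@as(?i64, 42), degree.asInt());
    try std.testing.expect(label.asInt() == null);
    try std.testing.expectEqualStrings("hub", label.asString().?);
}
```

Returning an optional keeps "missing value" and "wrong type" handling at the call site, which is one way to make the mixed-type and missing-value tests in H5 straightforward to write.
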
## I. I/O & CLI
- ⭕ I1. CLI subcommands: `convert`, `validate`, `build-index`, `stats`
- ⭕ I2. Streaming I/O (chunked) for both formats
- ⭕ I3. zstd dictionaries & auto-tuning

## J. Concurrency & Memory
- ✅ J0. IncidenceMatrix parallel builder + mutex guarding
- ⭕ J1. Threading doc & invariants
- ⭕ J2. Parallel builders for CSR & conversions where safe
- ⭕ J3. Allocator strategy knobs (GPA/Arena/Page) + docs
- ⭕ J4. Zero-copy `.zgraphb` readers (mmap)
- ⭕ J5. Stress tests (OOM behavior)

## K. Testing & Quality
- ✅ K1. Unit tests for storages
- ⭕ K2. Property tests (round-trips, cross-storage equivalence)
- ⭕ K3. Fuzzers for text/binary parsers
- ⭕ K4. Benchmarks (micro/macro) and tracked results
- ⭕ K5. CI (Win/macOS/Linux; Zig 0.16.x)
- ⭕ K6. `zig fmt` + static checks gates

## L. Documentation
- ✅ L1. README
- 🚧 L2. Detailed specs for `.zgraph`/`.zgraphb` (needs sync with runtime)
- ⭕ L3. Algorithm docs and complexity notes
- ⭕ L4. Conversion strategy doc (memory math & decision table)
- ⭕ L5. Perf tuning guide (allocators, cache, zstd params)
- ⭕ L6. Roadmap milestones mapping to this checklist

---

# Execution Order (Do This Next, Exactly)

## Phase 1: Lock the Core Interfaces
1. **Define common trait surface (`GraphLike`)** with adapters over existing storages:
   - `neighbors(u)`, `hasEdge(u,v)`, `weight(u,v)?`, `nodeCount()`, `edgeCount()`
2. **Iterator policy**: stable order + attribute filter hooks; basic `NodeIter`, `EdgeFromIter`

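Zig has no built-in traits, so a `GraphLike` surface is typically enforced with a compile-time check. The sketch below assumes the five function names listed in Phase 1; `isGraphLike`, `ToyGraph`, and `NotAGraph` are hypothetical names for illustration, and the check only tests that declarations exist, not that their signatures match.

```zig
const std = @import("std");

/// Declaration names taken from the Phase 1 list.
const required_decls = [_][]const u8{
    "neighbors", "hasEdge", "weight", "nodeCount", "edgeCount",
};

/// True when `T` publicly declares every name in `required_decls`.
/// Signatures are not inspected in this sketch.
fn isGraphLike(comptime T: type) bool {
    inline for (required_decls) |name| {
        if (!@hasDecl(T, name)) return false;
    }
    return true;
}

/// Toy adapter that satisfies the surface with trivial bodies.
const ToyGraph = struct {
    pub fn neighbors(_: ToyGraph, _: usize) []const usize {
        return &[_]usize{};
    }
    pub fn hasEdge(_: ToyGraph, _: usize, _: usize) bool {
        return false;
    }
    pub fn weight(_: ToyGraph, _: usize, _: usize) ?f64 {
        return null;
    }
    pub fn nodeCount(_: ToyGraph) usize {
        return 0;
    }
    pub fn edgeCount(_: ToyGraph) usize {
        return 0;
    }
};

/// Missing four of the five required declarations.
const NotAGraph = struct {
    pub fn nodeCount(_: NotAGraph) usize {
        return 0;
    }
};

test "GraphLike check accepts and rejects at compile time" {
    const ok = comptime isGraphLike(ToyGraph);
    const bad = comptime isGraphLike(NotAGraph);
    try std.testing.expect(ok);
    try std.testing.expect(!bad);
}
```

An algorithm written against the trait surface could guard its generic parameter with `comptime { if (!isGraphLike(G)) @compileError("G is not GraphLike"); }`, so a storage that lacks an adapter fails at build time rather than deep inside the algorithm.
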
## Phase 2: Attributes (foundation for formats)
3. **Typed attribute store** (node & edge): `int|float|bool|string` + string interning
4. **Attribute API** on the trait surface + tests
5. **Iterator filters** using attributes

## Phase 3: File Formats (runtime)
6. **`.zgraph` reader/writer** (streaming CSV-ish with schemas; tolerant of unknown sections); round-trip tests
7. **`.zgraphb` reader/writer** (chunked blocks + zstd + mmap-friendly); round-trip & parity tests
8. **Text↔Binary parity**: `load(text) -> save(binary) -> load(binary) == load(text)`

## Phase 4: Conversion Strategies (memory-bounded)
9. `convertStorage` API + **Duplicate** strategy
10. **Streamed** strategy (destroy-as-you-go option; memory ceiling tests)
11. **Chunked** strategy (node sharding; parallel)
12. **Copy-On-Write** adapter (lazy migration) + background compaction
13. **Invariants & ceilings** property tests

## Phase 5: Indices & Acceleration
14. **CSR/ReverseCSR** build & use as optional overlays for any storage
15. **Degree table** and **index persistence** blocks in `.zgraphb`
16. **Rebuild-on-demand** hooks

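For reference, a CSR overlay can be built from an edge list with a two-pass counting sort. The sketch below is a generic illustration, not the planned `csr.zig`: `Edge`, `buildCsr`, and `neighbors` are hypothetical names, weights are omitted, and the caller supplies the buffers so no allocator is involved.

```zig
const std = @import("std");

const Edge = struct { from: usize, to: usize };

/// Builds CSR arrays in caller-provided buffers.
/// `offsets.len` must be node_count + 1 and `targets.len` must equal `edges.len`.
/// Afterwards the neighbors of u are targets[offsets[u]..offsets[u + 1]].
fn buildCsr(edges: []const Edge, offsets: []usize, targets: []usize) void {
    std.debug.assert(offsets.len >= 1);
    std.debug.assert(targets.len == edges.len);
    const n = offsets.len - 1;

    @memset(offsets, 0);
    // Pass 1: count the out-degree of u into offsets[u + 1].
    for (edges) |e| offsets[e.from + 1] += 1;
    // A prefix sum turns the counts into row start positions.
    for (1..n + 1) |row| offsets[row] += offsets[row - 1];
    // Pass 2: scatter targets, using offsets[u] as a write cursor.
    for (edges) |e| {
        targets[offsets[e.from]] = e.to;
        offsets[e.from] += 1;
    }
    // Every cursor now points at the start of the next row; shift back by one.
    var k = n;
    while (k > 0) : (k -= 1) offsets[k] = offsets[k - 1];
    offsets[0] = 0;
}

fn neighbors(offsets: []const usize, targets: []const usize, u: usize) []const usize {
    return targets[offsets[u]..offsets[u + 1]];
}

test "CSR neighbors match the edge list" {
    const edges = [_]Edge{
        .{ .from = 0, .to = 1 },
        .{ .from = 0, .to = 2 },
        .{ .from = 2, .to = 0 },
        .{ .from = 1, .to = 2 },
    };
    var offsets: [4]usize = undefined; // 3 nodes + 1
    var targets: [4]usize = undefined;
    buildCsr(&edges, &offsets, &targets);

    try std.testing.expectEqualSlices(usize, &[_]usize{ 1, 2 }, neighbors(&offsets, &targets, 0));
    try std.testing.expectEqualSlices(usize, &[_]usize{2}, neighbors(&offsets, &targets, 1));
    try std.testing.expectEqualSlices(usize, &[_]usize{0}, neighbors(&offsets, &targets, 2));
}
```

The build is O(V + E) and keeps the input order of edges within each row. A reverse CSR for directed graphs follows from the same routine with `from` and `to` swapped.
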
## Phase 6: Algorithms (unify + verify)
17. Traversal (BFS/DFS/CC) over trait + CSR; tests across storages
18. Shortest paths (Dijkstra/BF/FW) with layout decision table; tests
19. Connectivity/SCC (Tarjan/Kosaraju); tests
20. Flow (finish Dinic); tests
21. Centrality (PageRank + Betweenness); tests
22. Spectral (laplacian, eigen); tests & perf

## Phase 7: CLI, Streaming I/O, Docs, Quality Gates
23. CLI subcommands: `convert`, `validate`, `build-index`, `stats`
24. Streaming I/O for both formats + zstd dictionaries
25. Documentation: conversion strategies, perf tuning, thread model; keep specs synced
26. Property tests, fuzzers, benches; CI on all platforms; formatting/static checks gates

---

## Acceptance Criteria (per phase, condensed)
- **Formats:** Round-trips preserve counts, attributes, weights; unknown sections/blocks ignored.
- **Conversion:** Peak memory measured below ceiling; equivalence of neighbor sets/weights post-convert.
- **Indices:** CSR neighbors == storage neighbors; persisted indices reload correctly.
- **Algorithms:** Identical results across storages (given same semantics); perf within expected bounds.
- **CLI:** Non-zero exit on invalid files; stats match library queries; build-index regenerates CSR.
- **Docs:** Examples compile; decision tables match implemented behavior.
