From 997a648fb5a85821d121d05ecb7a746b44bece28 Mon Sep 17 00:00:00 2001 From: Christian Boos Date: Thu, 12 Feb 2026 23:39:52 +0100 Subject: [PATCH 01/23] Add DAG-based message architecture spec Design doc for replacing timestamp-based ordering with parentUuid graph traversal. Covers session trees, junction points, agent transcript splicing, and phased implementation plan (A-D). Foundation for #79, #85, #90, #91. Co-Authored-By: Claude Opus 4.6 --- dev-docs/dag.md | 287 +++++++++++++++++++++++++++++ dev-docs/rendering-architecture.md | 1 + 2 files changed, 288 insertions(+) create mode 100644 dev-docs/dag.md diff --git a/dev-docs/dag.md b/dev-docs/dag.md new file mode 100644 index 00000000..e7583d8f --- /dev/null +++ b/dev-docs/dag.md @@ -0,0 +1,287 @@ +# DAG-Based Message Architecture + +Replaces timestamp-based ordering with `parentUuid` → `uuid` graph traversal. + +Reference: [Messages as Commits: Claude Code's Git-Like DAG of Conversations](https://piebald.ai/blog/messages-as-commits-claude-codes-git-like-dag-of-conversations) + +Related issues: #79, #85, #90, #91 + +--- + +## Motivation + +Currently, messages are sorted by timestamp and then patched with post-hoc +fixups (pair reordering, sidechain reordering by `agentId`). This is fragile: + +- **Sync agents**: Works "well enough" because timestamps align with causality +- **Async agents** (#90): Agent runs in background; launch and notification + are temporally distant; agent transcript interleaves arbitrarily +- **Teammates** (#91): Multiple agents send messages concurrently +- **Resume/fork** (#85): Conversation branches share a prefix; timestamp + ordering can't express the branching structure + +The transcript data already contains the structural information we need: +each message's `parentUuid` points to its predecessor, forming a DAG. + +--- + +## Core Concepts + +### The DAG + +Every message has a `uuid` and a `parentUuid` (null for first messages). +Together they form a directed acyclic graph. 
The graph is the authoritative +ordering; timestamps are metadata, not structure. + +### Sessions and DAG-lines + +A **session** is the set of messages sharing a `sessionId`. Each session +forms a single contiguous chain in the DAG — its **DAG-line**. A session's +DAG-line contains only the messages unique to that session (after +deduplication). + +**Assertion**: Within a session, the `parentUuid` chain is linear (no +branching). If data violates this, we log a warning and fall back to +timestamp ordering within that session. + +### Junction Points + +A **junction point** is a message whose `uuid` is referenced as +`parentUuid` by messages from **different sessions**. This is where +resume/fork happens. + +Junction points are **annotations on messages**, not splits of DAG-lines. +A session's DAG-line remains intact; the junction point simply records +"session N forks/continues from here." + +### Session Tree + +Sessions form a tree: + +- **Root sessions**: Their first message has `parentUuid: null` (or points + to a message not in any loaded session, e.g. after a `/clear`) +- **Child sessions**: Their first unique message's `parentUuid` points into + a parent session's DAG-line + +Children are ordered chronologically (by their first message's timestamp). + +Example: + +``` +Session 1: a → b → c → d → e → f → g + ↑ ↑ + | | +Session 3: k → l → m Session 2: h → i → j +(fork from e) (continues from g) +``` + +Session tree: +``` +- Session 1 + - Session 2 (continues from g) + - Session 3 (forks from e) +``` + +Rendered message sequence (depth-first, chronological children): +``` +s1, a, b, c, d, e, f, g, s2, h, i, j, s3, k, l, m +``` + +Where `s1`, `s2`, `s3` are synthesized session header messages. 
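
The traversal above can be sketched as a depth-first walk over the session tree, emitting each session's header, then its DAG-line, then its children in chronological order. This is a minimal illustration with the example's sessions hard-coded; `chains`, `children`, and `render` are illustrative names only, not the eventual `dag.py` API:

```python
# Toy model of the example session tree above (names are illustrative only).
chains = {
    "s1": ["a", "b", "c", "d", "e", "f", "g"],
    "s2": ["h", "i", "j"],  # continues from g
    "s3": ["k", "l", "m"],  # forks from e
}
children = {"s1": ["s2", "s3"], "s2": [], "s3": []}  # chronological order


def render(sid):
    yield sid  # synthesized session header message
    yield from chains[sid]  # the session's own DAG-line
    for child in children[sid]:  # child sessions, oldest first
        yield from render(child)


print(", ".join(render("s1")))
# s1, a, b, c, d, e, f, g, s2, h, i, j, s3, k, l, m
```

Note that this sketch emits a session's entire DAG-line before recursing; a refinement could instead visit each child session immediately after its junction point.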

### Navigation Links

- **Forward links** on junction points: "Session N forks/continues here"
  (shown on messages `e` and `g` in the example above)
- **Backlinks** on session headers: "Continues from message X in Session Y"
  (shown on `s2` and `s3`)

### Deduplication

When session 2 resumes session 1, Claude Code may replay prefix messages
(d', e', f', g') into session 2's file. These duplicates share the same
`uuid` but have a different `sessionId`.

Resolution: deduplicate by `uuid`, keeping the instance from the
**earliest session** (by first message timestamp). The "new" messages in
session 2 (those with previously-unseen `uuid`) form its DAG-line.

### Agent Transcripts

Agent transcripts also form DAG-lines. They come in two flavors:

1. **Continuing agents**: Their `parentUuid` chains into a previous agent's
   DAG-line (same session, different `agentId`). These naturally fit the
   DAG.

2. **Top-level agents**: `parentUuid` is null. These need explicit
   **parenting** — splicing them into the main session's DAG-line at the
   appropriate point.

   For `x → y → z` where `y` is a Task, and agent transcript `u → v` needs
   to be rooted at `y`, the result is: `x → y → u → v → z`.

**Parenting strategies** (by agent type):

| Agent type | Link mechanism | Parent at |
|------------|---------------|-----------|
| Sync Task | `agentId` on tool_result | Task tool_result message |
| Async Task (#90) | `agentId` on launch tool_result, `task-id` in `` | Launch tool_result |
| Teammate (#91) | `team_name` + agent name | TBD — likely TeamCreate or Task-with-team |

---

## Algorithm

### Phase 1: Load All Sessions

Load **all** `.jsonl` files for a project directory.
Build a unified message index:

```python
messages_by_uuid: dict[str, TranscriptEntry]  # uuid → entry (oldest wins)
children_by_uuid: dict[str, list[str]]  # parentUuid → [child uuids]
sessions: dict[str, list[str]]  # sessionId → [uuids in chain order]
```

When targeting a single session, still load all files but only render
that session's subtree. Optionally warn that context from other sessions
is available.

### Phase 2: Build DAG and Deduplicate

1. Parse all entries, index by `uuid`
2. For duplicate `uuid`s, keep the one from the earliest `sessionId`
3. Build `children_by_uuid` from `parentUuid` links
4. Group messages by `sessionId`

### Phase 3: Extract Session DAG-lines

For each session:
1. Identify the session's unique messages (those whose authoritative
   `sessionId` matches)
2. Order them by following `parentUuid` chains (not timestamps)
3. Verify linearity (no branching within a session)

### Phase 4: Build Session Tree

1. For each session, find where its DAG-line attaches to the DAG:
   - Walk back from the session's first unique message via `parentUuid`
   - The first message belonging to a **different** session is the
     attachment point
2. The session whose message is the attachment point is the parent session
3. Root sessions have no attachment point (first message is `parentUuid: null`
   or points outside loaded data)
4. Order children chronologically

### Phase 5: Identify Junction Points

A message is a junction point if `children_by_uuid[msg.uuid]` contains
messages from multiple sessions, or from a session different from the
message's own.

Annotate junction points with their target sessions for forward-link
rendering.

### Phase 6: Splice Agent Transcripts

For each agent transcript (identified by `agentId`):
1. Determine parenting strategy (see table above)
2. Find the anchor message in the main session's DAG-line
3.
Splice the agent's DAG-line after the anchor + +This replaces the current `_reorder_sidechain_template_messages` approach +with a principled graph operation. + +### Phase 7: Process and Render + +Within each DAG-line, apply existing processing: +- Pairing (tool_use+tool_result, thinking+assistant, etc.) +- Hierarchy building +- Tree construction + +Pairing should be **scoped to DAG-lines** — no pairing across session +boundaries. This is both correct and faster. + +--- + +## Assertions / Invariants + +These should be checked at runtime (log warnings, don't crash): + +1. **Session linearity**: Each session's messages form a single chain + (no branching within a `sessionId`) +2. **DAG acyclicity**: No cycles in `parentUuid` chains +3. **Unique ownership**: After deduplication, each `uuid` belongs to + exactly one session +4. **Agent parenting**: Every top-level agent transcript has an identifiable + anchor in the main session + +--- + +## Impact on Existing Code + +### What changes + +| Component | Current | After | +|-----------|---------|-------| +| `converter.py` | Load single file + agent files; timestamp sort | Load all project files; build DAG | +| `renderer.py` message ordering | Timestamp sort + pair reorder + sidechain reorder | DAG-line traversal; pairing within DAG-lines | +| Session index | Flat list sorted by timestamp | Session tree with parent/child relationships | +| Agent handling | `agentId`-based insertion after timestamp sort | Agent DAG-line splicing at anchor points | + +### What stays + +- Factory layer (transcript entry → MessageContent) +- TemplateMessage wrapper and RenderingContext +- Hierarchy building within sessions (user → assistant → tools) +- Renderer dispatch and format_* methods +- HTML templates and JavaScript (fold, timeline, filters) +- Deduplication heuristics (sidechain cleanup, etc.) — may simplify over time + +--- + +## Implementation Plan + +### Phase A: DAG Infrastructure (new module: `dag.py`) + +1. 
**Message indexing**: Load all session files, build `uuid` index, + deduplicate +2. **DAG construction**: Build parent→children graph +3. **Session extraction**: Group by `sessionId`, extract DAG-lines, + verify linearity +4. **Session tree**: Build parent/child session relationships, identify + junction points + +This phase is purely additive — new code alongside existing. Tests can +validate DAG construction against known transcripts. + +### Phase B: Integration with Rendering Pipeline + +1. Replace `load_transcript` / `load_directory_transcripts` with + DAG-based loading in `converter.py` +2. Pass DAG-lines (per session) into `generate_template_messages` +3. Scope pairing to DAG-lines +4. Generate session headers with navigation links (forward/back) +5. Update session index from flat to hierarchical + +### Phase C: Agent Transcript Rework + +1. Implement parenting strategies for each agent type +2. Replace `_reorder_sidechain_template_messages` with DAG-line splicing +3. Simplify or remove `_cleanup_sidechain_duplicates` (dedup now + happens at DAG level) + +### Phase D: Async Agent and Teammate Support + +1. Parse `` to extract `task-id` for async agent linking +2. Implement teammate parenting strategy (#91) +3. This is where #90 and #91 get properly resolved + +--- + +## Related Documentation + +- [rendering-architecture.md](rendering-architecture.md) — Current pipeline +- [messages.md](messages.md) — Message type reference +- [rendering-next.md](rendering-next.md) — Future rendering improvements diff --git a/dev-docs/rendering-architecture.md b/dev-docs/rendering-architecture.md index 3b8e4d54..87b8c66d 100644 --- a/dev-docs/rendering-architecture.md +++ b/dev-docs/rendering-architecture.md @@ -344,3 +344,4 @@ Note that `meta.uuid` is the original transcript entry's UUID. 
Since a single en - [messages.md](messages.md) - Complete message type reference - [css-classes.md](css-classes.md) - CSS class combinations and rules - [FOLD_STATE_DIAGRAM.md](FOLD_STATE_DIAGRAM.md) - Fold/unfold state machine +- [dag.md](dag.md) - DAG-based message architecture (planned replacement for timestamp ordering) From 232a73d701d75cb010b961fbb13a4b187cad9034 Mon Sep 17 00:00:00 2001 From: Christian Boos Date: Fri, 13 Feb 2026 08:54:22 +0100 Subject: [PATCH 02/23] Add DAG infrastructure module (Phase A) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit New dag.py module that builds a parentUuid→uuid graph from transcript entries, replacing timestamp-based ordering with structural traversal. Purely additive — no existing code modified. Co-Authored-By: Claude Opus 4.6 --- claude_code_log/dag.py | 394 ++++++++++++++++++++ test/test_dag.py | 635 ++++++++++++++++++++++++++++++++ test/test_data/dag_fork.jsonl | 11 + test/test_data/dag_resume.jsonl | 8 + test/test_data/dag_simple.jsonl | 5 + 5 files changed, 1053 insertions(+) create mode 100644 claude_code_log/dag.py create mode 100644 test/test_dag.py create mode 100644 test/test_data/dag_fork.jsonl create mode 100644 test/test_data/dag_resume.jsonl create mode 100644 test/test_data/dag_simple.jsonl diff --git a/claude_code_log/dag.py b/claude_code_log/dag.py new file mode 100644 index 00000000..2ef26a75 --- /dev/null +++ b/claude_code_log/dag.py @@ -0,0 +1,394 @@ +"""DAG-based message ordering for Claude Code transcripts. + +Replaces timestamp-based ordering with parentUuid → uuid graph traversal. +Works at the TranscriptEntry level (before factory/rendering). + +See dev-docs/dag.md for the full architecture spec. 
+""" + +import logging +from dataclasses import dataclass, field +from typing import Optional + +from .models import ( + TranscriptEntry, + SummaryTranscriptEntry, + QueueOperationTranscriptEntry, +) + +logger = logging.getLogger(__name__) + + +# ============================================================================= +# Data Structures +# ============================================================================= + + +@dataclass +class MessageNode: + """A deduplicated message in the DAG.""" + + uuid: str + parent_uuid: Optional[str] + session_id: str + timestamp: str + entry: TranscriptEntry + children_uuids: list[str] = field(default_factory=list) + + +@dataclass +class SessionDAGLine: + """A session's ordered chain of unique messages.""" + + session_id: str + uuids: list[str] # Ordered by parent→child chain traversal + first_timestamp: str + parent_session_id: Optional[str] = None + attachment_uuid: Optional[str] = None # UUID in parent where this attaches + + +@dataclass +class JunctionPoint: + """A message where other sessions fork or continue.""" + + uuid: str + session_id: str # The session this message belongs to + target_sessions: list[str] = field(default_factory=list) + + +@dataclass +class SessionTree: + """The complete session hierarchy for a project.""" + + nodes: dict[str, MessageNode] + sessions: dict[str, SessionDAGLine] + roots: list[str] # Root session IDs (no parent session) + junction_points: dict[str, JunctionPoint] + + +# ============================================================================= +# Step 1: Load and Index +# ============================================================================= + + +def build_message_index( + entries: list[TranscriptEntry], +) -> dict[str, MessageNode]: + """Build a deduplicated message index from transcript entries. + + Skips SummaryTranscriptEntry (no uuid/sessionId) and + QueueOperationTranscriptEntry (no uuid). 
For duplicate uuids, + keeps the entry from the earliest session (by first entry timestamp). + """ + # First pass: determine earliest timestamp per session + session_first_ts: dict[str, str] = {} + for entry in entries: + if isinstance(entry, (SummaryTranscriptEntry, QueueOperationTranscriptEntry)): + continue + sid = entry.sessionId + ts = entry.timestamp + if sid not in session_first_ts or ts < session_first_ts[sid]: + session_first_ts[sid] = ts + + # Second pass: build nodes, deduplicating by uuid (earliest session wins) + nodes: dict[str, MessageNode] = {} + for entry in entries: + if isinstance(entry, (SummaryTranscriptEntry, QueueOperationTranscriptEntry)): + continue + uuid = entry.uuid + sid = entry.sessionId + if uuid in nodes: + existing = nodes[uuid] + existing_session_ts = session_first_ts.get(existing.session_id, "") + new_session_ts = session_first_ts.get(sid, "") + if new_session_ts < existing_session_ts: + # Replace with entry from earlier session + nodes[uuid] = MessageNode( + uuid=uuid, + parent_uuid=entry.parentUuid, + session_id=sid, + timestamp=entry.timestamp, + entry=entry, + ) + else: + nodes[uuid] = MessageNode( + uuid=uuid, + parent_uuid=entry.parentUuid, + session_id=sid, + timestamp=entry.timestamp, + entry=entry, + ) + + return nodes + + +# ============================================================================= +# Step 2: Build DAG (parent→children links) +# ============================================================================= + + +def build_dag(nodes: dict[str, MessageNode]) -> None: + """Populate children_uuids on each node. Mutates nodes in place. + + Warns about orphan nodes (parentUuid points outside loaded data) + and validates acyclicity. 
+ """ + # Clear existing children + for node in nodes.values(): + node.children_uuids = [] + + # Build parent→children links + for node in nodes.values(): + if node.parent_uuid is not None: + parent = nodes.get(node.parent_uuid) + if parent is not None: + parent.children_uuids.append(node.uuid) + else: + logger.warning( + "Orphan node %s: parentUuid %s not found in loaded data", + node.uuid, + node.parent_uuid, + ) + + # Validate: no cycles (walk parent chain for each node) + for node in nodes.values(): + visited: set[str] = set() + current: Optional[str] = node.uuid + while current is not None: + if current in visited: + logger.warning("Cycle detected in parent chain at uuid %s", current) + break + visited.add(current) + parent = nodes.get(current) + if parent is None: + break + current = parent.parent_uuid + + +# ============================================================================= +# Step 3: Extract Session DAG-lines +# ============================================================================= + + +def extract_session_dag_lines( + nodes: dict[str, MessageNode], +) -> dict[str, SessionDAGLine]: + """Extract per-session ordered chains from the DAG. + + For each session, finds the root node (parent_uuid is null or points + to a different session), then walks forward via children_uuids filtering + to same-session children. + + Verifies linearity: each node has at most one child in the same session. + Falls back to timestamp sort if violated. 
+ """ + # Group nodes by session + session_nodes: dict[str, list[MessageNode]] = {} + for node in nodes.values(): + session_nodes.setdefault(node.session_id, []).append(node) + + sessions: dict[str, SessionDAGLine] = {} + for session_id, snodes in session_nodes.items(): + session_uuids = {n.uuid for n in snodes} + + # Find root(s): nodes whose parent_uuid is null or outside this session + roots = [ + n + for n in snodes + if n.parent_uuid is None or n.parent_uuid not in session_uuids + ] + + if not roots: + logger.warning( + "Session %s: no root found, falling back to timestamp sort", + session_id, + ) + sorted_nodes = sorted(snodes, key=lambda n: n.timestamp) + sessions[session_id] = SessionDAGLine( + session_id=session_id, + uuids=[n.uuid for n in sorted_nodes], + first_timestamp=sorted_nodes[0].timestamp, + ) + continue + + if len(roots) > 1: + # Multiple roots - pick the earliest by timestamp + roots.sort(key=lambda n: n.timestamp) + logger.warning( + "Session %s: %d roots found, using earliest (%s)", + session_id, + len(roots), + roots[0].uuid, + ) + + # Walk forward from root, following same-session children + chain: list[str] = [] + current: Optional[MessageNode] = roots[0] + linear = True + + while current is not None: + chain.append(current.uuid) + # Find children in the same session + same_session_children = [ + c for c in current.children_uuids if c in session_uuids + ] + if len(same_session_children) == 0: + current = None + elif len(same_session_children) == 1: + current = nodes[same_session_children[0]] + else: + logger.warning( + "Session %s: node %s has %d same-session children, " + "linearity violated", + session_id, + current.uuid, + len(same_session_children), + ) + linear = False + current = None + + if not linear: + # Fall back to timestamp sort + sorted_nodes = sorted(snodes, key=lambda n: n.timestamp) + chain = [n.uuid for n in sorted_nodes] + + first_ts = nodes[chain[0]].timestamp + sessions[session_id] = SessionDAGLine( + 
session_id=session_id, + uuids=chain, + first_timestamp=first_ts, + ) + + return sessions + + +# ============================================================================= +# Step 4: Build Session Tree +# ============================================================================= + + +def build_session_tree( + nodes: dict[str, MessageNode], + sessions: dict[str, SessionDAGLine], +) -> SessionTree: + """Build the session hierarchy and identify junction points. + + For each session's DAG-line, the first message's parent_uuid determines + the parent session: + - null → root session + - points to node in different session → child of that session + """ + roots: list[str] = [] + junction_points: dict[str, JunctionPoint] = {} + + for session_id, dag_line in sessions.items(): + if not dag_line.uuids: + roots.append(session_id) + continue + + first_uuid = dag_line.uuids[0] + first_node = nodes[first_uuid] + parent_uuid = first_node.parent_uuid + + if parent_uuid is None or parent_uuid not in nodes: + # Root session (or orphan parent) + roots.append(session_id) + dag_line.parent_session_id = None + dag_line.attachment_uuid = None + else: + parent_node = nodes[parent_uuid] + if parent_node.session_id == session_id: + # Parent is in same session - this is a root + roots.append(session_id) + dag_line.parent_session_id = None + dag_line.attachment_uuid = None + else: + # Child session: attaches to parent session at parent_uuid + dag_line.parent_session_id = parent_node.session_id + dag_line.attachment_uuid = parent_uuid + + # Record junction point + if parent_uuid not in junction_points: + junction_points[parent_uuid] = JunctionPoint( + uuid=parent_uuid, + session_id=parent_node.session_id, + ) + junction_points[parent_uuid].target_sessions.append(session_id) + + # Order roots chronologically + roots.sort(key=lambda sid: sessions[sid].first_timestamp) + + # Order junction point target_sessions chronologically + for jp in junction_points.values(): + 
jp.target_sessions.sort(key=lambda sid: sessions[sid].first_timestamp) + + return SessionTree( + nodes=nodes, + sessions=sessions, + roots=roots, + junction_points=junction_points, + ) + + +# ============================================================================= +# Step 5: Ordered Traversal +# ============================================================================= + + +def traverse_session_tree(tree: SessionTree) -> list[TranscriptEntry]: + """Depth-first traversal of session tree producing rendering order. + + For each session: yields its DAG-line's entries in chain order. + Children are visited in chronological order (by first_timestamp). + """ + result: list[TranscriptEntry] = [] + visited_sessions: set[str] = set() + + def _visit_session(session_id: str) -> None: + if session_id in visited_sessions: + return + visited_sessions.add(session_id) + + dag_line = tree.sessions.get(session_id) + if dag_line is None: + return + + # Build map: attachment_uuid → [child session IDs] for this session + children_at: dict[str, list[str]] = {} + for sid, sline in tree.sessions.items(): + if sline.parent_session_id == session_id and sline.attachment_uuid: + children_at.setdefault(sline.attachment_uuid, []).append(sid) + + # Emit entries, visiting child sessions at junction points + for uuid in dag_line.uuids: + node = tree.nodes[uuid] + result.append(node.entry) + # After emitting this message, visit any child sessions + # that attach here (in chronological order) + if uuid in children_at: + for child_sid in children_at[uuid]: + _visit_session(child_sid) + + # Visit root sessions in chronological order + for root_sid in tree.roots: + _visit_session(root_sid) + + return result + + +# ============================================================================= +# Convenience: Full Pipeline +# ============================================================================= + + +def build_dag_from_entries( + entries: list[TranscriptEntry], +) -> SessionTree: + """Build 
a complete SessionTree from raw transcript entries. + + Convenience function that runs Steps 1-4 in sequence. + """ + nodes = build_message_index(entries) + build_dag(nodes) + sessions = extract_session_dag_lines(nodes) + return build_session_tree(nodes, sessions) diff --git a/test/test_dag.py b/test/test_dag.py new file mode 100644 index 00000000..c0190b67 --- /dev/null +++ b/test/test_dag.py @@ -0,0 +1,635 @@ +"""Tests for the DAG-based message ordering module.""" + +import json +from pathlib import Path + +import pytest + +from claude_code_log.dag import ( + MessageNode, + SessionDAGLine, + SessionTree, + build_dag, + build_dag_from_entries, + build_message_index, + build_session_tree, + extract_session_dag_lines, + traverse_session_tree, +) +from claude_code_log.factories import create_transcript_entry +from claude_code_log.models import ( + SummaryTranscriptEntry, + TranscriptEntry, +) + +TEST_DATA = Path(__file__).parent / "test_data" +REAL_PROJECTS = TEST_DATA / "real_projects" + + +def load_entries_from_jsonl(path: Path) -> list[TranscriptEntry]: + """Load transcript entries from a JSONL file, skipping unparseable lines.""" + entries: list[TranscriptEntry] = [] + with open(path) as f: + for line in f: + line = line.strip() + if not line: + continue + data = json.loads(line) + entry_type = data.get("type") + if entry_type in ( + "user", + "assistant", + "summary", + "system", + "queue-operation", + ): + entries.append(create_transcript_entry(data)) + return entries + + +def load_project_entries(project_dir: Path) -> list[TranscriptEntry]: + """Load all entries from a project directory (excluding agent files).""" + entries: list[TranscriptEntry] = [] + for jsonl_file in sorted(project_dir.glob("*.jsonl")): + if jsonl_file.name.startswith("agent-"): + continue + entries.extend(load_entries_from_jsonl(jsonl_file)) + return entries + + +# ============================================================================= +# Test: Single session (dag_simple.jsonl) +# 
============================================================================= + + +class TestSingleSession: + """Tests using dag_simple.jsonl: a→b→c→d→e in session s1.""" + + @pytest.fixture() + def tree(self) -> SessionTree: + entries = load_entries_from_jsonl(TEST_DATA / "dag_simple.jsonl") + return build_dag_from_entries(entries) + + def test_single_session_one_dagline(self, tree: SessionTree) -> None: + assert len(tree.sessions) == 1 + assert "s1" in tree.sessions + + def test_single_session_chain_order(self, tree: SessionTree) -> None: + dag_line = tree.sessions["s1"] + assert dag_line.uuids == ["a", "b", "c", "d", "e"] + + def test_single_session_is_root(self, tree: SessionTree) -> None: + assert tree.roots == ["s1"] + + def test_single_session_no_junction_points(self, tree: SessionTree) -> None: + assert tree.junction_points == {} + + def test_single_session_traversal(self, tree: SessionTree) -> None: + result = traverse_session_tree(tree) + uuids = [e.uuid for e in result] # type: ignore[union-attr] + assert uuids == ["a", "b", "c", "d", "e"] + + def test_single_session_first_timestamp(self, tree: SessionTree) -> None: + dag_line = tree.sessions["s1"] + assert dag_line.first_timestamp == "2025-07-01T10:00:00.000Z" + + def test_single_session_no_parent(self, tree: SessionTree) -> None: + dag_line = tree.sessions["s1"] + assert dag_line.parent_session_id is None + assert dag_line.attachment_uuid is None + + +# ============================================================================= +# Test: Resume session (dag_resume.jsonl) +# ============================================================================= + + +class TestResumeSession: + """Tests using dag_resume.jsonl: s1(a→b→c→d→e), s2(f→g→h) where f.parent=e.""" + + @pytest.fixture() + def tree(self) -> SessionTree: + entries = load_entries_from_jsonl(TEST_DATA / "dag_resume.jsonl") + return build_dag_from_entries(entries) + + def test_two_sessions(self, tree: SessionTree) -> None: + assert 
len(tree.sessions) == 2 + assert "s1" in tree.sessions + assert "s2" in tree.sessions + + def test_s1_chain(self, tree: SessionTree) -> None: + assert tree.sessions["s1"].uuids == ["a", "b", "c", "d", "e"] + + def test_s2_chain(self, tree: SessionTree) -> None: + assert tree.sessions["s2"].uuids == ["f", "g", "h"] + + def test_s1_is_root(self, tree: SessionTree) -> None: + assert tree.roots == ["s1"] + + def test_s2_parent_is_s1(self, tree: SessionTree) -> None: + dag_line = tree.sessions["s2"] + assert dag_line.parent_session_id == "s1" + assert dag_line.attachment_uuid == "e" + + def test_junction_at_e(self, tree: SessionTree) -> None: + assert "e" in tree.junction_points + jp = tree.junction_points["e"] + assert jp.session_id == "s1" + assert jp.target_sessions == ["s2"] + + def test_traversal_order(self, tree: SessionTree) -> None: + result = traverse_session_tree(tree) + uuids = [e.uuid for e in result] # type: ignore[union-attr] + assert uuids == ["a", "b", "c", "d", "e", "f", "g", "h"] + + +# ============================================================================= +# Test: Fork session (dag_fork.jsonl) +# ============================================================================= + + +class TestForkSession: + """Tests using dag_fork.jsonl: s1(a→e), s2(f→h from e), s3(i→k from c).""" + + @pytest.fixture() + def tree(self) -> SessionTree: + entries = load_entries_from_jsonl(TEST_DATA / "dag_fork.jsonl") + return build_dag_from_entries(entries) + + def test_three_sessions(self, tree: SessionTree) -> None: + assert len(tree.sessions) == 3 + + def test_s1_chain(self, tree: SessionTree) -> None: + assert tree.sessions["s1"].uuids == ["a", "b", "c", "d", "e"] + + def test_s2_chain(self, tree: SessionTree) -> None: + assert tree.sessions["s2"].uuids == ["f", "g", "h"] + + def test_s3_chain(self, tree: SessionTree) -> None: + assert tree.sessions["s3"].uuids == ["i", "j", "k"] + + def test_only_s1_is_root(self, tree: SessionTree) -> None: + assert tree.roots 
== ["s1"] + + def test_s2_attaches_at_e(self, tree: SessionTree) -> None: + dag_line = tree.sessions["s2"] + assert dag_line.parent_session_id == "s1" + assert dag_line.attachment_uuid == "e" + + def test_s3_attaches_at_c(self, tree: SessionTree) -> None: + dag_line = tree.sessions["s3"] + assert dag_line.parent_session_id == "s1" + assert dag_line.attachment_uuid == "c" + + def test_two_junction_points(self, tree: SessionTree) -> None: + assert len(tree.junction_points) == 2 + assert "c" in tree.junction_points + assert "e" in tree.junction_points + + def test_junction_c_targets_s3(self, tree: SessionTree) -> None: + jp = tree.junction_points["c"] + assert jp.session_id == "s1" + assert jp.target_sessions == ["s3"] + + def test_junction_e_targets_s2(self, tree: SessionTree) -> None: + jp = tree.junction_points["e"] + assert jp.session_id == "s1" + assert jp.target_sessions == ["s2"] + + def test_traversal_depth_first(self, tree: SessionTree) -> None: + """Depth-first: s1 entries, then at junction c visit s3 (fork), + continue s1, then at junction e visit s2 (continue).""" + result = traverse_session_tree(tree) + uuids = [e.uuid for e in result] # type: ignore[union-attr] + # s1: a,b,c → s3: i,j,k → s1: d,e → s2: f,g,h + assert uuids == ["a", "b", "c", "i", "j", "k", "d", "e", "f", "g", "h"] + + +# ============================================================================= +# Test: Deduplication +# ============================================================================= + + +class TestDeduplication: + """Test that duplicate uuids are resolved by keeping earliest session.""" + + def test_dedup_keeps_earliest_session(self) -> None: + """Same uuid in two sessions; entry from earlier session wins.""" + entries = load_entries_from_jsonl(TEST_DATA / "dag_simple.jsonl") + # Manually add a duplicate of uuid "c" with a later session + dup_data = { + "type": "user", + "timestamp": "2025-07-02T10:00:00.000Z", + "parentUuid": "b", + "isSidechain": False, + "userType": 
"human", + "cwd": "/tmp", + "sessionId": "s_later", + "version": "1.0.0", + "uuid": "c", + "message": {"role": "user", "content": [{"type": "text", "text": "Dup"}]}, + } + entries.append(create_transcript_entry(dup_data)) + + nodes = build_message_index(entries) + # "c" should belong to s1 (timestamp 2025-07-01) not s_later (2025-07-02) + assert nodes["c"].session_id == "s1" + + def test_dedup_replaces_with_earlier(self) -> None: + """If later-loaded entry is from an earlier session, it replaces.""" + entries = load_entries_from_jsonl(TEST_DATA / "dag_simple.jsonl") + # Add a duplicate of uuid "c" from an EARLIER session + dup_data = { + "type": "user", + "timestamp": "2025-06-30T10:00:00.000Z", + "parentUuid": "b", + "isSidechain": False, + "userType": "human", + "cwd": "/tmp", + "sessionId": "s_earlier", + "version": "1.0.0", + "uuid": "c", + "message": { + "role": "user", + "content": [{"type": "text", "text": "Earlier"}], + }, + } + entries.append(create_transcript_entry(dup_data)) + + nodes = build_message_index(entries) + # "c" should now belong to s_earlier + assert nodes["c"].session_id == "s_earlier" + + +# ============================================================================= +# Test: Junction Points +# ============================================================================= + + +class TestJunctionPoints: + """Detailed junction point tests.""" + + def test_no_junctions_single_session(self) -> None: + entries = load_entries_from_jsonl(TEST_DATA / "dag_simple.jsonl") + tree = build_dag_from_entries(entries) + assert len(tree.junction_points) == 0 + + def test_single_junction_resume(self) -> None: + entries = load_entries_from_jsonl(TEST_DATA / "dag_resume.jsonl") + tree = build_dag_from_entries(entries) + assert len(tree.junction_points) == 1 + assert "e" in tree.junction_points + + def test_multiple_junctions_fork(self) -> None: + entries = load_entries_from_jsonl(TEST_DATA / "dag_fork.jsonl") + tree = build_dag_from_entries(entries) + assert 
len(tree.junction_points) == 2 + # Both c and e are junctions + assert set(tree.junction_points.keys()) == {"c", "e"} + + def test_junction_target_sessions_ordered_chronologically(self) -> None: + """If multiple sessions fork from the same point, targets are ordered.""" + # Create data where two sessions both fork from the same message + base = load_entries_from_jsonl(TEST_DATA / "dag_simple.jsonl") + fork1_data = { + "type": "user", + "timestamp": "2025-07-02T10:00:00.000Z", + "parentUuid": "c", + "isSidechain": False, + "userType": "human", + "cwd": "/tmp", + "sessionId": "s_fork1", + "version": "1.0.0", + "uuid": "f1", + "message": { + "role": "user", + "content": [{"type": "text", "text": "Fork 1"}], + }, + } + fork2_data = { + "type": "user", + "timestamp": "2025-07-01T12:00:00.000Z", + "parentUuid": "c", + "isSidechain": False, + "userType": "human", + "cwd": "/tmp", + "sessionId": "s_fork2", + "version": "1.0.0", + "uuid": "f2", + "message": { + "role": "user", + "content": [{"type": "text", "text": "Fork 2"}], + }, + } + base.append(create_transcript_entry(fork1_data)) + base.append(create_transcript_entry(fork2_data)) + + tree = build_dag_from_entries(base) + jp = tree.junction_points["c"] + # s_fork2 is earlier (2025-07-01T12:00) than s_fork1 (2025-07-02T10:00) + assert jp.target_sessions == ["s_fork2", "s_fork1"] + + +# ============================================================================= +# Test: Traversal Order +# ============================================================================= + + +class TestTraversalOrder: + """Test depth-first session tree traversal produces correct order.""" + + def test_simple_traversal(self) -> None: + entries = load_entries_from_jsonl(TEST_DATA / "dag_simple.jsonl") + tree = build_dag_from_entries(entries) + result = traverse_session_tree(tree) + assert len(result) == 5 + uuids = [e.uuid for e in result] # type: ignore[union-attr] + assert uuids == ["a", "b", "c", "d", "e"] + + def 
test_resume_traversal(self) -> None: + entries = load_entries_from_jsonl(TEST_DATA / "dag_resume.jsonl") + tree = build_dag_from_entries(entries) + result = traverse_session_tree(tree) + uuids = [e.uuid for e in result] # type: ignore[union-attr] + assert uuids == ["a", "b", "c", "d", "e", "f", "g", "h"] + + def test_fork_traversal(self) -> None: + entries = load_entries_from_jsonl(TEST_DATA / "dag_fork.jsonl") + tree = build_dag_from_entries(entries) + result = traverse_session_tree(tree) + uuids = [e.uuid for e in result] # type: ignore[union-attr] + # Depth-first: at junction c (after emitting c), visit s3 first + # then continue s1, at junction e visit s2 + assert uuids == ["a", "b", "c", "i", "j", "k", "d", "e", "f", "g", "h"] + + def test_traversal_returns_entries(self) -> None: + """Verify traversal returns actual TranscriptEntry objects.""" + entries = load_entries_from_jsonl(TEST_DATA / "dag_simple.jsonl") + tree = build_dag_from_entries(entries) + result = traverse_session_tree(tree) + for entry in result: + assert hasattr(entry, "type") + assert hasattr(entry, "uuid") + + +# ============================================================================= +# Test: Real project data +# ============================================================================= + + +EXPERIMENTS_DIR = REAL_PROJECTS / "-src-experiments-claude_p" + + +@pytest.mark.skipif( + not EXPERIMENTS_DIR.exists(), + reason="Real project test data not available", +) +class TestRealProjectExperiments: + """Test DAG construction against real project data. + + The -src-experiments-claude_p project has 4 independent sessions + with no cross-session references, so should produce 4 root sessions. 
+ """ + + @pytest.fixture() + def tree(self) -> SessionTree: + entries = load_project_entries(EXPERIMENTS_DIR) + return build_dag_from_entries(entries) + + def test_loads_multiple_sessions(self, tree: SessionTree) -> None: + assert len(tree.sessions) >= 4 + + def test_all_sessions_are_roots(self, tree: SessionTree) -> None: + """Independent sessions should all be roots.""" + # All sessions with DAG-lines should be root (no cross-refs) + for session_id in tree.sessions: + dag_line = tree.sessions[session_id] + assert dag_line.parent_session_id is None, ( + f"Session {session_id} unexpectedly has parent " + f"{dag_line.parent_session_id}" + ) + + def test_no_junction_points(self, tree: SessionTree) -> None: + """Independent sessions should have no junction points.""" + assert len(tree.junction_points) == 0 + + def test_each_session_has_entries(self, tree: SessionTree) -> None: + """Each session's DAG-line should have at least one message.""" + for session_id, dag_line in tree.sessions.items(): + assert len(dag_line.uuids) > 0, f"Session {session_id} has empty DAG-line" + + def test_traversal_covers_all_entries(self, tree: SessionTree) -> None: + """Traversal should include all entries from all sessions.""" + total_in_daglines = sum(len(dl.uuids) for dl in tree.sessions.values()) + result = traverse_session_tree(tree) + assert len(result) == total_in_daglines + + def test_sessions_ordered_chronologically(self, tree: SessionTree) -> None: + """Root sessions should be ordered by first_timestamp.""" + timestamps = [tree.sessions[sid].first_timestamp for sid in tree.roots] + assert timestamps == sorted(timestamps) + + +# ============================================================================= +# Test: Edge cases +# ============================================================================= + + +class TestOrphanParent: + """Test handling of parentUuid pointing to unknown uuid.""" + + def test_orphan_treated_as_root(self) -> None: + """A session whose first message 
has parentUuid pointing to + an unknown uuid should be treated as a root session.""" + orphan_data = { + "type": "user", + "timestamp": "2025-07-01T10:00:00.000Z", + "parentUuid": "nonexistent_uuid", + "isSidechain": False, + "userType": "human", + "cwd": "/tmp", + "sessionId": "s_orphan", + "version": "1.0.0", + "uuid": "orphan_a", + "message": { + "role": "user", + "content": [{"type": "text", "text": "Orphan"}], + }, + } + entries: list[TranscriptEntry] = [create_transcript_entry(orphan_data)] + tree = build_dag_from_entries(entries) + + assert "s_orphan" in tree.sessions + assert tree.roots == ["s_orphan"] + dag_line = tree.sessions["s_orphan"] + assert dag_line.parent_session_id is None + + def test_orphan_with_children(self) -> None: + """Orphan node still chains correctly within its session.""" + data = [ + { + "type": "user", + "timestamp": "2025-07-01T10:00:00.000Z", + "parentUuid": "nonexistent", + "isSidechain": False, + "userType": "human", + "cwd": "/tmp", + "sessionId": "s1", + "version": "1.0.0", + "uuid": "x", + "message": { + "role": "user", + "content": [{"type": "text", "text": "Start"}], + }, + }, + { + "type": "assistant", + "timestamp": "2025-07-01T10:01:00.000Z", + "parentUuid": "x", + "isSidechain": False, + "userType": "human", + "cwd": "/tmp", + "sessionId": "s1", + "version": "1.0.0", + "uuid": "y", + "requestId": "req_1", + "message": { + "id": "y", + "type": "message", + "role": "assistant", + "model": "claude-3-sonnet", + "content": [{"type": "text", "text": "Reply"}], + "stop_reason": "end_turn", + "usage": {"input_tokens": 10, "output_tokens": 5}, + }, + }, + ] + entries = [create_transcript_entry(d) for d in data] + tree = build_dag_from_entries(entries) + + assert tree.sessions["s1"].uuids == ["x", "y"] + assert tree.roots == ["s1"] + + +class TestSummaryEntriesSkipped: + """Test that SummaryTranscriptEntry entries are excluded from DAG.""" + + def test_summary_not_in_nodes(self) -> None: + entries = 
load_entries_from_jsonl(TEST_DATA / "dag_simple.jsonl") + # Add a summary entry + summary_data = { + "type": "summary", + "summary": "A test summary", + "leafUuid": "e", + "timestamp": "2025-07-01T10:05:00.000Z", + } + entries.append(create_transcript_entry(summary_data)) + + nodes = build_message_index(entries) + # Summary has no uuid, so it can't be in nodes + for node in nodes.values(): + assert not isinstance(node.entry, SummaryTranscriptEntry) + + def test_summary_not_in_traversal(self) -> None: + entries = load_entries_from_jsonl(TEST_DATA / "dag_simple.jsonl") + summary_data = { + "type": "summary", + "summary": "A test summary", + "leafUuid": "e", + "timestamp": "2025-07-01T10:05:00.000Z", + } + entries.append(create_transcript_entry(summary_data)) + + tree = build_dag_from_entries(entries) + result = traverse_session_tree(tree) + # Should still be 5 entries (a-e), summary excluded + assert len(result) == 5 + for entry in result: + assert not isinstance(entry, SummaryTranscriptEntry) + + +class TestQueueOperationSkipped: + """Test that QueueOperationTranscriptEntry entries are excluded from DAG.""" + + def test_queue_op_not_in_nodes(self) -> None: + entries = load_entries_from_jsonl(TEST_DATA / "dag_simple.jsonl") + queue_data = { + "type": "queue-operation", + "operation": "dequeue", + "timestamp": "2025-07-01T09:59:00.000Z", + "sessionId": "s1", + } + entries.append(create_transcript_entry(queue_data)) + + nodes = build_message_index(entries) + # queue-operation has no uuid field + assert len(nodes) == 5 # Only a,b,c,d,e + + +# ============================================================================= +# Test: Individual algorithm steps +# ============================================================================= + + +class TestBuildMessageIndex: + """Test build_message_index in isolation.""" + + def test_indexes_all_entries(self) -> None: + entries = load_entries_from_jsonl(TEST_DATA / "dag_simple.jsonl") + nodes = build_message_index(entries) + 
assert set(nodes.keys()) == {"a", "b", "c", "d", "e"} + + def test_preserves_entry_data(self) -> None: + entries = load_entries_from_jsonl(TEST_DATA / "dag_simple.jsonl") + nodes = build_message_index(entries) + assert nodes["a"].session_id == "s1" + assert nodes["a"].parent_uuid is None + assert nodes["b"].parent_uuid == "a" + + def test_empty_entries(self) -> None: + nodes = build_message_index([]) + assert nodes == {} + + +class TestBuildDAG: + """Test build_dag (parent→children links).""" + + def test_children_populated(self) -> None: + entries = load_entries_from_jsonl(TEST_DATA / "dag_simple.jsonl") + nodes = build_message_index(entries) + build_dag(nodes) + assert nodes["a"].children_uuids == ["b"] + assert nodes["b"].children_uuids == ["c"] + assert nodes["e"].children_uuids == [] + + def test_root_has_no_parent(self) -> None: + entries = load_entries_from_jsonl(TEST_DATA / "dag_simple.jsonl") + nodes = build_message_index(entries) + build_dag(nodes) + assert nodes["a"].parent_uuid is None + + def test_cross_session_children(self) -> None: + entries = load_entries_from_jsonl(TEST_DATA / "dag_resume.jsonl") + nodes = build_message_index(entries) + build_dag(nodes) + # "e" has child "f" which is in a different session + assert "f" in nodes["e"].children_uuids + + +class TestExtractSessionDAGLines: + """Test extract_session_dag_lines.""" + + def test_single_session_chain(self) -> None: + entries = load_entries_from_jsonl(TEST_DATA / "dag_simple.jsonl") + nodes = build_message_index(entries) + build_dag(nodes) + sessions = extract_session_dag_lines(nodes) + assert sessions["s1"].uuids == ["a", "b", "c", "d", "e"] + + def test_multi_session_chains(self) -> None: + entries = load_entries_from_jsonl(TEST_DATA / "dag_resume.jsonl") + nodes = build_message_index(entries) + build_dag(nodes) + sessions = extract_session_dag_lines(nodes) + assert sessions["s1"].uuids == ["a", "b", "c", "d", "e"] + assert sessions["s2"].uuids == ["f", "g", "h"] diff --git 
a/test/test_data/dag_fork.jsonl b/test/test_data/dag_fork.jsonl new file mode 100644 index 00000000..11475211 --- /dev/null +++ b/test/test_data/dag_fork.jsonl @@ -0,0 +1,11 @@ +{"type":"user","timestamp":"2025-07-01T10:00:00.000Z","parentUuid":null,"isSidechain":false,"userType":"human","cwd":"/tmp","sessionId":"s1","version":"1.0.0","uuid":"a","message":{"role":"user","content":[{"type":"text","text":"Start"}]}} +{"type":"assistant","timestamp":"2025-07-01T10:01:00.000Z","parentUuid":"a","isSidechain":false,"userType":"human","cwd":"/tmp","sessionId":"s1","version":"1.0.0","uuid":"b","requestId":"req_1","message":{"id":"b","type":"message","role":"assistant","model":"claude-3-sonnet","content":[{"type":"text","text":"Reply 1"}],"stop_reason":"end_turn","usage":{"input_tokens":10,"output_tokens":5}}} +{"type":"user","timestamp":"2025-07-01T10:02:00.000Z","parentUuid":"b","isSidechain":false,"userType":"human","cwd":"/tmp","sessionId":"s1","version":"1.0.0","uuid":"c","message":{"role":"user","content":[{"type":"text","text":"Middle"}]}} +{"type":"assistant","timestamp":"2025-07-01T10:03:00.000Z","parentUuid":"c","isSidechain":false,"userType":"human","cwd":"/tmp","sessionId":"s1","version":"1.0.0","uuid":"d","requestId":"req_2","message":{"id":"d","type":"message","role":"assistant","model":"claude-3-sonnet","content":[{"type":"text","text":"Reply 2"}],"stop_reason":"end_turn","usage":{"input_tokens":15,"output_tokens":8}}} +{"type":"user","timestamp":"2025-07-01T10:04:00.000Z","parentUuid":"d","isSidechain":false,"userType":"human","cwd":"/tmp","sessionId":"s1","version":"1.0.0","uuid":"e","message":{"role":"user","content":[{"type":"text","text":"Last in s1"}]}} +{"type":"user","timestamp":"2025-07-01T11:00:00.000Z","parentUuid":"e","isSidechain":false,"userType":"human","cwd":"/tmp","sessionId":"s2","version":"1.0.0","uuid":"f","message":{"role":"user","content":[{"type":"text","text":"Continue from e"}]}} 
+{"type":"assistant","timestamp":"2025-07-01T11:01:00.000Z","parentUuid":"f","isSidechain":false,"userType":"human","cwd":"/tmp","sessionId":"s2","version":"1.0.0","uuid":"g","requestId":"req_3","message":{"id":"g","type":"message","role":"assistant","model":"claude-3-sonnet","content":[{"type":"text","text":"Continued reply"}],"stop_reason":"end_turn","usage":{"input_tokens":20,"output_tokens":10}}} +{"type":"user","timestamp":"2025-07-01T11:02:00.000Z","parentUuid":"g","isSidechain":false,"userType":"human","cwd":"/tmp","sessionId":"s2","version":"1.0.0","uuid":"h","message":{"role":"user","content":[{"type":"text","text":"End of s2"}]}} +{"type":"user","timestamp":"2025-07-01T12:00:00.000Z","parentUuid":"c","isSidechain":false,"userType":"human","cwd":"/tmp","sessionId":"s3","version":"1.0.0","uuid":"i","message":{"role":"user","content":[{"type":"text","text":"Fork from c"}]}} +{"type":"assistant","timestamp":"2025-07-01T12:01:00.000Z","parentUuid":"i","isSidechain":false,"userType":"human","cwd":"/tmp","sessionId":"s3","version":"1.0.0","uuid":"j","requestId":"req_4","message":{"id":"j","type":"message","role":"assistant","model":"claude-3-sonnet","content":[{"type":"text","text":"Forked reply"}],"stop_reason":"end_turn","usage":{"input_tokens":25,"output_tokens":12}}} +{"type":"user","timestamp":"2025-07-01T12:02:00.000Z","parentUuid":"j","isSidechain":false,"userType":"human","cwd":"/tmp","sessionId":"s3","version":"1.0.0","uuid":"k","message":{"role":"user","content":[{"type":"text","text":"End of s3"}]}} diff --git a/test/test_data/dag_resume.jsonl b/test/test_data/dag_resume.jsonl new file mode 100644 index 00000000..fc6203cf --- /dev/null +++ b/test/test_data/dag_resume.jsonl @@ -0,0 +1,8 @@ +{"type":"user","timestamp":"2025-07-01T10:00:00.000Z","parentUuid":null,"isSidechain":false,"userType":"human","cwd":"/tmp","sessionId":"s1","version":"1.0.0","uuid":"a","message":{"role":"user","content":[{"type":"text","text":"Start"}]}} 
+{"type":"assistant","timestamp":"2025-07-01T10:01:00.000Z","parentUuid":"a","isSidechain":false,"userType":"human","cwd":"/tmp","sessionId":"s1","version":"1.0.0","uuid":"b","requestId":"req_1","message":{"id":"b","type":"message","role":"assistant","model":"claude-3-sonnet","content":[{"type":"text","text":"Reply 1"}],"stop_reason":"end_turn","usage":{"input_tokens":10,"output_tokens":5}}} +{"type":"user","timestamp":"2025-07-01T10:02:00.000Z","parentUuid":"b","isSidechain":false,"userType":"human","cwd":"/tmp","sessionId":"s1","version":"1.0.0","uuid":"c","message":{"role":"user","content":[{"type":"text","text":"Continue"}]}} +{"type":"assistant","timestamp":"2025-07-01T10:03:00.000Z","parentUuid":"c","isSidechain":false,"userType":"human","cwd":"/tmp","sessionId":"s1","version":"1.0.0","uuid":"d","requestId":"req_2","message":{"id":"d","type":"message","role":"assistant","model":"claude-3-sonnet","content":[{"type":"text","text":"Reply 2"}],"stop_reason":"end_turn","usage":{"input_tokens":15,"output_tokens":8}}} +{"type":"user","timestamp":"2025-07-01T10:04:00.000Z","parentUuid":"d","isSidechain":false,"userType":"human","cwd":"/tmp","sessionId":"s1","version":"1.0.0","uuid":"e","message":{"role":"user","content":[{"type":"text","text":"Last in s1"}]}} +{"type":"user","timestamp":"2025-07-01T11:00:00.000Z","parentUuid":"e","isSidechain":false,"userType":"human","cwd":"/tmp","sessionId":"s2","version":"1.0.0","uuid":"f","message":{"role":"user","content":[{"type":"text","text":"Resume here"}]}} +{"type":"assistant","timestamp":"2025-07-01T11:01:00.000Z","parentUuid":"f","isSidechain":false,"userType":"human","cwd":"/tmp","sessionId":"s2","version":"1.0.0","uuid":"g","requestId":"req_3","message":{"id":"g","type":"message","role":"assistant","model":"claude-3-sonnet","content":[{"type":"text","text":"Resumed reply"}],"stop_reason":"end_turn","usage":{"input_tokens":20,"output_tokens":10}}} 
+{"type":"user","timestamp":"2025-07-01T11:02:00.000Z","parentUuid":"g","isSidechain":false,"userType":"human","cwd":"/tmp","sessionId":"s2","version":"1.0.0","uuid":"h","message":{"role":"user","content":[{"type":"text","text":"End of s2"}]}} diff --git a/test/test_data/dag_simple.jsonl b/test/test_data/dag_simple.jsonl new file mode 100644 index 00000000..5b5152d6 --- /dev/null +++ b/test/test_data/dag_simple.jsonl @@ -0,0 +1,5 @@ +{"type":"user","timestamp":"2025-07-01T10:00:00.000Z","parentUuid":null,"isSidechain":false,"userType":"human","cwd":"/tmp","sessionId":"s1","version":"1.0.0","uuid":"a","message":{"role":"user","content":[{"type":"text","text":"Hello"}]}} +{"type":"assistant","timestamp":"2025-07-01T10:01:00.000Z","parentUuid":"a","isSidechain":false,"userType":"human","cwd":"/tmp","sessionId":"s1","version":"1.0.0","uuid":"b","requestId":"req_1","message":{"id":"b","type":"message","role":"assistant","model":"claude-3-sonnet","content":[{"type":"text","text":"Hi there"}],"stop_reason":"end_turn","usage":{"input_tokens":10,"output_tokens":5}}} +{"type":"user","timestamp":"2025-07-01T10:02:00.000Z","parentUuid":"b","isSidechain":false,"userType":"human","cwd":"/tmp","sessionId":"s1","version":"1.0.0","uuid":"c","message":{"role":"user","content":[{"type":"text","text":"How are you?"}]}} +{"type":"assistant","timestamp":"2025-07-01T10:03:00.000Z","parentUuid":"c","isSidechain":false,"userType":"human","cwd":"/tmp","sessionId":"s1","version":"1.0.0","uuid":"d","requestId":"req_2","message":{"id":"d","type":"message","role":"assistant","model":"claude-3-sonnet","content":[{"type":"text","text":"I'm doing well"}],"stop_reason":"end_turn","usage":{"input_tokens":15,"output_tokens":8}}} +{"type":"user","timestamp":"2025-07-01T10:04:00.000Z","parentUuid":"d","isSidechain":false,"userType":"human","cwd":"/tmp","sessionId":"s1","version":"1.0.0","uuid":"e","message":{"role":"user","content":[{"type":"text","text":"Goodbye"}]}} From 
70425eccde0b98220fd85e913ba4d7aeded42b30 Mon Sep 17 00:00:00 2001 From: Christian Boos Date: Fri, 13 Feb 2026 18:33:35 +0100 Subject: [PATCH 03/23] Integrate DAG ordering into directory-mode loading (Phase B) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Replace timestamp-based sorting in load_directory_transcripts() with DAG traversal: partition sidechains, build DAG from main entries, traverse depth-first, re-append summaries/queue-ops/sidechains. Fix coverage bug in extract_session_dag_lines() where chain walk from root only covers one node when all parentUuid values are null — now falls back to timestamp sort when chain < total nodes. Co-Authored-By: Claude Opus 4.6 --- claude_code_log/converter.py | 24 +- claude_code_log/dag.py | 11 +- test/__snapshots__/test_snapshot_html.ambr | 40 +-- test/test_cache_integration.py | 9 +- test/test_dag.py | 62 ++++- test/test_dag_integration.py | 304 +++++++++++++++++++++ 6 files changed, 416 insertions(+), 34 deletions(-) create mode 100644 test/test_dag_integration.py diff --git a/claude_code_log/converter.py b/claude_code_log/converter.py index 2a673080..42728892 100644 --- a/claude_code_log/converter.py +++ b/claude_code_log/converter.py @@ -31,11 +31,13 @@ from .models import ( TranscriptEntry, AssistantTranscriptEntry, + QueueOperationTranscriptEntry, SummaryTranscriptEntry, SystemTranscriptEntry, UserTranscriptEntry, ToolResultContent, ) +from .dag import build_dag_from_entries, traverse_session_tree from .renderer import get_renderer, is_html_outdated @@ -331,14 +333,22 @@ def load_directory_transcripts( ) all_messages.extend(messages) - # Sort all messages chronologically - def get_timestamp(entry: TranscriptEntry) -> str: - if hasattr(entry, "timestamp"): - return entry.timestamp # type: ignore - return "" + # Partition: sidechain entries excluded from DAG (Phase C scope) + sidechain_entries = [e for e in all_messages if getattr(e, "isSidechain", False)] + main_entries = 
[e for e in all_messages if not getattr(e, "isSidechain", False)] - all_messages.sort(key=get_timestamp) - return all_messages + # Build DAG and traverse (entries grouped by session, depth-first) + tree = build_dag_from_entries(main_entries) + dag_ordered = traverse_session_tree(tree) + + # Re-add summaries/queue-ops (excluded from DAG since they lack uuid) + non_dag_entries: list[TranscriptEntry] = [ + e + for e in main_entries + if isinstance(e, (SummaryTranscriptEntry, QueueOperationTranscriptEntry)) + ] + + return dag_ordered + sidechain_entries + non_dag_entries # ============================================================================= diff --git a/claude_code_log/dag.py b/claude_code_log/dag.py index 2ef26a75..a3a60de1 100644 --- a/claude_code_log/dag.py +++ b/claude_code_log/dag.py @@ -247,8 +247,15 @@ def extract_session_dag_lines( linear = False current = None - if not linear: - # Fall back to timestamp sort + if not linear or len(chain) < len(snodes): + if len(chain) < len(snodes): + logger.warning( + "Session %s: chain covers %d of %d nodes, " + "falling back to timestamp sort", + session_id, + len(chain), + len(snodes), + ) sorted_nodes = sorted(snodes, key=lambda n: n.timestamp) chain = [n.uuid for n in sorted_nodes] diff --git a/test/__snapshots__/test_snapshot_html.ambr b/test/__snapshots__/test_snapshot_html.ambr index c7a8cc2b..5bcd7a9c 100644 --- a/test/__snapshots__/test_snapshot_html.ambr +++ b/test/__snapshots__/test_snapshot_html.ambr @@ -15122,7 +15122,7 @@ - @@ -20415,6 +20745,7 @@
[snapshot diff elided: the hunks in test/__snapshots__/test_snapshot_html.ambr touch the rendered Python-decorator example conversation, each adding a single line per its +1 hunk header. The HTML markup of these snapshot lines was lost in extraction; only the conversation text (the repeat/greet decorator example and its printed output) survived, so the hunks are not reproducible here.]
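For reference, the depth-first order asserted earlier in TestTraversalOrder for dag_fork.jsonl (emit a session's own messages in chain order; at each junction point, recurse into the child sessions before continuing the current session) can be sketched standalone. This is an illustrative model only, not the claude_code_log implementation: the `sessions` and `junctions` tables are hand-built to mirror the fixture, where s3 forks from c and s2 continues from e.

```python
# Hand-built mirror of dag_fork.jsonl (illustrative, not the library's API):
# s1: a->b->c->d->e, s3 forks from c (i->j->k), s2 continues from e (f->g->h).
sessions: dict[str, list[str]] = {
    "s1": ["a", "b", "c", "d", "e"],
    "s2": ["f", "g", "h"],
    "s3": ["i", "j", "k"],
}
# junction uuid -> child sessions, already ordered chronologically
junctions: dict[str, list[str]] = {"c": ["s3"], "e": ["s2"]}


def traverse(session_id: str) -> list[str]:
    """Emit the session's uuids; at each junction, recurse into its children."""
    out: list[str] = []
    for uuid in sessions[session_id]:
        out.append(uuid)
        for child in junctions.get(uuid, []):
            out.extend(traverse(child))
    return out


# Matches test_fork_traversal: a,b,c then s3 (fork), then d,e then s2.
order = traverse("s1")
```
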
+ diff --git a/test/test_dag.py b/test/test_dag.py index 0c9604c0..d3ec46d1 100644 --- a/test/test_dag.py +++ b/test/test_dag.py @@ -689,3 +689,257 @@ def test_all_null_traversal_returns_all_entries( assert len(result) == 5 uuids = [e.uuid for e in result] # type: ignore[union-attr] assert uuids == ["msg_0", "msg_1", "msg_2", "msg_3", "msg_4"] + + +# ============================================================================= +# Test: Within-session fork (dag_within_fork.jsonl) +# ============================================================================= + + +class TestWithinSessionFork: + """Tests using dag_within_fork.jsonl: s1(a→b→c) with fork at c. + + c has two same-session children: d→e→f (branch 1) and d'→e' (branch 2). + This produces three DAG-lines: trunk (a,b,c), branch 1 (d,e,f), branch 2 (d',e'). + """ + + @pytest.fixture() + def tree(self) -> SessionTree: + entries = load_entries_from_jsonl(TEST_DATA / "dag_within_fork.jsonl") + return build_dag_from_entries(entries) + + def test_trunk_stops_at_fork(self, tree: SessionTree) -> None: + """Trunk DAG-line stops at fork point c.""" + assert "s1" in tree.sessions + assert tree.sessions["s1"].uuids == ["a", "b", "c"] + + def test_branch_sessions_created(self, tree: SessionTree) -> None: + """Two branch pseudo-sessions are created.""" + branch_ids = [sid for sid in tree.sessions if "@" in sid] + assert len(branch_ids) == 2 + + def test_branch1_chain(self, tree: SessionTree) -> None: + """Branch 1 (d→e→f) has correct chain.""" + branch1_id = "s1@d" + assert branch1_id in tree.sessions + assert tree.sessions[branch1_id].uuids == ["d", "e", "f"] + + def test_branch2_chain(self, tree: SessionTree) -> None: + """Branch 2 (d'→e') has correct chain.""" + branch2_id = "s1@d_prime" + assert branch2_id in tree.sessions + assert tree.sessions[branch2_id].uuids == ["d_prime", "e_prime"] + + def test_branches_are_marked(self, tree: SessionTree) -> None: + """Branch DAG-lines have is_branch=True and 
original_session_id set.""" + for sid in tree.sessions: + if "@" in sid: + dl = tree.sessions[sid] + assert dl.is_branch is True + assert dl.original_session_id == "s1" + else: + dl = tree.sessions[sid] + assert dl.is_branch is False + assert dl.original_session_id is None + + def test_trunk_is_root(self, tree: SessionTree) -> None: + """Only trunk s1 is a root session.""" + assert tree.roots == ["s1"] + + def test_branches_parent_is_trunk(self, tree: SessionTree) -> None: + """Both branches have parent_session_id = trunk.""" + for sid in tree.sessions: + if "@" in sid: + assert tree.sessions[sid].parent_session_id == "s1" + assert tree.sessions[sid].attachment_uuid == "c" + + def test_junction_at_fork_point(self, tree: SessionTree) -> None: + """Fork point c is a junction point with both branches as targets.""" + assert "c" in tree.junction_points + jp = tree.junction_points["c"] + assert jp.session_id == "s1" + assert len(jp.target_sessions) == 2 + assert "s1@d" in jp.target_sessions + assert "s1@d_prime" in jp.target_sessions + + def test_traversal_order(self, tree: SessionTree) -> None: + """Depth-first: trunk, then branch 1 at junction, then branch 2.""" + result = traverse_session_tree(tree) + uuids = [e.uuid for e in result] # type: ignore[union-attr] + assert uuids == ["a", "b", "c", "d", "e", "f", "d_prime", "e_prime"] + + def test_node_session_ids_updated(self, tree: SessionTree) -> None: + """MessageNode.session_id is updated for branch nodes.""" + assert tree.nodes["a"].session_id == "s1" + assert tree.nodes["b"].session_id == "s1" + assert tree.nodes["c"].session_id == "s1" + assert tree.nodes["d"].session_id == "s1@d" + assert tree.nodes["e"].session_id == "s1@d" + assert tree.nodes["f"].session_id == "s1@d" + assert tree.nodes["d_prime"].session_id == "s1@d_prime" + assert tree.nodes["e_prime"].session_id == "s1@d_prime" + + def test_traversal_covers_all_entries(self, tree: SessionTree) -> None: + """All 8 entries should appear in traversal.""" + 
result = traverse_session_tree(tree) + assert len(result) == 8 + + +class TestNestedFork: + """Test nested within-session forks (fork within a fork).""" + + def test_nested_fork(self) -> None: + """Session with fork at b, then nested fork at d within first branch.""" + # a → b (fork) → d (fork) → f, g + # → e + data = [ + { + "type": "user", + "timestamp": "2025-07-01T10:00:00.000Z", + "parentUuid": None, + "isSidechain": False, + "userType": "human", + "cwd": "/tmp", + "sessionId": "s1", + "version": "1.0.0", + "uuid": "a", + "message": { + "role": "user", + "content": [{"type": "text", "text": "Start"}], + }, + }, + { + "type": "assistant", + "timestamp": "2025-07-01T10:01:00.000Z", + "parentUuid": "a", + "isSidechain": False, + "userType": "human", + "cwd": "/tmp", + "sessionId": "s1", + "version": "1.0.0", + "uuid": "b", + "requestId": "req_1", + "message": { + "id": "b", + "type": "message", + "role": "assistant", + "model": "claude-3-sonnet", + "content": [{"type": "text", "text": "Fork point 1"}], + "stop_reason": "end_turn", + "usage": {"input_tokens": 10, "output_tokens": 5}, + }, + }, + # Branch 1 from b: c → d (fork) + { + "type": "user", + "timestamp": "2025-07-01T10:02:00.000Z", + "parentUuid": "b", + "isSidechain": False, + "userType": "human", + "cwd": "/tmp", + "sessionId": "s1", + "version": "1.0.0", + "uuid": "c", + "message": { + "role": "user", + "content": [{"type": "text", "text": "Branch 1"}], + }, + }, + { + "type": "assistant", + "timestamp": "2025-07-01T10:03:00.000Z", + "parentUuid": "c", + "isSidechain": False, + "userType": "human", + "cwd": "/tmp", + "sessionId": "s1", + "version": "1.0.0", + "uuid": "d", + "requestId": "req_2", + "message": { + "id": "d", + "type": "message", + "role": "assistant", + "model": "claude-3-sonnet", + "content": [{"type": "text", "text": "Fork point 2"}], + "stop_reason": "end_turn", + "usage": {"input_tokens": 10, "output_tokens": 5}, + }, + }, + # Nested branch 1a from d + { + "type": "user", + 
"timestamp": "2025-07-01T10:04:00.000Z", + "parentUuid": "d", + "isSidechain": False, + "userType": "human", + "cwd": "/tmp", + "sessionId": "s1", + "version": "1.0.0", + "uuid": "f", + "message": { + "role": "user", + "content": [{"type": "text", "text": "Nested branch 1a"}], + }, + }, + # Nested branch 1b from d + { + "type": "user", + "timestamp": "2025-07-01T10:05:00.000Z", + "parentUuid": "d", + "isSidechain": False, + "userType": "human", + "cwd": "/tmp", + "sessionId": "s1", + "version": "1.0.0", + "uuid": "g", + "message": { + "role": "user", + "content": [{"type": "text", "text": "Nested branch 1b"}], + }, + }, + # Branch 2 from b + { + "type": "user", + "timestamp": "2025-07-01T10:06:00.000Z", + "parentUuid": "b", + "isSidechain": False, + "userType": "human", + "cwd": "/tmp", + "sessionId": "s1", + "version": "1.0.0", + "uuid": "e", + "message": { + "role": "user", + "content": [{"type": "text", "text": "Branch 2"}], + }, + }, + ] + entries = [create_transcript_entry(d) for d in data] + tree = build_dag_from_entries(entries) + + # Trunk: a, b (stops at fork) + assert tree.sessions["s1"].uuids == ["a", "b"] + + # Branch 1 from b: c, d (stops at nested fork) + branch1_id = "s1@c" + assert branch1_id in tree.sessions + assert tree.sessions[branch1_id].uuids == ["c", "d"] + + # Nested branches from d (within branch 1) + nested1a_id = f"{branch1_id}@f" + nested1b_id = f"{branch1_id}@g" + assert nested1a_id in tree.sessions + assert tree.sessions[nested1a_id].uuids == ["f"] + assert nested1b_id in tree.sessions + assert tree.sessions[nested1b_id].uuids == ["g"] + + # Branch 2 from b + branch2_id = "s1@e" + assert branch2_id in tree.sessions + assert tree.sessions[branch2_id].uuids == ["e"] + + # Traversal + result = traverse_session_tree(tree) + uuids = [e.uuid for e in result] # type: ignore[union-attr] + assert uuids == ["a", "b", "c", "d", "f", "g", "e"] diff --git a/test/test_dag_integration.py b/test/test_dag_integration.py index 5651c7a6..13b57d79 100644 
--- a/test/test_dag_integration.py
+++ b/test/test_dag_integration.py
@@ -613,3 +613,98 @@ def test_synthetic_progress_in_directory_mode(self, tmp_path: Path) -> None:
         # All 4 real entries should be in DAG order
         assert uuids == ["a", "b", "c", "d"]
+
+
+# =============================================================================
+# Test: Within-session fork detection in real data
+# =============================================================================
+
+
+class TestWithinSessionForkRealData:
+    """Test fork detection using real session 03eb5929 which has a fork at eb84."""
+
+    def test_fork_detected_at_eb84(self) -> None:
+        """The real data has a fork at eb84 with two children (5270, 9edc)."""
+        from claude_code_log.dag import build_dag_from_entries
+
+        result = load_directory_transcripts(EXPERIMENTS_IDEAS_DIR, silent=True)
+        dag_entries = [e for e in result if hasattr(e, "uuid")]
+        tree = build_dag_from_entries(dag_entries)
+
+        # Find the fork junction at eb84
+        fork_jps = [
+            (uuid, jp)
+            for uuid, jp in tree.junction_points.items()
+            if uuid.startswith("eb84") and any("@" in t for t in jp.target_sessions)
+        ]
+        assert len(fork_jps) == 1, (
+            f"Expected 1 fork junction at eb84, got {len(fork_jps)}"
+        )
+        uuid, jp = fork_jps[0]
+        assert len(jp.target_sessions) == 2
+
+    def test_no_linearity_warnings(self, caplog: Any) -> None:
+        """Fork handling should produce no linearity violation warnings."""
+        import logging
+        from claude_code_log.dag import build_dag_from_entries
+
+        with caplog.at_level(logging.WARNING, logger="claude_code_log.dag"):
+            result = load_directory_transcripts(EXPERIMENTS_IDEAS_DIR, silent=True)
+            dag_entries = [e for e in result if hasattr(e, "uuid")]
+            build_dag_from_entries(dag_entries)
+
+        linearity_warnings = [
+            r.message for r in caplog.records if "linearity" in r.message
+        ]
+        assert linearity_warnings == []
+
+    def test_branch_sessions_created(self) -> None:
+        """Branch pseudo-sessions are created for the fork."""
+        from claude_code_log.dag import build_dag_from_entries
+
+        result = load_directory_transcripts(EXPERIMENTS_IDEAS_DIR, silent=True)
+        dag_entries = [e for e in result if hasattr(e, "uuid")]
+        tree = build_dag_from_entries(dag_entries)
+
+        branch_sessions = [sid for sid in tree.sessions if "@" in sid]
+        assert len(branch_sessions) >= 2
+
+        for sid in branch_sessions:
+            dl = tree.sessions[sid]
+            assert dl.is_branch is True
+            assert dl.original_session_id is not None
+            assert len(dl.uuids) > 0
+
+    def test_end_to_end_rendering_with_fork(self) -> None:
+        """Full rendering pipeline produces branch headers for fork."""
+        from claude_code_log.renderer import generate_template_messages
+        from claude_code_log.models import SessionHeaderMessage
+
+        result = load_directory_transcripts(EXPERIMENTS_IDEAS_DIR, silent=True)
+        root_messages, session_nav, ctx = generate_template_messages(result)
+
+        # Find branch headers
+        branch_headers = [
+            tm
+            for tm in root_messages
+            if isinstance(tm.content, SessionHeaderMessage) and tm.content.is_branch
+        ]
+        assert len(branch_headers) >= 2
+
+    def test_within_fork_coverage(self) -> None:
+        """All entries are covered by DAG-lines (trunk + branches)."""
+        from claude_code_log.dag import build_dag_from_entries
+
+        result = load_directory_transcripts(EXPERIMENTS_IDEAS_DIR, silent=True)
+        dag_entries = [e for e in result if hasattr(e, "uuid")]
+        tree = build_dag_from_entries(dag_entries)
+
+        # Should have both trunk and branch pseudo-sessions
+        real_sessions = [sid for sid in tree.sessions if "@" not in sid]
+        branch_sessions = [sid for sid in tree.sessions if "@" in sid]
+        assert len(real_sessions) >= 1
+        assert len(branch_sessions) >= 2
+
+        # All entries should be covered
+        total_in_daglines = sum(len(dl.uuids) for dl in tree.sessions.values())
+        assert total_in_daglines == len(tree.nodes)
diff --git a/test/test_data/dag_within_fork.jsonl b/test/test_data/dag_within_fork.jsonl
new file mode 100644
index 00000000..c6faf28e
--- /dev/null
+++ b/test/test_data/dag_within_fork.jsonl
@@ -0,0 +1,8 @@
+{"type":"user","timestamp":"2025-07-01T10:00:00.000Z","parentUuid":null,"isSidechain":false,"userType":"human","cwd":"/tmp","sessionId":"s1","version":"1.0.0","uuid":"a","message":{"role":"user","content":[{"type":"text","text":"Hello"}]}}
+{"type":"assistant","timestamp":"2025-07-01T10:01:00.000Z","parentUuid":"a","isSidechain":false,"userType":"human","cwd":"/tmp","sessionId":"s1","version":"1.0.0","uuid":"b","requestId":"req_1","message":{"id":"b","type":"message","role":"assistant","model":"claude-3-sonnet","content":[{"type":"text","text":"Hi there"}],"stop_reason":"end_turn","usage":{"input_tokens":10,"output_tokens":5}}}
+{"type":"user","timestamp":"2025-07-01T10:02:00.000Z","parentUuid":"b","isSidechain":false,"userType":"human","cwd":"/tmp","sessionId":"s1","version":"1.0.0","uuid":"c","message":{"role":"user","content":[{"type":"text","text":"Fork point"}]}}
+{"type":"user","timestamp":"2025-07-01T10:03:00.000Z","parentUuid":"c","isSidechain":false,"userType":"human","cwd":"/tmp","sessionId":"s1","version":"1.0.0","uuid":"d","message":{"role":"user","content":[{"type":"text","text":"Branch 1 start"}]}}
+{"type":"assistant","timestamp":"2025-07-01T10:04:00.000Z","parentUuid":"d","isSidechain":false,"userType":"human","cwd":"/tmp","sessionId":"s1","version":"1.0.0","uuid":"e","requestId":"req_2","message":{"id":"e","type":"message","role":"assistant","model":"claude-3-sonnet","content":[{"type":"text","text":"Branch 1 reply"}],"stop_reason":"end_turn","usage":{"input_tokens":15,"output_tokens":8}}}
+{"type":"user","timestamp":"2025-07-01T10:05:00.000Z","parentUuid":"e","isSidechain":false,"userType":"human","cwd":"/tmp","sessionId":"s1","version":"1.0.0","uuid":"f","message":{"role":"user","content":[{"type":"text","text":"Branch 1 end"}]}}
+{"type":"user","timestamp":"2025-07-01T10:10:00.000Z","parentUuid":"c","isSidechain":false,"userType":"human","cwd":"/tmp","sessionId":"s1","version":"1.0.0","uuid":"d_prime","message":{"role":"user","content":[{"type":"text","text":"Branch 2 start"}]}} +{"type":"assistant","timestamp":"2025-07-01T10:11:00.000Z","parentUuid":"d_prime","isSidechain":false,"userType":"human","cwd":"/tmp","sessionId":"s1","version":"1.0.0","uuid":"e_prime","requestId":"req_3","message":{"id":"e_prime","type":"message","role":"assistant","model":"claude-3-sonnet","content":[{"type":"text","text":"Branch 2 reply"}],"stop_reason":"end_turn","usage":{"input_tokens":15,"output_tokens":8}}} From 04271773c5925e5e54467a0d270c6e00446f804f Mon Sep 17 00:00:00 2001 From: Christian Boos Date: Tue, 17 Feb 2026 19:45:22 +0100 Subject: [PATCH 10/23] Add message previews to fork/branch nav items and headers Branch nav items and headers now show the first user message text instead of opaque IDs. Branch headers truncate at 80 chars for readability. 
Co-Authored-By: Claude Opus 4.6
---
 claude_code_log/renderer.py | 37 +++++++++++++++++++++++++++++++++++--
 1 file changed, 35 insertions(+), 2 deletions(-)

diff --git a/claude_code_log/renderer.py b/claude_code_log/renderer.py
index c03a29b0..485f95b0 100644
--- a/claude_code_log/renderer.py
+++ b/claude_code_log/renderer.py
@@ -839,6 +839,23 @@ def prepare_session_navigation(
     # Add branch pseudo-sessions from hierarchy
     if session_hierarchy:
+        # Collect first user message preview for each branch
+        branch_previews: dict[str, str] = {}
+        for msg in ctx.messages:
+            rsid = msg.render_session_id
+            if rsid in branch_previews or not isinstance(msg.content, UserTextMessage):
+                continue
+            hier = session_hierarchy.get(rsid, {})
+            if hier.get("is_branch"):
+                # Extract text from UserTextMessage items
+                preview_parts = []
+                for item in msg.content.items:
+                    if hasattr(item, "text"):
+                        preview_parts.append(item.text)
+                preview = " ".join(preview_parts).strip()
+                if preview:
+                    branch_previews[rsid] = create_session_preview(preview)
+
         # Group branches by their junction point (attachment_uuid)
         junction_branches: dict[str, list[dict[str, Any]]] = {}
         for sid, hier in session_hierarchy.items():
@@ -906,7 +923,10 @@ def prepare_session_navigation(
                 "first_timestamp": "",
                 "last_timestamp": "",
                 "message_count": 0,
-                "first_user_message": f"Branch {branch_sid.split('@')[-1][:8]}",
+                "first_user_message": branch_previews.get(
+                    branch_sid,
+                    f"Branch {branch_sid.split('@')[-1][:8]}",
+                ),
                 "token_summary": "",
                 "parent_session_id": parent_sid,
                 "parent_message_index": fork_msg_idx,
@@ -1944,7 +1964,20 @@ def _render_messages(
             parent_msg_idx = ctx.session_first_message.get(parent_sid)
             original_sid = b_hier.get("original_session_id", message.sessionId)
             branch_summary = (session_summaries or {}).get(original_sid)
-            branch_title = f"Branch • {branch_sid.split('@')[-1][:8]}"
+            # Extract preview from the branch's first user message
+            branch_preview = ""
+            if as_user_entry(message):
+                branch_text = extract_text_content(message.message.content)
+                if branch_text:
+                    branch_preview = create_session_preview(branch_text)
+            # Truncate for header title (keep full preview for nav)
+            if branch_preview:
+                short = branch_preview[:80]
+                if len(branch_preview) > 80:
+                    short += "..."
+                branch_title = f"Branch • {short}"
+            else:
+                branch_title = f"Branch • {branch_sid.split('@')[-1][:8]}"
             branch_header_meta = MessageMeta(
                 session_id=branch_sid,

From 5f14b23de2c7128366d02d45a529dac4fa1e1d10 Mon Sep 17 00:00:00 2001
From: Christian Boos
Date: Tue, 17 Feb 2026 20:10:57 +0100
Subject: [PATCH 11/23] Improve fork/branch presentation with context and
 visual hierarchy

- Fork point nav shows parent message preview (walks past system hooks)
- Branch backlinks say "branched from" with fork point context
- Branch headers include original session ID for orientation
- Branch headers indented 2em to distinguish from true session starts

Co-Authored-By: Claude Opus 4.6
---
 claude_code_log/html/system_formatters.py   | 16 +++-
 .../templates/components/message_styles.css |  5 ++
 .../html/templates/transcript.html          |  2 +-
 claude_code_log/renderer.py                 | 76 +++++++++++++++++--
 test/__snapshots__/test_snapshot_html.ambr  | 20 +++++
 5 files changed, 109 insertions(+), 10 deletions(-)

diff --git a/claude_code_log/html/system_formatters.py b/claude_code_log/html/system_formatters.py
index 7770fdd3..339865ea 100644
--- a/claude_code_log/html/system_formatters.py
+++ b/claude_code_log/html/system_formatters.py
@@ -89,13 +89,23 @@ def format_session_header_content(content: SessionHeaderMessage) -> str:
     """
     escaped_title = html.escape(content.title)
     if content.is_branch and content.parent_message_index is not None:
-        # Branch header: backlink to fork point
+        # Branch header: backlink to fork point with context
+        fork_label = "fork point"
+        if content.parent_session_summary:
+            escaped_summary = html.escape(content.parent_session_summary)
+            fork_label = escaped_summary
+        # Show original session ID for context
+        orig_id = ""
+        if content.original_session_id:
+            orig_id = content.original_session_id[:8]
         link = (
             f''
-            f"↳ branch from fork point"
+            f"↳ branched from {fork_label}"
+        )
+        return (
+            f"{orig_id} {link}{escaped_title}" if orig_id else f"{link}{escaped_title}"
         )
-        return f"{link}{escaped_title}"
     if content.parent_session_id:
         parent_label = content.parent_session_summary or content.parent_session_id[:8]
         escaped_parent = html.escape(parent_label)
diff --git a/claude_code_log/html/templates/components/message_styles.css b/claude_code_log/html/templates/components/message_styles.css
index abdd30c4..be543d42 100644
--- a/claude_code_log/html/templates/components/message_styles.css
+++ b/claude_code_log/html/templates/components/message_styles.css
@@ -565,6 +565,11 @@
   font-size: 1.2em;
 }
 
+/* Branch headers (within-session forks) — visually subordinate */
+.session-header.branch-header {
+  margin-left: 2em;
+}
+
 .session-subtitle {
   font-size: 0.9em;
   color: var(--text-muted);
diff --git a/claude_code_log/html/templates/transcript.html b/claude_code_log/html/templates/transcript.html
index 4240d4d9..d794ec4d 100644
--- a/claude_code_log/html/templates/transcript.html
+++ b/claude_code_log/html/templates/transcript.html
@@ -99,7 +99,7 @@

🔍 Search & Filter

 {% for message, message_title, html_content, formatted_timestamp in messages %}
 {% if is_session_header(message) %}
-
+
 Session: {{ html_content|safe }}
 {% if message.has_children %}
diff --git a/claude_code_log/renderer.py b/claude_code_log/renderer.py
index 485f95b0..506e2155 100644
--- a/claude_code_log/renderer.py
+++ b/claude_code_log/renderer.py
@@ -207,6 +207,11 @@ def is_session_header(self) -> bool:
         """Check if this message is a session header."""
         return isinstance(self.content, SessionHeaderMessage)
 
+    @property
+    def is_branch_header(self) -> bool:
+        """Check if this is a branch (within-session fork) header."""
+        return isinstance(self.content, SessionHeaderMessage) and self.content.is_branch
+
     @property
     def has_children(self) -> bool:
         """Check if this message has any children."""
@@ -758,6 +763,48 @@ def prepare_session_summaries(messages: list[TranscriptEntry]) -> dict[str, str]
     return session_summaries
 
 
+def _fork_point_preview(fork_msg: "TemplateMessage", ctx: RenderingContext) -> str:
+    """Get a meaningful preview for a fork point message.
+
+    If the fork point is a system hook (common with /rewind), walk up
+    to the parent message to find more descriptive content.
+    """
+    msg = fork_msg
+    # Walk up past system hooks to find a meaningful message
+    for _ in range(3):  # limit walk depth
+        if not isinstance(
+            msg.content, (SystemMessage, HookSummaryMessage, SessionHeaderMessage)
+        ):
+            break
+        # Find parent by looking at parent_uuid
+        parent_uuid = msg.meta.parent_uuid
+        if not parent_uuid:
+            break
+        parent = next((m for m in ctx.messages if m.meta.uuid == parent_uuid), None)
+        if parent is None:
+            break
+        msg = parent
+
+    # Extract text from the found message
+    content = msg.content
+    if isinstance(content, AssistantTextMessage):
+        parts = [item.text for item in content.items if hasattr(item, "text")]
+        text = " ".join(parts).strip()
+    elif isinstance(content, UserTextMessage):
+        parts = [item.text for item in content.items if hasattr(item, "text")]
+        text = " ".join(parts).strip()
+    else:
+        return ""
+
+    if not text:
+        return ""
+    # Truncate for nav display
+    short = text[:80]
+    if len(text) > 80:
+        short += "..."
+    return short
+
+
 def prepare_session_navigation(
     sessions: dict[str, dict[str, Any]],
     session_order: list[str],
@@ -885,13 +932,24 @@ def prepare_session_navigation(
             ):
                 insert_pos += 1
 
-            # Fork point nav item
+            # Fork point nav item — find the junction message and a
+            # meaningful preview (walk up past system hooks to find it)
             fork_msg_idx = ctx.session_first_message.get(parent_sid)
-            # Try to find the junction message index from the attachment uuid
+            fork_preview = ""
+            fork_msg = None
             for msg in ctx.messages:
                 if msg.meta.uuid == attachment_uuid and msg.message_index is not None:
                     fork_msg_idx = msg.message_index
+                    fork_msg = msg
                     break
+            if fork_msg is not None:
+                fork_preview = _fork_point_preview(fork_msg, ctx)
+
+            fork_label = (
+                f"Fork point • {fork_preview}"
+                if fork_preview
+                else f"Fork point ({len(branches)} branches)"
+            )
 
             fork_nav = {
                 "id": f"fork-{attachment_uuid[:12]}",
@@ -901,7 +959,7 @@ def prepare_session_navigation(
                 "first_timestamp": "",
                 "last_timestamp": "",
                 "message_count": 0,
-                "first_user_message": f"Fork point ({len(branches)} branches)",
+                "first_user_message": fork_label,
                 "token_summary": "",
                 "parent_session_id": parent_sid,
                 "parent_message_index": ctx.session_first_message.get(parent_sid),
@@ -1984,15 +2042,21 @@ def _render_messages(
                 timestamp="",
                 uuid="",
             )
+            # Get fork point preview for backlink text
+            fork_context = ""
+            if attachment_uuid:
+                for fmsg in ctx.messages:
+                    if fmsg.meta.uuid == attachment_uuid:
+                        fork_context = _fork_point_preview(fmsg, ctx)
+                        break
+
             branch_header_content = SessionHeaderMessage(
                 branch_header_meta,
                 title=branch_title,
                 session_id=branch_sid,
                 summary=branch_summary,
                 parent_session_id=parent_sid,
-                parent_session_summary=(session_summaries or {}).get(parent_sid)
-                if parent_sid
-                else None,
+                parent_session_summary=fork_context or None,
                 parent_message_index=parent_msg_idx,
                 depth=b_hier.get("depth", 0),
                 attachment_uuid=b_hier.get("attachment_uuid"),
diff --git a/test/__snapshots__/test_snapshot_html.ambr b/test/__snapshots__/test_snapshot_html.ambr
index 765b66f5..4c7e2469 100644
--- a/test/__snapshots__/test_snapshot_html.ambr
+++ b/test/__snapshots__/test_snapshot_html.ambr
@@ -2781,6 +2781,11 @@
       font-size: 1.2em;
     }
 
+    /* Branch headers (within-session forks) — visually subordinate */
+    .session-header.branch-header {
+      margin-left: 2em;
+    }
+
     .session-subtitle {
       font-size: 0.9em;
       color: var(--text-muted);
@@ -7840,6 +7845,11 @@
       font-size: 1.2em;
     }
 
+    /* Branch headers (within-session forks) — visually subordinate */
+    .session-header.branch-header {
+      margin-left: 2em;
+    }
+
     .session-subtitle {
       font-size: 0.9em;
       color: var(--text-muted);
@@ -12996,6 +13006,11 @@
       font-size: 1.2em;
     }
 
+    /* Branch headers (within-session forks) — visually subordinate */
+    .session-header.branch-header {
+      margin-left: 2em;
+    }
+
     .session-subtitle {
       font-size: 0.9em;
       color: var(--text-muted);
@@ -18207,6 +18222,11 @@
       font-size: 1.2em;
     }
 
+    /* Branch headers (within-session forks) — visually subordinate */
+    .session-header.branch-header {
+      margin-left: 2em;
+    }
+
     .session-subtitle {
       font-size: 0.9em;
       color: var(--text-muted);

From d656d2b777c9276860938178293f4e9f88d39940 Mon Sep 17 00:00:00 2001
From: Christian Boos
Date: Wed, 18 Feb 2026 19:03:12 +0100
Subject: [PATCH 12/23] Fix false forks from context compaction replays and
 tool-result side-branches
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Context compaction replays created hundreds of false branch
pseudo-sessions (573 in clmail project) because replayed entries share
the same parentUuid but get new UUIDs. Tool-result entries from
parallel tool calls also created false forks by pointing back to their
tool_use parent alongside the next tool_use in the chain.
Two detection heuristics in _walk_session_with_forks():

- Compaction replays: same-timestamp children → follow first, skip rest
- Tool-result stitching: dead-end User + single Assistant continuation
  → stitch into linear chain

Also: orphan promotion (dangling parentUuid → root), multi-root walking
with trunk merging, and coverage tracking that excludes skipped replays.

Result on clmail project: 573 → 56 branches, 207 → 29 junction points.

Co-Authored-By: Claude Opus 4.6
---
 claude_code_log/dag.py | 161 ++++++++++++++++++++++++++++++++++++-----
 dev-docs/dag.md        |  53 +++++++++++++-
 test/test_dag.py       | 138 +++++++++++++++++++++++++++++++++++
 3 files changed, 333 insertions(+), 19 deletions(-)

diff --git a/claude_code_log/dag.py b/claude_code_log/dag.py
index 478de3bb..048ad6b0 100644
--- a/claude_code_log/dag.py
+++ b/claude_code_log/dag.py
@@ -14,6 +14,8 @@
     TranscriptEntry,
     SummaryTranscriptEntry,
     QueueOperationTranscriptEntry,
+    UserTranscriptEntry,
+    AssistantTranscriptEntry,
 )
 
 logger = logging.getLogger(__name__)
@@ -147,10 +149,14 @@ def build_dag(nodes: dict[str, MessageNode]) -> None:
                 parent.children_uuids.append(node.uuid)
             else:
                 logger.warning(
-                    "Orphan node %s: parentUuid %s not found in loaded data",
+                    "Orphan node %s: parentUuid %s not found in loaded data"
+                    " (promoting to root)",
                     node.uuid,
                    node.parent_uuid,
                 )
+                # Clear the dangling parent so this node becomes a root
+                # and can participate in DAG walks
+                node.parent_uuid = None
 
     # Validate: no cycles (walk parent chain for each node)
     for node in nodes.values():
@@ -172,12 +178,75 @@
 # =============================================================================
 
 
+def _collect_descendants(
+    uuid: str,
+    session_uuids: set[str],
+    nodes: dict[str, MessageNode],
+    result: set[str],
+) -> None:
+    """Recursively collect a node and all its same-session descendants."""
+    if uuid in result:
+        return
+    result.add(uuid)
+    node = nodes.get(uuid)
+    if node is None:
+        return
+    for child in node.children_uuids:
+        if child in session_uuids:
+            _collect_descendants(child, session_uuids, nodes, result)
+
+
+def _stitch_tool_results(
+    children: list[str],
+    session_uuids: set[str],
+    nodes: dict[str, MessageNode],
+) -> Optional[list[str]]:
+    """Detect and stitch tool-result side-branches into a linear chain.
+
+    When the assistant makes multiple tool calls in one turn, the JSONL
+    records both the next tool_use and the tool_result as children of the
+    current tool_use entry, creating a false fork. Pattern:
+
+        A(tool_use) → U(tool_result)    [dead-end side-branch]
+                    → A(next tool_use)  [main chain continues]
+
+    This function detects the pattern and returns a stitched ordering
+    [U(result), A(next)] so the caller can extend the chain linearly.
+    Returns None if the pattern doesn't match.
+    """
+    # Separate into user (tool_result) and assistant (continuation) children
+    user_children = [
+        c for c in children if isinstance(nodes[c].entry, UserTranscriptEntry)
+    ]
+    assistant_children = [
+        c for c in children if isinstance(nodes[c].entry, AssistantTranscriptEntry)
+    ]
+
+    if not user_children or not assistant_children:
+        return None  # Not the tool_result pattern
+
+    # Verify user children are dead ends (no same-session descendants)
+    for uc in user_children:
+        unode = nodes[uc]
+        if any(c in session_uuids for c in unode.children_uuids):
+            return None  # User child has continuation — not a dead end
+
+    # All assistant children must form a single continuation chain
+    if len(assistant_children) != 1:
+        return None  # Multiple assistant continuations — ambiguous
+
+    # Stitch: user results first (sorted by timestamp), then the
+    # single assistant continuation
+    user_children.sort(key=lambda c: nodes[c].timestamp)
+    return user_children + assistant_children
+
+
 def _walk_session_with_forks(
     root: MessageNode,
     session_id: str,
     session_uuids: set[str],
     nodes: dict[str, MessageNode],
-) -> list[SessionDAGLine]:
+) -> tuple[list[SessionDAGLine], set[str]]:
     """Walk a session's DAG from root, splitting into separate DAG-lines
     at fork points.
 
     Uses a queue-based approach to handle nested forks:
@@ -188,11 +257,13 @@
     4. Update MessageNode.session_id for branch nodes
 
     Returns:
-        List of SessionDAGLine objects (trunk first, then branches)
+        Tuple of (DAG-line list, set of UUIDs intentionally skipped as
+        compaction replays).
     """
     # Queue entries: (start_uuid, dag_line_id, parent_dag_line_id)
     queue: list[tuple[str, str, Optional[str]]] = [(root.uuid, session_id, None)]
     result: list[SessionDAGLine] = []
+    skipped: set[str] = set()  # Compaction replay UUIDs
 
     while queue:
         start_uuid, line_id, parent_line_id = queue.pop(0)
@@ -215,13 +286,39 @@
             elif len(same_session_children) == 1:
                 current = nodes[same_session_children[0]]
             else:
-                # Fork point: stop chain here, push each child as a branch
-                # Sort children chronologically for deterministic order
+                # Multiple same-session children. Distinguish real forks
+                # from artifacts (see dev-docs/dag.md caveats).
                 same_session_children.sort(key=lambda c: nodes[c].timestamp)
+
+                stitched = _stitch_tool_results(
+                    same_session_children, session_uuids, nodes
+                )
+                if stitched is not None:
+                    # Tool-result side-branches were stitched into the
+                    # chain. Extend chain and continue with the tail.
+                    if is_branch:
+                        for su in stitched[:-1]:
+                            nodes[su].session_id = line_id
+                    chain.extend(stitched[:-1])
+                    current = nodes[stitched[-1]]
+                else:
+                    unique_timestamps = {
+                        nodes[c].timestamp for c in same_session_children
+                    }
+                    if len(unique_timestamps) == 1:
+                        # Same timestamp = compaction replay: follow only
+                        # the first child (original chain), skip replays
+                        # and all their descendants.
+                        current = nodes[same_session_children[0]]
+                        for sc in same_session_children[1:]:
+                            _collect_descendants(sc, session_uuids, nodes, skipped)
+                    else:
+                        # Different timestamps = real fork (rewind).
+                        # Stop chain here, push each child as a branch.
+                        for child_uuid in same_session_children:
+                            branch_id = f"{line_id}@{child_uuid[:12]}"
+                            queue.append((child_uuid, branch_id, line_id))
+                        current = None
 
         if chain:
             first_ts = nodes[chain[0]].timestamp
@@ -239,7 +336,7 @@
                 dag_line.attachment_uuid = parent_uuid
             result.append(dag_line)
 
-    return result
+    return result, skipped
 
 
 def extract_session_dag_lines(
@@ -284,21 +381,34 @@
             )
             continue
 
+        # Sort roots by timestamp (earliest first = primary root)
+        roots.sort(key=lambda n: n.timestamp)
        if len(roots) > 1:
-            # Multiple roots - pick the earliest by timestamp
-            roots.sort(key=lambda n: n.timestamp)
            logger.warning(
-                "Session %s: %d roots found, using earliest (%s)",
+                "Session %s: %d roots found, walking all from earliest (%s)",
                session_id,
                len(roots),
                roots[0].uuid,
            )
 
-        # Walk with fork detection
-        dag_lines = _walk_session_with_forks(roots[0], session_id, session_uuids, nodes)
+        # Walk from ALL roots to maximize coverage (orphan-promoted roots
+        # create disconnected subtrees that must each be walked)
+        dag_lines: list[SessionDAGLine] = []
+        walked_uuids: set[str] = set()
+        skipped_uuids: set[str] = set()
+        for root in roots:
+            if root.uuid in walked_uuids:
+                continue
+            root_lines, root_skipped = _walk_session_with_forks(
+                root, session_id, session_uuids, nodes
+            )
+            for dl in root_lines:
+                walked_uuids.update(dl.uuids)
+            skipped_uuids.update(root_skipped)
+            dag_lines.extend(root_lines)
 
-        # Check coverage: all session nodes should be in some DAG-line
-        covered = sum(len(dl.uuids) for dl in dag_lines)
+        # Check coverage: walked + intentionally skipped (compaction replays)
+        covered = len(walked_uuids) + len(skipped_uuids)
        if covered < len(snodes):
            logger.warning(
                "Session %s: DAG walk covers %d of %d nodes, "
@@ -314,7 +424,22 @@
                first_timestamp=sorted_nodes[0].timestamp,
            )
        else:
-            for dag_line in dag_lines:
+            # Merge non-branch DAG-lines that share the same session_id
+            # (happens when multiple roots exist due to orphan promotion)
+            trunk_lines = [dl for dl in dag_lines if dl.session_id == session_id]
+            branch_lines = [dl for dl in dag_lines if dl.session_id != session_id]
+            if trunk_lines:
+                # Merge all trunk lines into one, ordered by first_timestamp
+                trunk_lines.sort(key=lambda dl: dl.first_timestamp)
+                merged_uuids: list[str] = []
+                for tl in trunk_lines:
+                    merged_uuids.extend(tl.uuids)
+                sessions[session_id] = SessionDAGLine(
+                    session_id=session_id,
+                    uuids=merged_uuids,
+                    first_timestamp=trunk_lines[0].first_timestamp,
+                )
+            for dag_line in branch_lines:
                sessions[dag_line.session_id] = dag_line
 
     return sessions
diff --git a/dev-docs/dag.md b/dev-docs/dag.md
index 2f43ec07..36b8c692 100644
--- a/dev-docs/dag.md
+++ b/dev-docs/dag.md
@@ -231,12 +231,63 @@
 boundaries. This is both correct and faster.
 
 ---
 
+## Caveats
+
+### Context Compaction Replays
+
+When Claude Code compacts context (inserting a `SummaryTranscriptEntry`), it
+**replays** the conversation from a certain point with **new UUIDs** but the
+**same `parentUuid` and timestamp** as the original entries. This creates
+multiple same-session children from a single parent — structurally identical
+to a user rewind (fork), but semantically a replay.
+
+**Distinguishing heuristic**: timestamps.
+
+- **Real fork (rewind)**: the user goes back and types a new message at a
+  different time → children have **different** timestamps.
+- **Compaction replay**: the system re-emits the same turn → children share
+  the **same** timestamp as the original.
+ +When `_walk_session_with_forks()` encounters a node with multiple same-session +children that all share the same timestamp, it follows only the **first** +child (the original chain) and ignores the later replay chains. This avoids +creating hundreds of false branch pseudo-sessions in long-running sessions +with frequent compaction. + +The heuristic is validated on real data: across all fork points, forks +partition cleanly into same-timestamp (compaction) vs different-timestamp +(rewind) groups, with no mixed cases observed. + +### Tool-Result Side-Branches + +When the assistant makes **multiple tool calls** in one turn, the JSONL +records both the next `tool_use` and the previous `tool_result` as children +of the same parent entry: + +``` +A(tool_use₁) → U(tool_result₁) [dead-end side-branch] + → A(tool_use₂) [main chain continues] +``` + +This creates a false fork at each multi-tool-call point. The fix +(`_stitch_tool_results()`) detects the pattern — User (dead-end) + Assistant +(continuation) children — and stitches the tool results into the main chain: +`A(tool_use₁) → U(tool_result₁) → A(tool_use₂) → ...` + +Detection criteria: +- At least one User child and exactly one Assistant child +- All User children are dead ends (no same-session descendants) +- User children are tool_result entries inserted before the continuation + +--- + ## Assertions / Invariants These should be checked at runtime (log warnings, don't crash): 1. **Session linearity**: Each session's messages form a single chain - (no branching within a `sessionId`) + (no branching within a `sessionId`), except for explicit user rewinds + which create within-session forks rendered as branch pseudo-sessions 2. **DAG acyclicity**: No cycles in `parentUuid` chains 3. 
**Unique ownership**: After deduplication, each `uuid` belongs to exactly one session diff --git a/test/test_dag.py b/test/test_dag.py index d3ec46d1..dbced621 100644 --- a/test/test_dag.py +++ b/test/test_dag.py @@ -943,3 +943,141 @@ def test_nested_fork(self) -> None: result = traverse_session_tree(tree) uuids = [e.uuid for e in result] # type: ignore[union-attr] assert uuids == ["a", "b", "c", "d", "f", "g", "e"] + + +def _make_entry( + etype: str, + uuid: str, + parent: str | None, + ts: str, + session: str = "s1", + text: str = "", +) -> dict: + """Helper to build a minimal transcript entry dict.""" + base = { + "type": etype, + "timestamp": ts, + "parentUuid": parent, + "isSidechain": False, + "userType": "human", + "cwd": "/tmp", + "sessionId": session, + "version": "1.0.0", + "uuid": uuid, + } + if etype == "user": + base["message"] = { + "role": "user", + "content": [{"type": "text", "text": text or uuid}], + } + elif etype == "assistant": + base["requestId"] = f"req_{uuid}" + base["message"] = { + "id": uuid, + "type": "message", + "role": "assistant", + "model": "claude-3-sonnet", + "content": [{"type": "text", "text": text or uuid}], + "stop_reason": "end_turn", + "usage": {"input_tokens": 10, "output_tokens": 5}, + } + elif etype == "system": + base["message"] = {"content": []} + return base + + +class TestCompactionReplay: + """Context compaction replays should not create branches.""" + + def test_same_timestamp_children_not_forked(self) -> None: + """Multiple children with identical timestamps are compaction replays.""" + # a → sys → replay1, replay2, replay3 (all same ts) + data = [ + _make_entry("user", "a", None, "2025-07-01T10:00:00.000Z"), + _make_entry("system", "sys", "a", "2025-07-01T10:01:00.000Z"), + _make_entry("assistant", "r1", "sys", "2025-07-01T10:02:00.000Z"), + _make_entry("assistant", "r2", "sys", "2025-07-01T10:02:00.000Z"), + _make_entry("assistant", "r3", "sys", "2025-07-01T10:02:00.000Z"), + ] + entries = 
[create_transcript_entry(d) for d in data] + tree = build_dag_from_entries(entries) + + # Should be a single linear session, no branches + assert "s1" in tree.sessions + assert tree.sessions["s1"].uuids == ["a", "sys", "r1"] + branch_count = sum(1 for s in tree.sessions.values() if s.is_branch) + assert branch_count == 0 + + def test_different_timestamps_create_branches(self) -> None: + """Children with different timestamps are real forks (rewinds).""" + data = [ + _make_entry("user", "a", None, "2025-07-01T10:00:00.000Z"), + _make_entry("assistant", "b", "a", "2025-07-01T10:01:00.000Z"), + _make_entry("user", "c", "b", "2025-07-01T10:02:00.000Z"), + _make_entry("user", "d", "b", "2025-07-01T10:05:00.000Z"), + ] + entries = [create_transcript_entry(d) for d in data] + tree = build_dag_from_entries(entries) + + # Trunk stops at b, two branches + assert tree.sessions["s1"].uuids == ["a", "b"] + branch_count = sum(1 for s in tree.sessions.values() if s.is_branch) + assert branch_count == 2 + + +class TestToolResultStitching: + """Tool-result side-branches should be stitched into the main chain.""" + + def test_single_tool_result_stitched(self) -> None: + """A(tool_use) → U(result) + A(next) should become linear.""" + # a → tool_use → tool_result (dead end) + next_assistant + data = [ + _make_entry("user", "a", None, "2025-07-01T10:00:00.000Z"), + _make_entry("assistant", "tool1", "a", "2025-07-01T10:01:00.000Z"), + _make_entry("user", "result1", "tool1", "2025-07-01T10:01:00.100Z"), + _make_entry("assistant", "tool2", "tool1", "2025-07-01T10:01:00.200Z"), + ] + entries = [create_transcript_entry(d) for d in data] + tree = build_dag_from_entries(entries) + + # Should be linear: a → tool1 → result1 → tool2 + assert tree.sessions["s1"].uuids == ["a", "tool1", "result1", "tool2"] + branch_count = sum(1 for s in tree.sessions.values() if s.is_branch) + assert branch_count == 0 + + def test_multiple_tool_results_stitched(self) -> None: + """Multiple parallel tool_use with 
results should all be stitched.""" + # a → tool1 → result1 (dead end) + result2 (dead end) + tool2 + data = [ + _make_entry("user", "a", None, "2025-07-01T10:00:00.000Z"), + _make_entry("assistant", "tool1", "a", "2025-07-01T10:01:00.000Z"), + _make_entry("user", "res1", "tool1", "2025-07-01T10:01:00.100Z"), + _make_entry("user", "res2", "tool1", "2025-07-01T10:01:00.150Z"), + _make_entry("assistant", "tool2", "tool1", "2025-07-01T10:01:00.200Z"), + ] + entries = [create_transcript_entry(d) for d in data] + tree = build_dag_from_entries(entries) + + # Should be linear: a → tool1 → res1 → res2 → tool2 + assert tree.sessions["s1"].uuids == ["a", "tool1", "res1", "res2", "tool2"] + branch_count = sum(1 for s in tree.sessions.values() if s.is_branch) + assert branch_count == 0 + + def test_user_child_with_descendants_not_stitched(self) -> None: + """If the user child has descendants, it's not a dead-end + tool result — treat as a real fork.""" + data = [ + _make_entry("user", "a", None, "2025-07-01T10:00:00.000Z"), + _make_entry("assistant", "b", "a", "2025-07-01T10:01:00.000Z"), + _make_entry("user", "c", "b", "2025-07-01T10:02:00.000Z"), + _make_entry("assistant", "d", "b", "2025-07-01T10:03:00.000Z"), + # c has a descendant — it's NOT a dead end + _make_entry("assistant", "e", "c", "2025-07-01T10:04:00.000Z"), + ] + entries = [create_transcript_entry(d) for d in data] + tree = build_dag_from_entries(entries) + + # Should be a fork at b with two branches + assert tree.sessions["s1"].uuids == ["a", "b"] + branch_count = sum(1 for s in tree.sessions.values() if s.is_branch) + assert branch_count == 2 From b7fb95dfccac5ebb1d941c8623b64f276fa4aa57 Mon Sep 17 00:00:00 2001 From: Christian Boos Date: Thu, 19 Feb 2026 16:51:24 +0100 Subject: [PATCH 13/23] Add debug UUID toggle to show uuid/parentUuid on each message MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Adds a small "uuid" button in the floating button stack that toggles 
display of truncated uuid → parentUuid on every message, useful for
analyzing DAG structure and diagnosing spurious forks.

Co-Authored-By: Claude Opus 4.6
---
 .../templates/components/global_styles.css  |  15 ++
 .../templates/components/message_styles.css |  14 ++
 .../html/templates/transcript.html          |   9 +
 test/__snapshots__/test_snapshot_html.ambr  | 211 ++++++++++++++++++
 4 files changed, 249 insertions(+)

diff --git a/claude_code_log/html/templates/components/global_styles.css b/claude_code_log/html/templates/components/global_styles.css
index eed4503c..f47a393f 100644
--- a/claude_code_log/html/templates/components/global_styles.css
+++ b/claude_code_log/html/templates/components/global_styles.css
@@ -230,6 +230,21 @@ pre {
   bottom: 200px;
 }
 
+.debug-toggle.floating-btn {
+  bottom: 260px;
+  border-radius: 6px;
+  width: 38px;
+  height: 28px;
+  font-size: 0.65em;
+  font-family: 'SFMono-Regular', Consolas, monospace;
+  font-weight: 600;
+}
+
+.debug-toggle.floating-btn.active {
+  background-color: #d4e8f7;
+  color: #333;
+}
+
 @media (max-width: 1280px) {
   .header > span:first-child {
     flex: auto;
diff --git a/claude_code_log/html/templates/components/message_styles.css b/claude_code_log/html/templates/components/message_styles.css
index be543d42..fb8cb83b 100644
--- a/claude_code_log/html/templates/components/message_styles.css
+++ b/claude_code_log/html/templates/components/message_styles.css
@@ -897,6 +897,20 @@ details summary {
 .ansi-bg-cyan { background-color: #11a8cd; }
 .ansi-bg-white { background-color: #e5e5e5; }
 
+/* Debug UUID info */
+.debug-info {
+  display: none;
+  font-family: 'SFMono-Regular', Consolas, 'Liberation Mono', Menlo, monospace;
+  font-size: 0.7em;
+  color: #999;
+  padding: 2px 0;
+  letter-spacing: 0.02em;
+}
+
+.show-debug-info .debug-info {
+  display: block;
+}
+
 /* Bright background colors */
 .ansi-bg-bright-black { background-color: #666666; }
 .ansi-bg-bright-red { background-color: #f14c4c; }
diff --git a/claude_code_log/html/templates/transcript.html b/claude_code_log/html/templates/transcript.html
index d794ec4d..f3deea51 100644
--- a/claude_code_log/html/templates/transcript.html
+++ b/claude_code_log/html/templates/transcript.html
@@ -141,6 +141,7 @@

🔍 Search & Filter

{% endif %}
+ {% if message.meta %}
+    <div class="debug-info">{{ message.meta.uuid[:12] }}{% if message.meta.parent_uuid %} → {{ message.meta.parent_uuid[:12] }}{% endif %}</div>
+    {% endif %}
{{ html_content | safe }}
{% if message.junction_forward_links %}