Crash-Resilient Structural Undo/Redo for Open Transactions

### 1. Summary (Required)

**What is the enhancement?**  
Implement crash-resilient, structural experience-level undo/redo for pre-commit transaction editing, backed by local SQLite persistence.

This includes:

- transaction-scoped undo/redo stacks
- structural checkpoints of staged/transient graph state
- local crash recovery of the latest consistent open-transaction snapshot
- cleanup of persisted recovery state on commit/rollback

Checkpoint persistence will be triggered by command metadata policy (`CommandDescriptor.snapshot_after`). Until the Phase 3 command ingress cutover is complete, the trigger may be mocked.

---

### 2. Problem Statement (Required)

**Why is this needed?**  
Editing sessions currently risk losing provisional transaction state after a crash or restart. Undo/redo is not yet guaranteed to be deterministic, structural, and crash-resilient across process restarts.

We need a predictable human-first editing model where:

- successful commands create deterministic structural history
- undo/redo restores structure (not semantic replay)
- open transaction state can be recovered after crash
- all history is strictly transaction-scoped and removed on commit/rollback

---

### 3. Dependencies (Required)

**Does this depend on other issues or features?**  
- Can proceed in parallel with MAP Commands Phases 2.1, 2.2, and 3.
- Final production trigger integration depends on Phase 3 single command ingress cutover (`dispatch_map_command -> Runtime::dispatch`).
- Until the Runtime descriptor pipeline is final, the interim implementation may mock the snapshot policy trigger or drive it through an adapter.

Related references:
- MAP Commands Spec: https://memetic-activation-platform.github.io/map-dev-docs/core/commands-old/
- Commands Cheat Sheet: https://memetic-activation-platform.github.io/map-dev-docs/core/commands-cheat-sheet/

---

### 4. Proposed Solution (Required)

**How would you solve it?**

#### 4.1 Structural Snapshot Model (No Command Replay)

Persist and restore transaction graph state directly (staged/transient + history), not command input replay.

Rationale:

- avoids TemporaryId replay instability
- preserves provider openness (no replay determinism contract)
- aligns with existing structural undo model

#### 4.2 Snapshot Payloads (Versioned)

Use serialized, versioned payload envelopes stored as SQLite BLOBs:

```rust
// Example conceptual shape. `TxId`, `SerializableHolonPool`, `HolonReferenceWire`,
// and `UndoMarkerV1` are project types that are not shown here.

/// Full structural state of one open transaction.
struct TxGraphSnapshotV1 {
    tx_id: TxId,
    staged_holons: SerializableHolonPool,
    transient_holons: SerializableHolonPool,
    local_holon_space: Option<HolonReferenceWire>,
    // optional transaction metadata
}

/// One undo/redo history entry wrapping a snapshot.
struct UndoRedoCheckpointV1 {
    checkpoint_id: String,
    tx_id: TxId,
    snapshot: TxGraphSnapshotV1,
    created_at_ms: i64,
    command_name: Option<String>,
    disable_undo: bool,
}

/// Per-transaction index of the stacks; references checkpoints by id.
struct RecoveryEnvelopeV1 {
    tx_id: TxId,
    undo_stack: Vec<String>, // checkpoint_ids
    redo_stack: Vec<String>, // checkpoint_ids
    markers: Vec<UndoMarkerV1>,
    latest_checkpoint_id: Option<String>,
    updated_at_ms: i64,
}
```

#### 4.3 Blob Generation Path (Normative)

Generate blobs from existing wire serializers, not custom SQL projection logic:

1. Export runtime state from `TransactionContext`:
   - `export_staged_holons()`
   - `export_transient_holons()`
2. Convert to serializable wire pools:
   - `SerializableHolonPool::from(&HolonPool)`
   - via `HolonWire` / `TransientHolonWire` / `StagedHolonWire`
3. Build versioned snapshot/envelope structs
4. Serialize to bytes (`serde_json` initially; optionally MessagePack later)
5. Persist in SQLite within one atomic DB transaction

Restore is the inverse:

1. Read BLOBs
2. Deserialize envelopes
3. Rebind pools via `SerializableHolonPool.bind(&context)`
4. Import with `import_staged_holons(...)` / `import_transient_holons(...)`
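
The versioning step can be made explicit at the byte level. The sketch below is only an illustration: the magic bytes, the header layout, and the names `encode_blob` / `decode_blob` are assumptions and are not part of this proposal, which so far versions payloads through the struct names and the `format_version` column. It wraps already-serialized payload bytes (for example `serde_json` output) in a small header and rejects blobs it does not understand before deserialization starts.

```rust
/// Hypothetical header: 4 magic bytes followed by a little-endian u32 format version.
const MAGIC: &[u8; 4] = b"MAPR";
const FORMAT_V1: u32 = 1;
const HEADER_LEN: usize = 8;

#[derive(Debug, PartialEq)]
enum EnvelopeError {
    TooShort,
    BadMagic,
    UnsupportedVersion(u32),
}

/// Wraps already-serialized payload bytes in the versioned header.
fn encode_blob(payload: &[u8]) -> Vec<u8> {
    let mut blob = Vec::with_capacity(HEADER_LEN + payload.len());
    blob.extend_from_slice(MAGIC);
    blob.extend_from_slice(&FORMAT_V1.to_le_bytes());
    blob.extend_from_slice(payload);
    blob
}

/// Validates the header and returns the payload slice for the deserializer.
fn decode_blob(blob: &[u8]) -> Result<&[u8], EnvelopeError> {
    if blob.len() < HEADER_LEN {
        return Err(EnvelopeError::TooShort);
    }
    if !blob.starts_with(MAGIC) {
        return Err(EnvelopeError::BadMagic);
    }
    let version = u32::from_le_bytes([blob[4], blob[5], blob[6], blob[7]]);
    match version {
        FORMAT_V1 => Ok(&blob[HEADER_LEN..]),
        other => Err(EnvelopeError::UnsupportedVersion(other)),
    }
}
```

A restore path built this way can refuse an unknown version up front instead of failing partway through a rebind or import.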

#### 4.4 SQLite Schema (Initial)

```sql
CREATE TABLE recovery_session (
  tx_id                 INTEGER PRIMARY KEY,
  space_id              TEXT,
  lifecycle_state       TEXT NOT NULL,           -- expected Open while recoverable
  latest_checkpoint_id  TEXT,
  envelope_blob         BLOB NOT NULL,           -- RecoveryEnvelopeV1
  format_version        INTEGER NOT NULL DEFAULT 1,
  updated_at_ms         INTEGER NOT NULL
);

CREATE TABLE recovery_checkpoint (
  checkpoint_id         TEXT PRIMARY KEY,
  tx_id                 INTEGER NOT NULL,
  stack_kind            TEXT NOT NULL,           -- 'undo' | 'redo'
  stack_pos             INTEGER NOT NULL,        -- 0..N per stack
  snapshot_blob         BLOB NOT NULL,           -- UndoRedoCheckpointV1 / TxGraphSnapshotV1
  snapshot_hash         TEXT,                    -- optional integrity/debug
  created_at_ms         INTEGER NOT NULL,
  -- Note: SQLite enforces this cascade only when PRAGMA foreign_keys = ON is set on the connection.
  FOREIGN KEY (tx_id) REFERENCES recovery_session(tx_id) ON DELETE CASCADE
);

CREATE UNIQUE INDEX idx_checkpoint_stack_pos
  ON recovery_checkpoint(tx_id, stack_kind, stack_pos);

CREATE INDEX idx_checkpoint_tx_created
  ON recovery_checkpoint(tx_id, created_at_ms);
```

#### 4.5 Write/Restore Semantics

- Use one SQLite transaction for the post-processing of each successful command.
- If `snapshot_after == true`: persist the new checkpoint and the updated envelope atomically.
- If `disable_undo == true`: skip undo checkpoint creation, but persist the latest recoverable transaction state when configured.
- Clear the redo stack on any successful new undoable command.
- Persist only fully consistent state; never partial command state.
- On startup, restore only the most recent consistent snapshot.
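
The stack rules above can be sketched in memory, without any persistence. This is a minimal model under stated assumptions, not the specified implementation: `Snapshot` stands in for a serialized `TxGraphSnapshotV1`, the `baseline` field (the state when the transaction was opened) is an assumption needed so that the first checkpoint can be undone, and a `disable_undo` command is assumed to leave both stacks untouched.

```rust
/// Stand-in for a serialized `TxGraphSnapshotV1`; the real payload is richer.
#[derive(Debug, Clone, PartialEq)]
struct Snapshot(String);

/// Subset of command metadata consulted after a command succeeds.
#[derive(Debug, Clone, Copy)]
struct CommandPolicy {
    snapshot_after: bool,
    disable_undo: bool,
}

#[derive(Debug, PartialEq)]
enum HistoryError {
    NothingToUndo,
    NothingToRedo,
}

struct TxHistory {
    baseline: Snapshot,           // state when the transaction was opened (assumption)
    undo_stack: Vec<Snapshot>,    // state after each undoable command, oldest first
    redo_stack: Vec<Snapshot>,    // undone states, most recently undone last
    latest_recoverable: Snapshot, // what crash recovery would restore
}

impl TxHistory {
    fn new(baseline: Snapshot) -> Self {
        TxHistory {
            latest_recoverable: baseline.clone(),
            baseline,
            undo_stack: Vec::new(),
            redo_stack: Vec::new(),
        }
    }

    /// Called only after a command has fully succeeded, never mid-command.
    fn on_command_success(&mut self, state_after: Snapshot, policy: CommandPolicy) {
        if policy.disable_undo {
            // No undo checkpoint; only the recoverable state advances.
            self.latest_recoverable = state_after;
        } else if policy.snapshot_after {
            // A new undoable command invalidates everything that could be redone.
            self.redo_stack.clear();
            self.undo_stack.push(state_after.clone());
            self.latest_recoverable = state_after;
        }
    }

    /// Moves the newest checkpoint to the redo stack and returns the state to restore.
    fn undo(&mut self) -> Result<Snapshot, HistoryError> {
        let undone = self.undo_stack.pop().ok_or(HistoryError::NothingToUndo)?;
        self.redo_stack.push(undone);
        let restored = self.undo_stack.last().unwrap_or(&self.baseline).clone();
        self.latest_recoverable = restored.clone();
        Ok(restored)
    }

    /// Moves the most recently undone checkpoint back and returns it for restore.
    fn redo(&mut self) -> Result<Snapshot, HistoryError> {
        let restored = self.redo_stack.pop().ok_or(HistoryError::NothingToRedo)?;
        self.undo_stack.push(restored.clone());
        self.latest_recoverable = restored.clone();
        Ok(restored)
    }
}
```

In a persistent version, each of these three methods would end by writing the changed checkpoint rows and the envelope in a single SQLite transaction, so a crash can only ever observe the state before or after the whole step.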

#### 4.6 Lifecycle Cleanup

On `commit` or `rollback`:

- destroy in-memory undo/redo history
- delete persisted recovery rows (`recovery_session` + `recovery_checkpoint`) for the transaction
- no history survives the transaction boundary
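
The cleanup rule can be modelled with an in-memory stand-in for the two tables. `RecoveryStore` and `purge_transaction` are hypothetical names used only for this sketch; in the SQLite implementation the same effect would come from deleting the `recovery_session` row and letting the checkpoint rows go with it.

```rust
use std::collections::HashMap;

/// In-memory stand-in for the two recovery tables, keyed like their SQL counterparts.
#[derive(Default)]
struct RecoveryStore {
    sessions: HashMap<i64, Vec<u8>>,              // tx_id -> envelope_blob
    checkpoints: HashMap<String, (i64, Vec<u8>)>, // checkpoint_id -> (tx_id, snapshot_blob)
}

impl RecoveryStore {
    /// Removes every persisted row for `tx_id` and returns how many checkpoints were dropped.
    fn purge_transaction(&mut self, tx_id: i64) -> usize {
        let before = self.checkpoints.len();
        // Mirrors the cascade: checkpoints owned by the transaction go first.
        self.checkpoints.retain(|_, (owner, _)| *owner != tx_id);
        self.sessions.remove(&tx_id);
        before - self.checkpoints.len()
    }
}
```

Calling `purge_transaction` from both the commit and the rollback path keeps the two exits symmetric, and calling it again for the same transaction is a no-op.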

---

### 5. Scope and Impact (Required)

**What does this impact?**  
Impacts:

- IntegrationHub transaction editing lifecycle
- transaction state persistence layer (SQLite)
- undo/redo behavior for experience-level editing
- command execution post-success checkpoint pipeline

Does not impact:

- post-commit/domain-level compensating undo semantics
- trust-channel compensations/inter-agent reversal
- cross-transaction undo history
- DHT-visible persistence behavior

---

### 6. Testing Considerations (Required)

**How will this enhancement be tested?**

- Can it be validated with existing test cases?
  - Partially (existing transaction/lifecycle tests remain relevant).
- Do new test cases need to be created?
  - Yes:
    - undo/redo stack semantics (LIFO, redo clearing, empty-stack failures)
    - `disable_undo` behavior
    - `snapshot_after`-driven persistence behavior
    - blob serialization/deserialization roundtrip tests
    - crash recovery restore from SQLite snapshot
    - consistency guarantees (no partial state persisted)
    - cleanup on commit/rollback
- Are there specific areas in the test ecosystem impacted by this enhancement?
  - host runtime transaction tests
  - integration tests simulating crash/restart
  - regression tests across loader/bulk operations with undo disabled

---

### 7. Definition of Done (Required)

**When is this enhancement complete?**

- [ ] Structural undo/redo stacks implemented per open transaction
- [ ] Checkpoint creation occurs only after successful command completion
- [ ] Redo stack clears on successful new undoable command
- [ ] `disable_undo` metadata behavior implemented
- [ ] `snapshot_after` policy hook implemented (mocked trigger acceptable until Phase 3 cutover)
- [ ] SQLite schema implemented (`recovery_session`, `recovery_checkpoint`, indexes)
- [ ] Snapshot blobs generated from wire serializer path (export -> wire -> serialize)
- [ ] Crash/restart restores consistent transaction state + stacks
- [ ] Commit/rollback destroys in-memory history and deletes persisted recovery rows (`recovery_session` + `recovery_checkpoint`)
- [ ] Tests cover stack semantics, blob roundtrip, recovery, lifecycle cleanup, and policy triggers

---

<details>
<summary>Optional Details (Expand if needed)</summary>

### 8. Alternatives Considered

**What other solutions did you think about?**  
- Delta/replay-based undo: rejected for v0 due to TemporaryId fragility and provider determinism burden.
- Command-specific semantic undo handlers: rejected for v0 to preserve openness and lower integration burden.
- Cross-transaction persistent history: rejected as out-of-scope and semantically risky.

### 9. Risks or Concerns

**What could go wrong?**  
- Snapshot size/performance during high-frequency editing
- Drift between mocked trigger and final Runtime descriptor trigger
- Inconsistent recovery if persistence writes are not atomic

Mitigations:

- enforce DB atomicity and “latest consistent only” restore rule
- keep trigger integration seam explicit for Phase 3 handoff
- allow `disable_undo` for bulk/regenerable operations
- optimize encoding/compaction only after measurement

### 10. Additional Context

**Any supporting material?**  
Based on: *MAP Core — Structural Pre-Commit Transaction Editing Model / Structural Experience-Level Undo / Redo Specification*.

Parallelization intent:

- implement undo/recovery persistence engine now
- integrate final command-trigger wiring when Phase 3 single-ingress Runtime path is active

</details>


Crash-Resilient Structural Undo/Redo for Open Transactions #412
