Rewrite relentless as zero-dependency Rust library#1

Open
sutyum wants to merge 4 commits into dev from claude/rethink-approach-z1Dpj

sutyum commented Feb 19, 2026

Summary

Complete rewrite of relentless from a Python framework to a zero-dependency Rust library for composing async tasks with automatic compensation on failure. The library is built on the saga pattern and designed for robotics workflows where actions need structured undo semantics.

Key Changes

  • Language & Architecture: Migrated from Python + Zenoh to pure Rust with tokio + async-trait as the only runtime dependencies

  • Core Abstraction: Introduced Step trait as the fundamental building block, with implementations for:

    • FnTask: Closure-based steps for inline task definition
    • Sequence: Linear execution with reverse-order compensation
    • Parallel: Concurrent execution with sibling compensation on failure
    • Guard: Conditional execution with optional fallback
    • Branch: Runtime selection among multiple steps
    • Loop: Repeated execution with condition checking
    • Locked: Resource coordination via shared mutex
  • Error Handling: Comprehensive Error enum with variants for task failures, timeouts, cancellation, and compensation tracking

  • Retry & Timeout: Configurable RetryPolicy with exponential/linear/Fibonacci backoff and per-step timeout support

  • Cancellation: CancellationToken for cooperative graceful shutdown with automatic compensation

  • Context & State: Context provides shared execution state, adapter I/O, hooks, and journaling

  • Adapter Pattern: Adapter trait for pluggable I/O backends; LocalAdapter for testing

  • Journaling: Journal trait for write-ahead logging and crash recovery

  • Hooks: Lifecycle callbacks (on_step_start, on_step_end, on_step_error, on_compensate)

  • Value System: Dynamic Value enum for type-flexible adapter communication
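The Sequence behaviour listed above (linear execution, reverse-order compensation) can be illustrated with a minimal sketch. It is not the PR's code: the real `Step` trait is async and takes a `Context`, whereas this version is synchronous and std-only, and `NamedStep`, `pick_and_place`, and the log-based signatures are hypothetical names chosen for the illustration.

```rust
/// Hypothetical synchronous stand-in for the PR's async `Step` trait.
trait Step {
    fn execute(&self, log: &mut Vec<String>) -> Result<(), String>;
    fn compensate(&self, log: &mut Vec<String>);
}

/// Illustrative step that succeeds or fails on demand.
struct NamedStep {
    name: &'static str,
    fail: bool,
}

impl Step for NamedStep {
    fn execute(&self, log: &mut Vec<String>) -> Result<(), String> {
        if self.fail {
            return Err(format!("{} failed", self.name));
        }
        log.push(format!("run {}", self.name));
        Ok(())
    }

    fn compensate(&self, log: &mut Vec<String>) {
        log.push(format!("undo {}", self.name));
    }
}

/// Runs steps in order; on failure, undoes the completed ones in reverse.
struct Sequence {
    steps: Vec<Box<dyn Step>>,
}

impl Step for Sequence {
    fn execute(&self, log: &mut Vec<String>) -> Result<(), String> {
        let mut done: Vec<&dyn Step> = Vec::new();
        for step in &self.steps {
            match step.execute(log) {
                Ok(()) => done.push(&**step),
                Err(err) => {
                    // The failed step itself is not compensated.
                    for finished in done.iter().rev() {
                        finished.compensate(log);
                    }
                    return Err(err);
                }
            }
        }
        Ok(())
    }

    // A Sequence is itself a Step, so it can nest inside another Sequence.
    fn compensate(&self, log: &mut Vec<String>) {
        for step in self.steps.iter().rev() {
            step.compensate(log);
        }
    }
}

/// Hypothetical helper: a three-step workflow whose last step may fail.
fn pick_and_place(fail_at_place: bool) -> (Result<(), String>, Vec<String>) {
    let mut steps: Vec<Box<dyn Step>> = Vec::new();
    steps.push(Box::new(NamedStep { name: "move", fail: false }));
    steps.push(Box::new(NamedStep { name: "grasp", fail: false }));
    steps.push(Box::new(NamedStep { name: "place", fail: fail_at_place }));
    let mut log = Vec::new();
    let result = Sequence { steps }.execute(&mut log);
    (result, log)
}
```

Calling `pick_and_place(true)` records the two successful steps, then their undo entries in reverse order, and returns the error from `place`.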

Notable Implementation Details

  • All steps are Send + Sync and use interior mutability (Arc<RwLock<…>>) to safely share context across parallel branches
  • Compensation is automatic and runs in reverse order over the steps that completed; the step that failed is not itself compensated
  • Error strategies (Compensate, Skip, Escalate) allow per-step failure discrimination
  • Comprehensive integration tests covering sequences, parallel execution, guards, branches, loops, cancellation, and crash recovery
  • Multiple examples demonstrating real-world patterns: pick-and-place, dual-arm assembly, inspection/sorting, palletizing loops, emergency stop, and crash recovery
  • Updated documentation (README, concept.md) reflects Rust API and core concepts
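The shared-context point above can be sketched with the standard library alone. This is an assumption-laden illustration rather than the library's `Context`: the PR runs parallel branches as tokio tasks, while this sketch uses OS threads to stay dependency-free, and the `set`/`get` methods, the `i64` value type, and `run_parallel_branches` are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical context: cloning it shares the same underlying map.
#[derive(Clone, Default)]
struct Context {
    state: Arc<RwLock<HashMap<String, i64>>>,
}

impl Context {
    /// Takes the write lock briefly to store one value.
    fn set(&self, key: &str, value: i64) {
        self.state.write().unwrap().insert(key.to_string(), value);
    }

    /// Takes the read lock and copies the value out.
    fn get(&self, key: &str) -> Option<i64> {
        self.state.read().unwrap().get(key).copied()
    }
}

/// Two branches write to the same context concurrently (threads stand in
/// for the async tasks the PR describes).
fn run_parallel_branches(ctx: &Context) {
    let handles: Vec<_> = vec!["left_arm", "right_arm"]
        .into_iter()
        .enumerate()
        .map(|(i, name)| {
            let ctx = ctx.clone();
            thread::spawn(move || ctx.set(name, i as i64 + 1))
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
}
```

Because every clone of `Context` points at the same `Arc`, writes made inside a branch are visible to the caller once the branches have joined.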

Migration Notes

This is a complete rewrite; the Python API is not preserved. Users should refer to the new Rust examples and documentation for usage patterns.

https://claude.ai/code/session_01RbxGJBcM18D1NXVUq3TZ1T

Problems with the previous approach:
- All documentation, zero implementation
- Hard-coupled to Zenoh (unusable without it, untestable)
- Inconsistent API (conflicting patterns, mismatched param names)
- Premature feature comparisons against Temporal.io/AWS Step Functions

New direction:
- Library, not framework (you call it, it doesn't restructure your code)
- Transport-agnostic core with Adapter protocol (Zenoh as optional extra)
- Testable by default via LocalAdapter (no hardware/network needed)
- Function-first API with @task decorator and @task.undo compensation
- Sequence composition with automatic reverse-order compensation
- Configurable retry policies (linear, exponential, fibonacci backoff)
- Working implementation with 25 passing tests
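The three backoff families named in the list can be compared with a small delay calculator. It is written in Rust, the language the PR ends up in, and is only a sketch: the `Backoff` enum, the `delay_for` function, the 1-indexed `attempt`, and the `max` cap are assumptions, not the library's `RetryPolicy` API.

```rust
use std::cmp::min;
use std::time::Duration;

/// Hypothetical backoff families matching the three named above.
#[derive(Clone, Copy)]
enum Backoff {
    Linear,
    Exponential,
    Fibonacci,
}

/// n-th Fibonacci number with fib(1) = fib(2) = 1, saturating on overflow.
fn fib(n: u32) -> u32 {
    let (mut a, mut b) = (0u32, 1u32);
    for _ in 0..n {
        let next = a.saturating_add(b);
        a = b;
        b = next;
    }
    a
}

/// Delay before retry number `attempt` (1-indexed), capped at `max`.
fn delay_for(backoff: Backoff, base: Duration, attempt: u32, max: Duration) -> Duration {
    let factor = match backoff {
        Backoff::Linear => attempt,
        Backoff::Exponential => 2u32.saturating_pow(attempt.saturating_sub(1)),
        Backoff::Fibonacci => fib(attempt),
    };
    min(base.saturating_mul(factor), max)
}
```

With a 100 ms base, the third linear retry waits 300 ms, the fourth exponential retry 800 ms, and the fifth Fibonacci retry 500 ms, each subject to the cap.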

https://claude.ai/code/session_01RbxGJBcM18D1NXVUq3TZ1T

Examples that expose where the library breaks:
- dual_arm_assembly: needs parallel, nesting, loops
- bin_emptying: needs loops, conditionals, partial success
- force_insertion: needs guards, result passing, sensor waits
- palletizing: needs parallel, sequence timeout, cancellation
- inspection_sort: needs branching, error discrimination, hooks
- emergency_recovery: needs interrupts, compensation retry, human-in-loop

Critical gaps fixed (50 tests, all passing):
- Parallel: concurrent execution with coordinated compensation
- Nesting: Sequence usable as a step inside another Sequence
- Branch: route execution based on runtime selector
- Guard: precondition check before running a step
- Hooks: lifecycle callbacks (on_step_start/end/error, on_compensate)
- Compensation retry: undo_retry parameter for safety-critical tasks
- Step protocol: common interface for BoundTask/Parallel/Guard/Branch/Sequence

Gaps documented honestly in README Known Limitations table.

https://claude.ai/code/session_01RbxGJBcM18D1NXVUq3TZ1T

Replace the entire Python codebase with a zero-dependency Rust library.
All 8 previously-documented gaps are now first-class features:

- Loops: Loop combinator with condition + max_iterations safety cap
- Cancellation: CancellationToken with cooperative checking at step boundaries
- Sequence-level timeout: Sequence::timeout() using tokio::select!
- Persistence/crash recovery: Journal trait + MemoryJournal, skip completed steps
- Partial success: ErrorStrategy::Skip continues past non-critical failures
- Resource locking: ResourceLock + Locked wrapper with shared async mutex
- Error discrimination: ErrorStrategy enum (Compensate/Skip/Escalate) per step
- Sensor streaming: Adapter::subscribe() returns mpsc::Receiver<Value>
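The crash-recovery item above (skip steps the journal already marks as completed) can be sketched as follows. The method names `record_completed` and `is_completed`, the `run_with_journal` helper, and the simulated crash are hypothetical. The sketch also records a step only after it finishes, which is a simplification of the write-ahead logging the PR describes.

```rust
use std::collections::HashSet;

/// Hypothetical journal interface; the PR's `Journal` trait may differ.
trait Journal {
    fn record_completed(&mut self, step: &str);
    fn is_completed(&self, step: &str) -> bool;
}

/// In-memory journal, in the spirit of the PR's `MemoryJournal`.
#[derive(Default)]
struct MemoryJournal {
    completed: HashSet<String>,
}

impl Journal for MemoryJournal {
    fn record_completed(&mut self, step: &str) {
        self.completed.insert(step.to_string());
    }

    fn is_completed(&self, step: &str) -> bool {
        self.completed.contains(step)
    }
}

/// Runs `steps` in order, skipping those already journaled. `crash_before`
/// simulates a process crash just before the named step starts.
fn run_with_journal(
    steps: &[&str],
    journal: &mut dyn Journal,
    crash_before: Option<&str>,
) -> Vec<String> {
    let mut executed = Vec::new();
    for &step in steps {
        if journal.is_completed(step) {
            continue; // finished in a previous run
        }
        if crash_before == Some(step) {
            break; // simulated crash: nothing further is executed
        }
        executed.push(step.to_string());
        journal.record_completed(step);
    }
    executed
}
```

Running the same step list a second time against the same journal executes only the steps that had not completed before the simulated crash.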

Architecture: Step trait with async-trait for Box<dyn Step> dynamic dispatch,
Context with Arc<RwLock<HashMap>> for safe parallel sharing, composable
combinators (Sequence, Parallel, Guard, Branch, Loop) that all impl Step.

27 integration tests, 6 examples covering realistic robotics scenarios.

https://claude.ai/code/session_01RbxGJBcM18D1NXVUq3TZ1T

Add 22 adversarial robotics stress tests modeled after CNC, surgical,
warehouse, autonomous vehicle, semiconductor, space probe, and other
real-world scenarios. These tests exposed 9 bugs in the saga engine:

Bugs fixed:
- Timeout/cancellation now compensate completed steps (via shared
  Arc<Mutex<Vec>> between execute and timeout/cancel branches)
- Loop self-compensates all completed iterations on body failure
  (saga pattern: composite steps own their internal compensation)
- Branch only compensates the taken path, not all branches
- Parallel cancels running siblings on first failure (AtomicBool flag)
- Retry respects ErrorStrategy (Escalate/Skip break immediately)
- Compensation runs in "compensation mode" — cancellation checks are
  skipped so adapter calls succeed during undo (new Context flag)
- Compensation has configurable per-step timeout (default 30s)
- Duplicate step names handled via count-based journal dedup
- Error::is_cancelled/is_timeout recurse through wrapper types
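The last fix in the list, predicates that look through wrapper errors, can be illustrated with a reduced error type. The variant names here (`TaskFailed`, `StepFailed`, and the `source` field) are hypothetical; only `is_cancelled` and `is_timeout` are named in the commit message.

```rust
/// Reduced, hypothetical error type: two leaf causes plus one wrapper.
#[derive(Debug)]
enum Error {
    TaskFailed(String),
    Cancelled,
    Timeout,
    /// Wrapper that a composite step might add around an inner error.
    StepFailed { step: String, source: Box<Error> },
}

impl Error {
    /// True if this error, or any error it wraps, is a cancellation.
    fn is_cancelled(&self) -> bool {
        match self {
            Error::Cancelled => true,
            Error::StepFailed { source, .. } => source.is_cancelled(),
            _ => false,
        }
    }

    /// True if this error, or any error it wraps, is a timeout.
    fn is_timeout(&self) -> bool {
        match self {
            Error::Timeout => true,
            Error::StepFailed { source, .. } => source.is_timeout(),
            _ => false,
        }
    }
}
```

Without the recursive arm, a cancellation raised inside a nested Sequence would be reported as an ordinary step failure once the outer composite wraps it.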

All 49 tests pass (27 integration + 22 stress), zero warnings.

https://claude.ai/code/session_01RbxGJBcM18D1NXVUq3TZ1T