Sub-orchestration losers in select2/select races should be cancelled #55

@affandar

Problem

When a sub-orchestration loses a select2() or select() race, it is not cancelled and keeps running as an orphan. This is inconsistent with how losing activities are handled (they receive provider-side cancellation) and leads to resource leaks.

Current Behavior

In src/futures.rs#L818-L840, when select2() resolves:

  1. All losers have their source_event_id added to cancelled_source_ids (so their completion doesn't block FIFO ordering)
  2. Only activities get provider-side cancellation:
// If the loser is an activity, request provider-side cancellation.
// Timers/external/sub-orchestrations don't have worker-queue entries.
if matches!(&child.kind, Kind::Activity { .. }) {
    inner.cancelled_activity_ids.insert(source_id);
}

Impact

| Aspect | Current Behavior |
| --- | --- |
| Completion skipped | ✅ Loser's completion is skipped in FIFO ordering |
| Child continues running | ⚠️ Sub-orchestration keeps running as orphan |
| No cancellation signal | ⚠️ No CancelInstance sent to child |
| Storage cleanup | ⚠️ Child instance remains in database forever |

Example

let child = ctx.schedule_sub_orchestration("SlowChild", input);
let timeout = ctx.schedule_timer(Duration::from_secs(30)).into_timer();

// If timer wins, SlowChild continues running indefinitely!
let (winner, _) = ctx.select2(child, timeout).await;

Expected Behavior

When a sub-orchestration loses a select2()/select() race, the runtime should automatically send a CancelInstance work item to cancel the child orchestration, similar to how activities are cancelled via lock stealing.

Proposed Solution

  1. Track sub-orchestration losers similar to cancelled_activity_ids
  2. In execution.rs, generate WorkItem::CancelInstance for sub-orchestration losers
  3. Enqueue these cancellation items to the orchestrator queue
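
The steps above could be sketched roughly as follows. This is a self-contained model, not the actual code in src/futures.rs or execution.rs: only the names Kind::Activity, cancelled_source_ids, cancelled_activity_ids and CancelInstance come from this issue. The Inner layout, the SubOrchestration variant's child_instance field, the cancelled_sub_orchestrations field, the WorkItem fields and both method names are assumptions made for illustration.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the future kinds; only `Activity` is named in the issue.
#[derive(Debug, Clone, PartialEq)]
enum Kind {
    Activity { name: String },
    Timer,
    SubOrchestration { child_instance: String },
}

/// Hypothetical work item; only the `CancelInstance` name comes from the issue.
#[derive(Debug, Clone, PartialEq)]
enum WorkItem {
    CancelInstance { instance: String, reason: String },
}

#[derive(Default)]
struct Inner {
    cancelled_source_ids: HashSet<u64>,
    cancelled_activity_ids: HashSet<u64>,
    // Assumed new field (step 1): losers that are sub-orchestrations,
    // stored as (source_event_id, child instance id).
    cancelled_sub_orchestrations: Vec<(u64, String)>,
}

impl Inner {
    /// Record one select loser. Every loser is skipped in FIFO ordering;
    /// activities and sub-orchestrations are additionally tracked for cancellation.
    fn record_loser(&mut self, source_id: u64, kind: &Kind) {
        // `insert` returns false if the id was already recorded, so a loser
        // seen twice does not produce a second cancellation.
        if !self.cancelled_source_ids.insert(source_id) {
            return;
        }
        match kind {
            Kind::Activity { .. } => {
                self.cancelled_activity_ids.insert(source_id);
            }
            Kind::SubOrchestration { child_instance } => {
                self.cancelled_sub_orchestrations
                    .push((source_id, child_instance.clone()));
            }
            // Timers have nothing to cancel on the provider side.
            Kind::Timer => {}
        }
    }

    /// Step 2: turn the tracked sub-orchestration losers into work items.
    /// Draining empties the list, so each child is cancelled at most once.
    fn drain_cancel_items(&mut self) -> Vec<WorkItem> {
        self.cancelled_sub_orchestrations
            .drain(..)
            .map(|(_, instance)| WorkItem::CancelInstance {
                instance,
                reason: "lost select race".to_string(),
            })
            .collect()
    }
}
```

In this model, step 3 would amount to the execution loop calling drain_cancel_items() once per turn and appending the result to the orchestrator queue, alongside whatever it already does with cancelled_activity_ids.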

Related

  • Activity cancellation already works via cancelled_activity_ids and provider lock stealing
  • Cascading cancellation already exists when a parent is explicitly cancelled (via client.cancel_instance())
  • See proposals/auto-pruning.md which documents this as a known limitation

Acceptance Criteria

  • Sub-orchestration losers in select2()/select() are automatically cancelled
  • Cancellation cascades to grandchildren
  • Add test: select2 with sub-orchestration vs timer, verify child is cancelled
  • Update documentation to reflect automatic cancellation behavior
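
To make the "cascades to grandchildren" criterion concrete, here is a toy model of what a test might check. It does not use the runtime's real types: Instances, Status, add and cancel are all hypothetical names, and the in-memory maps stand in for whatever the provider stores. It also assumes that an instance which is no longer running is left alone, along with its descendants.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Status {
    Running,
    Cancelled,
}

/// Hypothetical in-memory view of orchestration instances and their children.
struct Instances {
    status: HashMap<String, Status>,
    children: HashMap<String, Vec<String>>,
}

impl Instances {
    fn new() -> Self {
        Instances {
            status: HashMap::new(),
            children: HashMap::new(),
        }
    }

    /// Register a running instance, optionally as a child of `parent`.
    fn add(&mut self, id: &str, parent: Option<&str>) {
        self.status.insert(id.to_string(), Status::Running);
        if let Some(p) = parent {
            self.children
                .entry(p.to_string())
                .or_default()
                .push(id.to_string());
        }
    }

    /// Cancel `id` and every running descendant; returns the ids that were cancelled.
    /// Instances that are unknown or already cancelled are skipped with their subtree.
    fn cancel(&mut self, id: &str) -> Vec<String> {
        let mut cancelled = Vec::new();
        let mut stack = vec![id.to_string()];
        while let Some(current) = stack.pop() {
            if let Some(status) = self.status.get_mut(&current) {
                if *status != Status::Running {
                    continue;
                }
                *status = Status::Cancelled;
            } else {
                continue;
            }
            if let Some(kids) = self.children.get(&current) {
                stack.extend(kids.iter().cloned());
            }
            cancelled.push(current);
        }
        cancelled
    }
}
```

With a parent, a child, a grandchild and a sibling registered, cancelling the child (the select loser) should cancel the child and the grandchild while leaving the parent and the sibling running. A second cancel of the same child should be a no-op.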
