Open
Description
Problem
When a sub-orchestration loses a select2() or select() race, it is not cancelled and continues running as an orphan. This is inconsistent with the expected behavior and leads to resource leaks.
Current Behavior
In src/futures.rs#L818-L840, when select2() resolves:
- All losers have their source_event_id added to cancelled_source_ids (so their completion doesn't block FIFO ordering)
- Only activities get provider-side cancellation:
// If the loser is an activity, request provider-side cancellation.
// Timers/external/sub-orchestrations don't have worker-queue entries.
if matches!(&child.kind, Kind::Activity { .. }) {
    inner.cancelled_activity_ids.insert(source_id);
}
Impact
| Aspect | Current Behavior |
|---|---|
| Completion skipped | ✅ Loser's completion is skipped in FIFO ordering |
| Child continues running | Loser is not cancelled and keeps running as an orphan |
| No cancellation signal | No CancelInstance is sent to the child |
| Storage cleanup | |
Example
let child = ctx.schedule_sub_orchestration("SlowChild", input);
let timeout = ctx.schedule_timer(Duration::from_secs(30)).into_timer();
// If timer wins, SlowChild continues running indefinitely!
let (winner, _) = ctx.select2(child, timeout).await;
Expected Behavior
When a sub-orchestration loses a select2()/select() race, the runtime should automatically send a CancelInstance work item to cancel the child orchestration, similar to how activities are cancelled via lock stealing.
Proposed Solution
- Track sub-orchestration losers similar to cancelled_activity_ids
- In execution.rs, generate WorkItem::CancelInstance for sub-orchestration losers
- Enqueue these cancellation items to the orchestrator queue
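The three steps above could look roughly like the sketch below. It is a self-contained model, not the crate's real code: Inner, Kind, and WorkItem are simplified stand-ins, and the cancelled_sub_orchestrations field, the child_instance_id field, the CancelInstance payload fields, and both method names are assumptions made for illustration.

```rust
use std::collections::{BTreeMap, HashSet};

/// Simplified stand-in for the future kinds named in the issue.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Kind {
    Activity { name: String },
    Timer,
    // `child_instance_id` is a hypothetical field for this sketch.
    SubOrchestration { child_instance_id: String },
}

/// Simplified stand-in for the runtime's work item type.
/// The field names here are assumptions, not the real definition.
#[derive(Debug, Clone, PartialEq)]
enum WorkItem {
    CancelInstance { instance: String, reason: String },
}

#[derive(Default)]
struct Inner {
    cancelled_source_ids: HashSet<u64>,
    cancelled_activity_ids: HashSet<u64>,
    // Hypothetical addition: source id -> child instance id for
    // sub-orchestrations that lost a select race.
    cancelled_sub_orchestrations: BTreeMap<u64, String>,
}

impl Inner {
    /// Step 1: record a select loser, tracking sub-orchestrations
    /// alongside activities.
    fn record_loser(&mut self, source_id: u64, kind: &Kind) {
        // Every loser is skipped for FIFO ordering, as today.
        self.cancelled_source_ids.insert(source_id);
        match kind {
            Kind::Activity { .. } => {
                self.cancelled_activity_ids.insert(source_id);
            }
            Kind::SubOrchestration { child_instance_id } => {
                self.cancelled_sub_orchestrations
                    .insert(source_id, child_instance_id.clone());
            }
            // Timers have nothing to cancel on the provider side.
            Kind::Timer => {}
        }
    }

    /// Steps 2 and 3: turn tracked losers into cancellation work items
    /// that the caller would enqueue on the orchestrator queue.
    fn drain_cancel_items(&mut self) -> Vec<WorkItem> {
        std::mem::take(&mut self.cancelled_sub_orchestrations)
            .into_iter()
            .map(|(_, instance)| WorkItem::CancelInstance {
                instance,
                reason: "lost select race".to_string(),
            })
            .collect()
    }
}
```

Draining the map, rather than only reading it, means each loser produces at most one CancelInstance item even if the turn is processed more than once in this model. Whether the real runtime needs the same guard depends on how execution.rs handles replay.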
Related
- Activity cancellation already works via cancelled_activity_ids and provider lock stealing
- Cascading cancellation already exists when a parent is explicitly cancelled (via client.cancel_instance())
- See proposals/auto-pruning.md which documents this as a known limitation
Acceptance Criteria
- Sub-orchestration losers in select2()/select() are automatically cancelled
- Cancellation cascades to grandchildren
- Add test: select2 with sub-orchestration vs timer, verify child is cancelled
- Update documentation to reflect automatic cancellation behavior
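The cascade criterion can be stated as a small check against an in-memory model. The sketch below does not use the real runtime or its test harness: InstanceTree and cancel_cascade are hypothetical names, and the model only assumes that each instance knows its direct children.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical in-memory view of parent -> direct children links.
struct InstanceTree {
    children: HashMap<String, Vec<String>>,
}

impl InstanceTree {
    /// Returns every instance that should receive a cancellation when
    /// `root` is cancelled, in breadth-first order (root first).
    fn cancel_cascade(&self, root: &str) -> Vec<String> {
        let mut order = Vec::new();
        let mut seen = HashSet::new();
        let mut queue = VecDeque::from([root.to_string()]);
        while let Some(id) = queue.pop_front() {
            // Skip instances already visited so a malformed graph
            // cannot loop forever.
            if !seen.insert(id.clone()) {
                continue;
            }
            if let Some(kids) = self.children.get(&id) {
                queue.extend(kids.iter().cloned());
            }
            order.push(id);
        }
        order
    }
}
```

A real test would instead start the parent orchestration, let the timer win, and then assert on the recorded status of the child and its descendants.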