fix: propagate error in ArrayNode output gathering #6860
Summary
This PR fixes a silent error propagation bug in the ArrayNode worker output-gathering path (`flytepropeller/pkg/controller/nodes/array/worker.go`).
Due to a hardcoded `nil` being returned on the response channel, errors encountered while reading sub-node outputs were silently discarded, leading to workflows completing successfully with missing or corrupted outputs.
This is a one-character fix that restores correct error propagation and aligns the behavior with the existing node execution handling logic.
Why are the changes needed?
ArrayNodes (map tasks) rely on `gatherOutputs` to read outputs from all sub-executions.
Under normal production conditions (object store latency, transient network failures, rate limiting), output reads can legitimately fail.
However, the current implementation captures the error internally but never returns it to the caller, causing:
`nil` literals
This violates Flyte’s execution correctness guarantees.
What changes were proposed in this pull request?
Root issue
In `worker.run()`, the `gatherOutputsRequest` handler correctly sets `err` when failures occur, but always sends `nil` on the response channel.
Fix
Return the actual captured error instead of hardcoding `nil`.
This mirrors the already-correct behavior used for `nodeExecutionRequest` earlier in the same worker loop and ensures consistent error handling.
Steps to reproduce (realistic)
`ArrayNodePhaseSucceeding` phase
After fix: workflow fails with a clear, actionable error
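The difference between the two behaviors can be illustrated with a small, self-contained Go sketch. This is not the code from `worker.go`: the names `gatherRequest`, `readOutputs`, `handle`, and `gatherAll` are hypothetical, and the failure of sub-node 2 is simulated. It only shows the pattern described above, where a handler captures an error but sends `nil` on the response channel, next to the variant that sends the captured error.

```go
package main

import (
	"errors"
	"fmt"
)

// gatherRequest is a hypothetical stand-in for a gather-outputs request
// carrying a channel on which the worker reports its result.
type gatherRequest struct {
	subNode      int
	responseChan chan error
}

// readOutputs simulates reading one sub-node's outputs.
// Sub-node 2 fails, standing in for a transient object store error.
func readOutputs(subNode int) error {
	if subNode == 2 {
		return errors.New("transient object store failure")
	}
	return nil
}

// handle processes a single request. With propagate set to false it
// reproduces the bug pattern: err is captured but nil is sent back.
func handle(req gatherRequest, propagate bool) {
	err := readOutputs(req.subNode)
	if propagate {
		req.responseChan <- err // fixed behavior: caller sees the failure
		return
	}
	req.responseChan <- nil // buggy behavior: failure is discarded
}

// gatherAll issues one request per sub-node and returns the first
// non-nil error reported on the response channel, if any.
func gatherAll(subNodes int, propagate bool) error {
	responses := make(chan error, subNodes)
	for i := 0; i < subNodes; i++ {
		go handle(gatherRequest{subNode: i, responseChan: responses}, propagate)
	}
	var firstErr error
	for i := 0; i < subNodes; i++ {
		if err := <-responses; err != nil && firstErr == nil {
			firstErr = err
		}
	}
	return firstErr
}

func main() {
	fmt.Println("before fix:", gatherAll(4, false))
	fmt.Println("after fix:", gatherAll(4, true))
}
```

In this sketch the first call returns `nil` even though one read failed, so the caller would treat the gather as successful; the second call returns the simulated read error, which the caller can surface as a failure.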
Impact
Before
After
This affects all Flyte deployments using ArrayNodes, which are a core execution primitive.
How was this patch tested?
`nodeExecutionRequest` error handling logic
`workerErrorCollector` correctly captures and reports the error after the fix
No new tests were added due to the minimal and localized nature of the change, but the behavior now matches the established and already-tested execution path.
Labels
Checklist