Fix/parallel agent spawn stagger #216
Open
joernstu wants to merge 2 commits into AutoForgeAI:master from
…dition

When running with concurrency > 1, multiple agents were spawned in rapid succession within the same loop iteration. All agents started nearly simultaneously and concurrently read/wrote ~/.claude.json during Claude SDK initialization, causing intermittent "JSON Parse error: Unexpected EOF" errors.

Fix: introduce an AGENT_SPAWN_STAGGER_SECS (1.5s) delay between consecutive agent spawns in both the coding batch loop and _maintain_testing_agents. The first spawn in each burst has zero added latency; only subsequent spawns in the same burst are staggered.

- Add AGENT_SPAWN_STAGGER_SECS = 1.5 constant
- Make _maintain_testing_agents async; add stagger between testing agents
- Add stagger between coding batch spawns in the main run_loop
- Update call site to await _maintain_testing_agents

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
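The per-burst stagger described in this commit can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `spawn_burst` and its `spawn_fns` and `sleep` parameters are hypothetical names, and only `AGENT_SPAWN_STAGGER_SECS = 1.5` comes from the commit message.

```python
import asyncio

AGENT_SPAWN_STAGGER_SECS = 1.5  # value taken from the commit message


async def spawn_burst(spawn_fns, stagger_secs=AGENT_SPAWN_STAGGER_SECS,
                      sleep=asyncio.sleep):
    """Call each spawn function in order, pausing between consecutive spawns.

    Hypothetical helper: the first spawn runs immediately, and every later
    spawn in the same burst waits `stagger_secs` first.
    """
    results = []
    for index, spawn in enumerate(spawn_fns):
        if index > 0:
            # Only spawns after the first one pay the stagger delay.
            await sleep(stagger_secs)
        results.append(spawn())
    return results
```

Because the delay is keyed on the position within one burst, two bursts of different agent types started back to back would still begin with no gap between them.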
The previous fix only staggered spawns within the same agent type. The race condition also occurred when a testing agent and a coding agent were spawned in the same loop iteration with no delay between them.

Replace the per-type index guards with a single _last_spawn_time float tracked on the orchestrator instance. A new _stagger_if_needed() async helper sleeps for the remaining time before each spawn, regardless of agent type. _last_spawn_time is updated immediately after every subprocess.Popen() call in all three spawn methods (_spawn_coding_agent, _spawn_coding_agent_batch, _spawn_testing_agent).

This ensures at least AGENT_SPAWN_STAGGER_SECS (1.5s) between any two consecutive agent starts, closing the cross-type race window.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
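One way to picture the shared-timestamp approach is the sketch below. It is an assumption-laden illustration rather than the orchestrator's actual code: the `SpawnStagger` class, the `_mark_spawned` method, the injectable `clock` and `sleep` parameters, and the `None` initial value are all hypothetical. Only the names `_last_spawn_time`, `_stagger_if_needed`, and `AGENT_SPAWN_STAGGER_SECS` come from the commit message.

```python
import asyncio
import time

AGENT_SPAWN_STAGGER_SECS = 1.5  # value taken from the commit message


class SpawnStagger:
    """Hypothetical stand-in for the orchestrator's spawn-timing state."""

    def __init__(self, stagger_secs=AGENT_SPAWN_STAGGER_SECS,
                 clock=time.monotonic, sleep=asyncio.sleep):
        self._stagger_secs = stagger_secs
        self._clock = clock
        self._sleep = sleep
        # Assumption: None means "nothing spawned yet", so the first spawn
        # never waits.
        self._last_spawn_time = None

    async def _stagger_if_needed(self):
        """Sleep for whatever remains of the stagger window, if anything."""
        if self._last_spawn_time is None:
            return
        elapsed = self._clock() - self._last_spawn_time
        remaining = self._stagger_secs - elapsed
        if remaining > 0:
            await self._sleep(remaining)

    def _mark_spawned(self):
        # In the PR this update happens right after subprocess.Popen().
        self._last_spawn_time = self._clock()
```

Each spawn method would await `_stagger_if_needed()` before launching and call `_mark_spawned()` right after. Since the timestamp is shared, a testing agent and a coding agent spawned in the same iteration are spaced apart just like two agents of the same type.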
When running with concurrency > 1, agents started nearly simultaneously and read and wrote ~/.claude.json concurrently during SDK initialization, which caused intermittent "JSON Parse error: Unexpected EOF" failures.
Two-part fix: