Fix/parallel agent spawn stagger #216

Open
joernstu wants to merge 2 commits into AutoForgeAI:master from joernstu:fix/parallel-agent-spawn-stagger

joernstu commented Feb 27, 2026

When running with concurrency > 1, agents started nearly simultaneously and caused intermittent JSON parse errors in ~/.claude.json during SDK initialization.

Two-part fix:

  • Introduce AGENT_SPAWN_STAGGER_SECS (1.5s) between consecutive spawns
  • Extend stagger to cover cross-type races (coding vs testing agents)

joernstu and others added 2 commits February 27, 2026 10:31
…dition

When running with concurrency > 1, multiple agents were spawned in rapid
succession within the same loop iteration. All agents started nearly
simultaneously and concurrently read/wrote ~/.claude.json during Claude
SDK initialization, causing intermittent "JSON Parse error: Unexpected EOF"
errors.
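
The failure mode above can be reproduced deterministically by letting a reader see a half-written file. This is only a sketch under assumptions: the file name, its contents, and the "interrupted writer" are hypothetical stand-ins, not the real ~/.claude.json or the SDK's actual write path, and Python's error text differs from the "Unexpected EOF" message quoted above.

```python
import json
import os
import tempfile

# Hypothetical stand-in for the shared config; NOT the real ~/.claude.json.
config = {"projects": {"demo": {"allowed": True}}, "version": 1}
payload = json.dumps(config)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "config.json")

    # Simulate a writer that has only flushed the first half of the document.
    with open(path, "w", encoding="utf-8") as f:
        f.write(payload[: len(payload) // 2])

    # A second process reading at this moment sees truncated JSON.
    try:
        with open(path, encoding="utf-8") as f:
            json.load(f)
        parse_failed = False
    except json.JSONDecodeError as exc:
        parse_failed = True
        print(f"reader failed: {exc}")
```

Any strict prefix of a JSON object is invalid JSON, so the reader fails whenever it observes a write in progress. That is why spacing out the agent starts narrows the window.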

Fix: introduce AGENT_SPAWN_STAGGER_SECS (1.5s) delay between consecutive
agent spawns in both the coding batch loop and _maintain_testing_agents.
The first spawn in each burst has zero added latency; only subsequent
spawns in the same burst are staggered.
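
A minimal sketch of the per-burst stagger described above, assuming an index guard inside the spawn loop. The `spawn_burst` function, the `spawn_fns` list, and the injectable `sleep` parameter are hypothetical and exist only to make the example runnable; the real loops live in run_loop and _maintain_testing_agents.

```python
import asyncio

AGENT_SPAWN_STAGGER_SECS = 1.5  # value taken from the commit message


async def spawn_burst(spawn_fns, sleep=asyncio.sleep):
    """Call each spawn function in order, pausing between consecutive spawns.

    `spawn_fns` and the injectable `sleep` are illustrative assumptions.
    """
    for index, spawn in enumerate(spawn_fns):
        if index > 0:
            # Only the second and later spawns in a burst are delayed.
            await sleep(AGENT_SPAWN_STAGGER_SECS)
        spawn()


# Usage with a fake sleep so the example finishes instantly.
events = []


async def fake_sleep(secs):
    events.append(("sleep", secs))


asyncio.run(
    spawn_burst(
        [lambda i=i: events.append(("spawn", i)) for i in range(3)],
        sleep=fake_sleep,
    )
)
```

With three spawns, the recorded events alternate spawn, sleep, spawn, sleep, spawn: the first agent starts without delay and each later one waits 1.5 s.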

- Add AGENT_SPAWN_STAGGER_SECS = 1.5 constant
- Make _maintain_testing_agents async; add stagger between testing agents
- Add stagger between coding batch spawns in the main run_loop
- Update call site to await _maintain_testing_agents

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The previous fix only staggered spawns within the same agent type.
The race condition also occurred when a testing agent and a coding agent
were spawned in the same loop iteration with no delay between them.

Replace the per-type index guards with a single _last_spawn_time float
tracked on the orchestrator instance. A new _stagger_if_needed() async
helper sleeps for the remaining time before each spawn, regardless of
agent type. _last_spawn_time is updated immediately after every
subprocess.Popen() call in all three spawn methods (_spawn_coding_agent,
_spawn_coding_agent_batch, _spawn_testing_agent).

This ensures at least AGENT_SPAWN_STAGGER_SECS (1.5s) between any two
consecutive agent starts, closing the cross-type race window.
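
The shared-timestamp approach could look roughly like the sketch below. Only AGENT_SPAWN_STAGGER_SECS, _last_spawn_time, and _stagger_if_needed come from the commit message. The `Orchestrator` class shape, the single `spawn` method, the injectable `clock` and `sleep`, and the initial value of negative infinity are assumptions made for illustration. The list append stands in for the subprocess.Popen() call.

```python
import asyncio
import time

AGENT_SPAWN_STAGGER_SECS = 1.5  # value taken from the commit message


class Orchestrator:
    """Hypothetical minimal orchestrator; only the stagger logic is modeled."""

    def __init__(self, clock=time.monotonic, sleep=asyncio.sleep):
        self._clock = clock
        self._sleep = sleep
        # Assumed initial value: "no spawn yet" never triggers a wait.
        self._last_spawn_time = float("-inf")

    async def _stagger_if_needed(self):
        # Sleep only for whatever part of the stagger window is still left.
        elapsed = self._clock() - self._last_spawn_time
        remaining = AGENT_SPAWN_STAGGER_SECS - elapsed
        if remaining > 0:
            await self._sleep(remaining)

    async def spawn(self, agent_type, log):
        await self._stagger_if_needed()
        log.append(agent_type)  # stand-in for subprocess.Popen(...)
        self._last_spawn_time = self._clock()  # updated right after the spawn


# Usage with a fake clock and a fake sleep that advances it.
now = [100.0]
slept = []


async def fake_sleep(secs):
    slept.append(secs)
    now[0] += secs


async def demo():
    orch = Orchestrator(clock=lambda: now[0], sleep=fake_sleep)
    log = []
    await orch.spawn("coding", log)   # first spawn: no wait
    now[0] += 0.5                     # half a second passes
    await orch.spawn("testing", log)  # waits only the remaining 1.0 s
    return log


spawn_log = asyncio.run(demo())
```

Because the timestamp is shared across agent types, the testing agent in the usage example waits for the remainder of the window even though the previous spawn was a coding agent.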

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>