
Swarm supervisor loop with sequential merge queue #562

@acreeger

Summary

Implement the core swarm supervisor: a long-running foreground process that orchestrates multiple headless agents working on an epic's child issues, using Beads for DAG resolution and managing a sequential merge queue to prevent race conditions.

Context

Part of the Autonomous Swarm Mode epic (#557). This is the heart of the swarm feature: the supervisor loop that drives the entire autonomous execution lifecycle.

Scope

Supervisor Class

Create src/lib/SwarmSupervisor.ts:

class SwarmSupervisor {
  constructor(
    beadsManager: BeadsManager,
    loomManager: LoomManager,
    issueTracker: IssueTracker,
    settings: SwarmSettings
  )

  async run(epicLoom: EpicLoomContext): Promise<SwarmResult>
}

Core Loop

1. Initialize Beads (if not already initialized)
2. Sync epic children + deps → Beads
3. Loop:
   a. Query `bd ready` for unblocked tasks
   b. Claim up to `maxConcurrent - activeAgents.length` tasks
   c. For each claimed task:
      - Create child loom (minimal worktree-only path)
      - Spawn `il spin -p` as a child process with the swarm env vars
      - Track PID, issue number, log file path
   d. Monitor running processes (poll exit codes)
   e. On agent exit (success):
      - Enqueue PR for merge
      - Process merge queue sequentially:
        - Merge PR into epic branch via GitHub API
        - Verify merge succeeded
        - Close issue via API
        - `bd close` the task
   f. On agent exit (failure):
      - Log error, delegate to failure handler (see #563)
   g. When `bd ready` returns empty AND no agents running → complete
4. Return SwarmResult with aggregate stats
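The loop above can be sketched as a single-process simulation to make the control flow concrete. Everything in it is hypothetical: `Task`, `SimResult`, and `simulateSwarm` are illustrative names, the in-memory `ready()` stands in for `bd ready`, no loom or `il spin -p` process is created, and every agent is assumed to exit successfully on the next poll. Tasks whose dependencies never close are simply left unmerged.

```typescript
// Hypothetical in-memory stand-in for a Beads task and its blocking dependencies.
interface Task {
  issue: number;
  deps: number[]; // issue numbers that must be closed first
}

// Hypothetical result shape; not the SwarmResult type from this issue.
interface SimResult {
  merged: number[]; // issue numbers in the order their PRs were merged
  maxActive: number; // peak number of concurrently running agents
}

function simulateSwarm(tasks: Task[], maxConcurrent: number): SimResult {
  if (!Number.isInteger(maxConcurrent) || maxConcurrent < 1) {
    throw new RangeError("maxConcurrent must be a positive integer");
  }
  const closed = new Set<number>();
  const claimed = new Set<number>();
  const mergeQueue: number[] = [];
  const merged: number[] = [];
  let active: number[] = [];
  let maxActive = 0;

  // Stand-in for `bd ready`: unclaimed tasks whose deps are all closed.
  const ready = (): Task[] =>
    tasks.filter((t) => !claimed.has(t.issue) && t.deps.every((d) => closed.has(d)));

  for (;;) {
    // Steps a-c: claim only as many ready tasks as there are free slots.
    const free = maxConcurrent - active.length;
    for (const task of ready().slice(0, free)) {
      claimed.add(task.issue);
      active.push(task.issue); // a real supervisor would spawn `il spin -p` here
    }
    maxActive = Math.max(maxActive, active.length);

    // Step g: nothing ready and nothing running means the epic is done.
    if (active.length === 0 && ready().length === 0) break;

    // Steps d-e: assume every running agent exits successfully on this poll.
    mergeQueue.push(...active);
    active = [];

    // Drain the merge queue strictly one PR at a time.
    while (mergeQueue.length > 0) {
      const issue = mergeQueue.shift()!;
      merged.push(issue); // merge into the epic branch and close the issue
      closed.add(issue); // `bd close`, which may unblock dependent tasks
    }
  }
  return { merged, maxActive };
}
```

For a two-task chain where issue 2 depends on issue 1, `simulateSwarm([{ issue: 1, deps: [] }, { issue: 2, deps: [1] }], 2).merged` is `[1, 2]`: issue 2 only becomes ready after issue 1's PR is merged and its task closed, which is the ordering the integration test in the acceptance criteria targets.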

Process Management

  • Spawn il spin -p via execa in the child loom's worktree directory
  • Set environment: ILOOM_SWARM_MODE=1, ILOOM_EPIC_BRANCH=<branch>, ILOOM_EPIC_ISSUE=<number>
  • Redirect stdout/stderr to per-agent log files: <epic-loom-dir>/agent-logs/<issue-number>.log
  • Track PIDs in a file: <epic-loom-dir>/swarm-pids.json (for zombie cleanup)
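  • Listen for the `exit` event on each subprocess returned by execa (or await its promise) to track child process completion; `process.on('exit', ...)` fires only when the supervisor process itself exits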
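The bookkeeping around each spawn can be kept free of I/O, as in the sketch below. The helper names (`swarmEnv`, `agentLogPath`, `PidTracker`) are hypothetical, and so is the JSON shape of `swarm-pids.json` (a list of `{ issue, pid }` entries); the env variables and paths come from the list above. Actually spawning `il spin -p` and writing the PID file to disk are left to the caller. Spreading `base` matters if Node's `child_process.spawn` is used directly, since its `env` option replaces the parent environment; execa extends `process.env` by default (`extendEnv`).

```typescript
// Hypothetical helpers; the names and the PID-file shape are illustrative only.
type Env = Record<string, string | undefined>;

// Agent environment: the parent's variables plus the three swarm variables.
function swarmEnv(base: Env, epicBranch: string, epicIssue: number): Env {
  return {
    ...base,
    ILOOM_SWARM_MODE: "1",
    ILOOM_EPIC_BRANCH: epicBranch,
    ILOOM_EPIC_ISSUE: String(epicIssue),
  };
}

// <epic-loom-dir>/agent-logs/<issue-number>.log, ignoring a trailing slash.
function agentLogPath(epicLoomDir: string, issueNumber: number): string {
  return `${epicLoomDir.replace(/\/+$/, "")}/agent-logs/${issueNumber}.log`;
}

// In-memory mirror of <epic-loom-dir>/swarm-pids.json, keyed by issue number.
class PidTracker {
  private readonly pids = new Map<number, number>();

  // Record a freshly spawned agent.
  add(issueNumber: number, pid: number): void {
    this.pids.set(issueNumber, pid);
  }

  // Forget an agent once its subprocess has exited.
  remove(issueNumber: number): void {
    this.pids.delete(issueNumber);
  }

  get size(): number {
    return this.pids.size;
  }

  // JSON to rewrite swarm-pids.json with after each add/remove; any PID still
  // listed after a supervisor crash is a candidate zombie for cleanup.
  serialize(): string {
    return JSON.stringify(Array.from(this.pids, ([issue, pid]) => ({ issue, pid })));
  }
}
```

One option for the log redirection is to open the file returned by `agentLogPath` with `fs.openSync(path, "a")` and pass that descriptor as the child's stdout and stderr, so both streams land in the same per-agent log.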

Sequential Merge Queue

  • Maintain an ordered queue of PRs ready to merge
  • Process one at a time to prevent race conditions on the epic branch
  • For each PR in queue:
    1. Merge via gh pr merge <number> --merge (or GitHub API)
    2. If merge fails (conflict) → delegate to conflict resolver (see Wire swarm into il start entry point #564)
    3. If merge succeeds → close the issue, run `bd close`, and remove the PR from the queue
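One way to guarantee one-at-a-time processing is to chain every merge onto a single promise, so a PR enqueued while another is merging simply waits its turn. This is a minimal sketch: `MergeQueue`, `MergeFn`, and the `"merged" | "conflict"` outcome are hypothetical names, the merge function is assumed to wrap `gh pr merge <number> --merge` (or the GitHub API), and `onMerged` / `onConflict` stand in for "close issue + `bd close`" and the conflict resolver respectively.

```typescript
type MergeOutcome = "merged" | "conflict";

// Hypothetical wrapper around `gh pr merge <pr> --merge` (or the GitHub API).
type MergeFn = (pr: number) => Promise<MergeOutcome>;
type PrHandler = (pr: number) => Promise<void>;

class MergeQueue {
  // Tail of the chain: each new PR starts only after this settles.
  private tail: Promise<void> = Promise.resolve();
  private readonly merge: MergeFn;
  private readonly onMerged: PrHandler; // e.g. close the issue, then `bd close`
  private readonly onConflict: PrHandler; // hand off to the conflict resolver

  constructor(merge: MergeFn, onMerged: PrHandler, onConflict: PrHandler) {
    this.merge = merge;
    this.onMerged = onMerged;
    this.onConflict = onConflict;
  }

  // Returns a promise that settles once this particular PR has been handled.
  enqueue(pr: number): Promise<void> {
    const next = this.tail.then(async () => {
      const outcome = await this.merge(pr);
      if (outcome === "merged") {
        await this.onMerged(pr);
      } else {
        await this.onConflict(pr);
      }
    });
    // Swallow errors on the internal chain so one failed PR cannot block
    // the rest; the caller still sees the rejection through `next`.
    this.tail = next.catch(() => undefined);
    return next;
  }

  // Resolves once every PR enqueued so far has been handled.
  drain(): Promise<void> {
    return this.tail;
  }
}
```

Because `enqueue` returns immediately, the supervisor can keep polling agents while merges run in the background; only the writes to the epic branch are serialized.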

Graceful Shutdown

  • SIGINT (Ctrl+C): stop claiming new tasks, let running agents finish, merge their results
  • SIGTERM: same as SIGINT
  • Log "Shutting down gracefully. Waiting for N running agents to complete..."
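The shutdown rule reduces to a flag checked before each claim. In the sketch below, `ShutdownState` and its methods are hypothetical; wiring them up would mean registering `process.once("SIGINT", ...)` and `process.once("SIGTERM", ...)` handlers that call `requestShutdown` with the current number of running agents and log the returned line. Ignoring a second signal is an assumption; a real implementation might instead force-exit on a repeated Ctrl+C.

```typescript
type ShutdownSignal = "SIGINT" | "SIGTERM";

class ShutdownState {
  private requested: ShutdownSignal | null = null;

  // Returns the log line for the first signal; repeated signals return null.
  requestShutdown(signal: ShutdownSignal, runningAgents: number): string | null {
    if (this.requested !== null) return null;
    this.requested = signal; // SIGINT and SIGTERM are treated identically
    return `Shutting down gracefully. Waiting for ${runningAgents} running agents to complete...`;
  }

  // Checked before claiming from `bd ready`; running agents are never killed here.
  canClaim(): boolean {
    return this.requested === null;
  }

  // Exit only after shutdown was requested and all agents and merges are done.
  canExit(runningAgents: number, pendingMerges: number): boolean {
    return this.requested !== null && runningAgents === 0 && pendingMerges === 0;
  }
}
```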

Basic Logging

  • Log to terminal: agent started/completed/failed, merge status, DAG progress
  • Use existing logger patterns from the codebase

Acceptance Criteria

  • Supervisor claims and spawns agents up to maxConcurrent limit
  • Agents run as headless child processes with proper env vars
  • Merge queue processes PRs sequentially
  • Successful merges close the issue and update Beads
  • Graceful shutdown on SIGINT/SIGTERM
  • Per-agent log files capture stdout/stderr
  • PID tracking file maintained for zombie detection
  • Happy path end-to-end: all tasks complete, all PRs merged
  • Unit tests with mocked BeadsManager, LoomManager, and process spawning
  • Integration test with a simple 2-task dependency chain

Scope Boundaries

Dependencies

Metadata

Assignees

No one assigned

    Labels

    enhancement (New feature or request)

    Type

    No type

    Projects

    Status

    No status

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
