Batch discovery pipeline by dimavrem22 · Pull Request #202 · VectorlyApp/bluebox

dimavrem22 · 2026-02-24T18:05:32Z

Agent Refactor

Collapsed AbstractSpecialist into AbstractAgent — all specialists now inherit directly from AbstractAgent, eliminating the intermediate layer
Autonomous loop (run_autonomous, finalize tools, iteration tracking, urgency notices, output schema injection) lives entirely in AbstractAgent
@agent_tool gained persist (NEVER/ALWAYS/OVERFLOW), max_characters, and token_optimized parameters:
- Token optimization: token_optimized=True encodes results via the toon encoder to reduce token usage
- Workspace persistence: persist=ALWAYS saves every tool result as a workspace artifact
- Character overflow management: persist=OVERFLOW auto-saves results exceeding max_characters to raw/ as artifacts, returning an 800-char preview + artifact ID pointer to the LLM instead of blowing up context
All agents can be attached to a workspace and execute Python unless explicitly disabled
Each concrete subclass must declare an AGENT_CARD (enforced by __init_subclass__)
_collect_tools is now cached with @lru_cache per subclass to avoid repeated dir(cls) traversal on every LLM call
Auto-wrap fix: if the LLM passes finalize output fields as top-level kwargs instead of nested under "output", the base class rewraps automatically
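
A minimal sketch of the persist modes described above, to make the overflow behavior concrete. Only the mode names, max_characters, and the 800-character preview come from this PR; the Persist enum, FakeWorkspace, shape_tool_result, the artifact ID format, and the choice that ALWAYS returns the full result are illustrative assumptions, not the actual @agent_tool implementation.

```python
from dataclasses import dataclass, field
from enum import Enum
from itertools import count


class Persist(Enum):
    """Mode names from the PR; this enum itself is a stand-in."""
    NEVER = "never"
    ALWAYS = "always"
    OVERFLOW = "overflow"


PREVIEW_CHARS = 800  # preview length stated in the PR description


@dataclass
class FakeWorkspace:
    """Hypothetical in-memory stand-in for the real workspace."""
    artifacts: dict = field(default_factory=dict)
    _ids: count = field(default_factory=lambda: count(1))

    def save_artifact(self, content: str) -> str:
        # Assumed ID format; the real artifact IDs are not shown in the PR.
        artifact_id = f"raw/artifact-{next(self._ids)}"
        self.artifacts[artifact_id] = content
        return artifact_id


def shape_tool_result(result: str, persist: Persist, max_characters: int,
                      workspace: FakeWorkspace) -> str:
    """Return the text the LLM would see for a tool result."""
    if persist is Persist.ALWAYS:
        workspace.save_artifact(result)
        return result  # assumption: ALWAYS saves but does not truncate
    if persist is Persist.OVERFLOW and len(result) > max_characters:
        artifact_id = workspace.save_artifact(result)
        preview = result[:PREVIEW_CHARS]
        return f"{preview}\n[truncated; full result saved as {artifact_id}]"
    return result  # NEVER, or OVERFLOW within the limit
```

With this shape, a 5,000-character result under persist=OVERFLOW and max_characters=1000 reaches the LLM as an 800-character preview plus a one-line pointer, while the full text stays retrievable from the workspace.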

Workspace Refactor

Workspace is now an artifact-oriented system with a strict directory layout
External files can be mounted read-only via hardlinks, so large capture files are not copied
Each workspace has these directories:
- raw/ (read-only): tool result artifacts and mounted external files
- output/: agent-generated deliverables (written by tools, read by humans)
- context/: reusable notes/context saved for later use in the same run
- meta/: system-managed metadata (manifest.jsonl, input_mounts.jsonl) — not editable
- scratch/: ephemeral scratch space
save_artifact() is the core write API — records provenance in meta/manifest.jsonl with SHA-256, size, content type, timestamp, and optional tool/code-run metadata
Snapshot/diff primitives (snapshot_paths, diff_snapshot) for tracking which output files changed during a tool call
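
For reviewers who want a feel for the provenance record, here is a rough sketch of what a save_artifact() call could write. The manifest location and the recorded fields (SHA-256, size, content type, timestamp, optional tool metadata) follow the description above; the function signature, the JSON key names, and the demo file are assumptions.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def save_artifact(root: Path, rel_path: str, data: bytes, content_type: str,
                  tool_name: Optional[str] = None) -> dict:
    """Write data under root and append a provenance record to meta/manifest.jsonl."""
    target = root / rel_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)

    record = {
        "path": rel_path,
        "sha256": hashlib.sha256(data).hexdigest(),
        "size": len(data),
        "content_type": content_type,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "tool": tool_name,  # optional tool metadata; key name is assumed
    }
    manifest = root / "meta" / "manifest.jsonl"
    manifest.parent.mkdir(parents=True, exist_ok=True)
    # JSON Lines: one record per line, appended so earlier history is preserved.
    with manifest.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record


# Usage against a throwaway directory standing in for a workspace root.
with tempfile.TemporaryDirectory() as tmp:
    rec = save_artifact(Path(tmp), "raw/result.json", b'{"ok": true}',
                        "application/json", tool_name="demo_tool")
    print(rec["path"], rec["size"], rec["sha256"][:12])
```

Appending to a JSONL manifest keeps each write cheap and leaves an audit trail that can be replayed line by line.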

API Indexing Pipeline

Overview

Fully autonomous end-to-end pipeline that turns raw CDP captures into a catalog of executable routines.

Phase 1 — Exploration (4 specialists run in parallel via ThreadPoolExecutor):

NetworkSpecialist → NetworkExplorationSummary
ValueTraceResolverSpecialist → StorageExplorationSummary
DOMSpecialist → DOMExplorationSummary
InteractionSpecialist → UIExplorationSummary

Each specialist filters thousands of raw events down to the 5–15 endpoints, tokens, or forms that actually matter.
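
The Phase 1 fan-out can be pictured roughly as follows. This is a sketch only: the four summary type names come from the list above, while make_specialist, SPECIALISTS, and run_exploration are hypothetical stand-ins for the real specialist agents and pipeline entry point.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def make_specialist(summary_kind: str) -> Callable[[str], dict]:
    """Build a stand-in specialist; a real one would run an autonomous agent loop."""
    def run(captures_dir: str) -> dict:
        return {"kind": summary_kind, "source": captures_dir}
    return run


# Hypothetical registry keyed by a short name; summary names are from the PR.
SPECIALISTS: Dict[str, Callable[[str], dict]] = {
    "network": make_specialist("NetworkExplorationSummary"),
    "storage": make_specialist("StorageExplorationSummary"),
    "dom": make_specialist("DOMExplorationSummary"),
    "ui": make_specialist("UIExplorationSummary"),
}


def run_exploration(captures_dir: str) -> Dict[str, dict]:
    """Run all specialists concurrently and collect their summaries by name."""
    with ThreadPoolExecutor(max_workers=len(SPECIALISTS)) as pool:
        futures = {name: pool.submit(fn, captures_dir)
                   for name, fn in SPECIALISTS.items()}
        # Future.result() blocks until done and re-raises worker exceptions.
        return {name: fut.result() for name, fut in futures.items()}


summaries = run_exploration("./cdp_captures")
print(sorted(summaries))
```

Threads suit this workload because each specialist spends most of its time waiting on LLM calls rather than on CPU.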

Phase 2 — Routine Construction (PI orchestrator loop):

PrincipalInvestigator reads all 4 exploration summaries, plans a routine catalog, and dispatches experiments to concurrent ExperimentWorker agents
Workers have live browser tools and recorded capture lookup tools; they execute experiments and report structured findings
PI reviews results, accumulates proven artifacts, assembles routines, and submits to RoutineInspector for quality gating
Routines that pass inspection ship; those that fail get iterated on
Full incremental persistence: every experiment, attempt, routine, and agent thread is written to disk as it happens
PI crash recovery: if the PI dies (context exhaustion, API error), a fresh PI is constructed from the persisted DiscoveryLedger and continues where the previous one left off (up to 3 attempts)
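
The crash-recovery behavior in the last bullet can be sketched as a retry loop over a persisted ledger. Everything here except the 3-attempt limit and the idea of resuming from the DiscoveryLedger is hypothetical: Ledger, PICrashed, run_pi, and the crash_plan parameter exist only to make the example runnable.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MAX_PI_ATTEMPTS = 3  # limit stated in the PR


@dataclass
class Ledger:
    """Hypothetical stand-in for the persisted DiscoveryLedger."""
    shipped: List[str] = field(default_factory=list)
    pending: List[str] = field(default_factory=list)


class PICrashed(RuntimeError):
    """Stand-in for context exhaustion or an API error."""


def run_pi(ledger: Ledger, crash_after: Optional[int]) -> None:
    """Toy PI: ships pending routines, optionally crashing after N of them."""
    done = 0
    while ledger.pending:
        if crash_after is not None and done == crash_after:
            raise PICrashed("context exhausted")
        # Progress is recorded in the ledger as it happens, before any crash.
        ledger.shipped.append(ledger.pending.pop(0))
        done += 1


def run_with_recovery(ledger: Ledger, crash_plan: List[Optional[int]]) -> int:
    """Start a fresh PI from the ledger after each crash; return attempts used."""
    for attempt in range(1, MAX_PI_ATTEMPTS + 1):
        try:
            run_pi(ledger, crash_plan[attempt - 1])
            return attempt
        except PICrashed:
            continue  # the next PI resumes from whatever the ledger holds
    raise RuntimeError(f"PI failed after {MAX_PI_ATTEMPTS} attempts")
```

The property worth noting is that no state lives only in the PI: because the ledger is updated incrementally, a replacement PI never repeats work that was already shipped.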

New Agents

PrincipalInvestigator: orchestrator with no browser access — plans routines, dispatches workers, reviews results, assembles and ships routines
ExperimentWorker: browser-capable execution agent with live browser_* tools (navigate, eval JS, raw CDP) and recorded capture lookup tools — executes experiments, does NOT make strategic decisions
RoutineInspector: independent quality gate — scores routines on 6 dimensions (task completion, data quality, parameter coverage, robustness, structural correctness, documentation), hard-fails on 4xx/5xx responses or unresolved placeholders
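
As an illustration of the inspector's hard-fail rules, the check below flags 4xx/5xx responses and unresolved placeholders. It is a sketch under assumptions: the {{name}} placeholder syntax, the StepResult shape, and hard_fail_reasons are invented for the example, and the six scored dimensions are not modeled.

```python
import re
from dataclasses import dataclass
from typing import List

# Assumption: placeholders look like {{name}}; the PR does not show the real syntax.
PLACEHOLDER = re.compile(r"\{\{\s*[\w.]+\s*\}\}")


@dataclass
class StepResult:
    """Hypothetical record of one executed routine step."""
    status: int
    rendered_request: str


def hard_fail_reasons(steps: List[StepResult]) -> List[str]:
    """Collect reasons that would fail a routine regardless of its scores."""
    reasons = []
    for i, step in enumerate(steps):
        if 400 <= step.status <= 599:
            reasons.append(f"step {i}: HTTP {step.status}")
        if PLACEHOLDER.search(step.rendered_request):
            reasons.append(f"step {i}: unresolved placeholder")
    return reasons


ok = [StepResult(200, "https://example.com/api?id=7")]
bad = [StepResult(403, "https://example.com/api?token={{auth_token}}")]
print(hard_fail_reasons(ok))
print(hard_fail_reasons(bad))
```

Keeping hard failures separate from scoring means a routine with a polished description still cannot ship if it returns an error response.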

How to Run

bluebox-api-index \
  --cdp-captures-dir ./cdp_captures \
  --task "Recover and validate routines from this captured session. Get all routines that deliver useful data to the user!" \
  --output-dir ./api_indexing_output \
  --model gpt-5.2 \
  --post-run-analysis

Other Changes

DOM Data Loader: new DOMDataLoader for dom/events.jsonl — parses full DOM string-interning tables, classifies elements by tag family
Code Execution Sandbox: added Lambda backend (BLUEBOX_SANDBOX_MODE=lambda), auto mode (Lambda > Docker > blocklist), read_only_paths support for workspace safety, expanded blocked-module workaround hints
New data models: DiscoveryLedger, ExperimentEntry, RoutineSpec, RoutineAttempt, RoutineCatalog, RoutineInspectionResult in orchestration/; exploration summaries in api_indexing/
Agent docs: runtime-searchable markdown docs (agent_docs/) for auth token resolution, naming conventions, CORS workarounds
Deleted AbstractSpecialist, RoutineDiscoveryAgentBeta, and all old docs/ planning files
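
The sandbox auto mode (Lambda > Docker > blocklist) amounts to a priority-ordered backend selection. The sketch below assumes that an unset BLUEBOX_SANDBOX_MODE behaves like auto and that availability is checked through probe callables; select_backend and the probes are hypothetical and not the sandbox's real API.

```python
from typing import Callable, Mapping

PRIORITY = ("lambda", "docker", "blocklist")  # order stated in the PR


def select_backend(env: Mapping[str, str],
                   probes: Mapping[str, Callable[[], bool]]) -> str:
    """Pick a sandbox backend from an explicit mode or by availability."""
    # Assumption: a missing variable is treated the same as "auto".
    mode = env.get("BLUEBOX_SANDBOX_MODE", "auto").lower()
    if mode != "auto":
        if mode not in PRIORITY:
            raise ValueError(f"unknown sandbox mode: {mode}")
        return mode  # an explicit mode is honored without probing
    for name in PRIORITY:
        probe = probes.get(name)
        if probe is not None and probe():
            return name
    return "blocklist"  # assumed to need no external service


probes = {"lambda": lambda: False, "docker": lambda: True}
print(select_backend({"BLUEBOX_SANDBOX_MODE": "auto"}, probes))
```

In practice os.environ could be passed as env; taking it as a parameter keeps the selection logic easy to test.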

Post-Review Edits

Merged `run_python_code` into `execute_python`

BlueBoxAgent exposed two Python execution tools to the LLM (execute_python from AbstractAgent and run_python_code defined locally), which was confusing. Merged them:

Moved file-tracking logic (workspace snapshot/diff) from BlueBoxAgent._run_python_code into the base AbstractAgent._execute_python — all workspace-backed agents now get file-tracking automatically
Deleted _run_python_code and its imports from BlueBoxAgent
Updated BlueBoxAgent.SYSTEM_PROMPT to reference execute_python everywhere
Updated tests in test_blocklist_hints.py to use execute_python

Workspace file-tracking covers entire workspace

Previously execute_python only snapshotted output/ — files written to context/ or scratch/ via Python code were never diffed or uploaded to S3. Fixed by snapshotting the entire workspace root before and after execution, so every file change is captured regardless of directory.
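
To show why snapshotting the workspace root catches writes outside output/, here is a small sketch of the snapshot/diff idea. The names snapshot_paths and diff_snapshot appear in this PR, but the signatures, the use of content hashes, and the returned structure are assumptions made for illustration.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List


def snapshot_paths(root: Path) -> Dict[str, str]:
    """Map each file path (relative to root) to a SHA-256 of its contents."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_snapshot(before: Dict[str, str], after: Dict[str, str]) -> Dict[str, List[str]]:
    """Classify paths as created, modified, or deleted between two snapshots."""
    shared = before.keys() & after.keys()
    return {
        "created": sorted(after.keys() - before.keys()),
        "modified": sorted(k for k in shared if before[k] != after[k]),
        "deleted": sorted(before.keys() - after.keys()),
    }


# Demo: a change in scratch/ is detected because the whole root is walked.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "output").mkdir()
    (root / "scratch").mkdir()
    (root / "output" / "a.txt").write_text("v1")
    before = snapshot_paths(root)
    (root / "output" / "a.txt").write_text("v2")
    (root / "scratch" / "b.txt").write_text("new")
    print(diff_snapshot(before, snapshot_paths(root)))
```

Had the snapshot covered only output/, the new file under scratch/ would never appear in the diff, which matches the bug described above.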

Removed `_after_chat_added` wrapper

Removed the _after_chat_added method from AbstractAgent that silently swallowed all exceptions and ignored the Chat argument. The on_chat_added callback is now called directly in _add_chat with the Chat object passed through. Updated PI's lambda wiring to accept the _chat param.
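
The shape of the change, sketched with stand-in classes: the callback receives the Chat and its exceptions are no longer swallowed. The Chat and Agent classes below are minimal hypothetical versions; only the names on_chat_added, _add_chat, and the _chat lambda parameter come from this PR.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Chat:
    """Hypothetical minimal stand-in for the real Chat model."""
    role: str
    content: str


@dataclass
class Agent:
    """Stand-in for AbstractAgent, reduced to the chat-added hook."""
    on_chat_added: Optional[Callable[[Chat], None]] = None
    chats: List[Chat] = field(default_factory=list)

    def _add_chat(self, chat: Chat) -> None:
        self.chats.append(chat)
        if self.on_chat_added is not None:
            # Called directly: the Chat is passed through and errors propagate.
            self.on_chat_added(chat)


seen: List[str] = []
# A caller that does not need the argument still has to accept it.
agent = Agent(on_chat_added=lambda _chat: seen.append(_chat.content))
agent._add_chat(Chat("user", "hi"))
print(seen)
```

Letting exceptions propagate makes a broken persistence callback visible immediately instead of silently losing chat history.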

Create DOMDataLoader for parsing CDP DOMSnapshot.captureSnapshot data with element extraction (forms, inputs, buttons, links, tables, headings, clickable), plus new methods for meta tags, script tags (with inline content for __NEXT_DATA__ etc.), and hidden inputs for token/key discovery. Add DOMSpecialist agent with 15 tools wrapping the loader, TUI run script, HTTP adapter integration, and exploration scripts for storage/network/UI domains. Includes 89 unit tests and API indexing data models for the exploration phase. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ation
- Rename UISpecialist → InteractionSpecialist (DOM tools optional, interactions primary)
- Delete old InteractionSpecialist (subset of new one)
- Clean run_dom_exploration.py to DOM-only (no interaction auto-upgrade)
- Create run_ui_exploration.py using InteractionSpecialist for user intent
- Add UIExplorationSummary model, remove user_inputs/inferred_intent from DOMExplorationSummary
- Update InterestLevel enum in network exploration prompt
- Add exploration_output/ with all 4 Premier League capture results
- Add planning docs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ements
- Fix pipeline diagram to show workers report to PI, not directly to inspector
- Clarify iteration definition (one LLM API call, not one tool call)
- Remove fabricated anti-bot bullet from DOM exploration; add to spec_v2 as improvement
- Clarify PI-side quality gates are pure Python static checks (no LLM)
- Clarify auth-first ordering is prompt-only, not enforced in code
- Fix UI exploration data source: InteractionSpecialist also gets DOM loader
- Add --max-pi-attempts CLI flag (was hardcoded MAX_PI_ATTEMPTS = 3)
- Add docs/api_indexing_spec_v2/potential_improvements.md with 3 improvements:
  1. WindowProperty exploration specialist
  2. True pipeline resumability and agent thread replay
  3. Anti-bot detection as first-class exploration output

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

All system prompts from every agent in the API indexing pipeline in one file for easy auditing: exploration specialists, PI, worker, inspector, plus dynamic sections and schemas. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

BlueBoxAgent exposed two Python execution tools to the LLM, which was confusing. Move the file-tracking logic (output/ snapshot/diff) from BlueBoxAgent._run_python_code into the base AbstractAgent._execute_python so all workspace-backed agents benefit and BlueBoxAgent only exposes one Python tool. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…tput/ Files written to context/ or scratch/ via execute_python were never snapshotted or diffed, so S3Workspace never uploaded them. Add WRITABLE_ROOTS class attribute to AgentWorkspace ABC and use it in _execute_python so all writable directories are tracked consistently. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Remove the _after_chat_added method that silently swallowed all exceptions and ignored the Chat argument. The on_chat_added callback is now called directly in _add_chat with the Chat object passed through. Also fix the callback type from Callable[[], None] to Callable[[Chat], None]. Also add WRITABLE_ROOTS to AgentWorkspace ABC so the agent doesn't hardcode which directories are writable. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

dimavrem22 force-pushed the batch-discovery-pipeline branch 2 times, most recently from e05deca to 0aed6ce, on February 26, 2026 20:33

dimavrem22 and others added 28 commits February 26, 2026 22:57

docs

02e5210

rm some file

9f8d0dd

wild and crazy orchestation vibe coded from top to bottom!

24ac6af

is the the first successful run i seee?? even if it costs a a gazilli…

2be9b97

…on dam tokens????

we got auth to work! lets goooo

5705442

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

extra errors added to context

57de81a

checkpoint

9700b47

clarify PI data loader access: holds loaders but passes through to wo…

4be4bda

…rkers only Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

clarify PI context: agent docs are on-demand file reads, not pre-loaded

e0162d8

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix worker context docs: no exploration summaries; add proven artifac…

df471b2

…ts improvement Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs chpt

75434a9

add When column to PI context table; add deferred routine planning im…

9cade36

…provement (#5) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

add screenshot + OCR improvements (#6) to spec_v2 potential improvements

9705851

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix RoutineInspector context table: add spec comparison, doc tools, c…

1d17972

…orrect test params Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

add potential improvements #7-13: inspector context, proven artifacts…

f33363d

…, tool observability, output schemas, PI execution visibility Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

add cleaned potential improvements: deduplicate, organize by theme, a…

02d589e

…dd impact/effort ratings Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

workspace, abstract agent, refactored. beta discovery removed

4f71c64

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

stricter-code-sandboxing

904363b

network-specailists uses new workspace and tools

5e01885

mounting data for specialists done

1b97d16

fixed bluebox and made agent_workspace

44b7c46

i think the massive agent refactor is now done!!!

7aee2e5

pipeline looks to start working?

e7f30bf

we re soooaring! flyyyyinggit add .

341bf9e

dimavrem22 and others added 5 commits February 27, 2026 00:19

refactor: extract _browser_execute helper to eliminate browser tool b…

e8a9965

…oilerplate DRY up the ensure-browser / timeout / error envelope duplicated across all browser tools in ExperimentWorker and related specialists. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

docs finishing

c49c0b5

docs

67002c3

new diagrams

78c1631

new diagrams

c38753b

dimavrem22 marked this pull request as ready for review February 27, 2026 06:50

dimavrem22 requested review from alex-w-99 and rayruizhiliao as code owners February 27, 2026 06:50

alex-w-99 reviewed Feb 27, 2026

bluebox/workspace/local_workspace.py (resolved)

alex-w-99 reviewed Feb 27, 2026

bluebox/agents/workspace.py (outdated, resolved)

alex-w-99 reviewed Feb 27, 2026

bluebox/agents/principal_investigator.py (resolved)

alex-w-99 reviewed Feb 27, 2026

bluebox/agents/abstract_agent.py (outdated, resolved)

abs agent cleanup, workspace cleanup, bluebox cleanup, aliases removed

79bffea

alex-w-99 reviewed Feb 27, 2026

bluebox/agents/workers/experiment_worker.py (outdated, resolved)

fix: remove references to non-existent follow_up tool in PI system pr…

3bad643

…ompt Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

alex-w-99 reviewed Feb 27, 2026

bluebox/data_models/orchestration/ledger.py (resolved)

alex-w-99 reviewed Feb 27, 2026

bluebox/agents/specialists/interaction_specialist.py (resolved)

more cleanup

061a419

dimavrem22 and others added 3 commits February 27, 2026 11:35

docs: update CLAUDE.md for agent refactor, workspace, and API indexin…

34a64a3

…g pipeline Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

uni tests

932ace4

anthropic models for api indexing

ad39e52

alex-w-99 approved these changes Feb 27, 2026

alex-w-99 reviewed Feb 27, 2026

bluebox/agents/abstract_agent.py (outdated, resolved)

dimavrem22 and others added 4 commits February 27, 2026 14:25

fix: snapshot entire workspace for file-tracking, not just writable dirs

5a30a5d

Remove WRITABLE_ROOTS and snapshot the whole workspace root so every file created or modified by execute_python gets diffed and uploaded. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

claude out

51cbdd9

dimavrem22 merged commit 130e9fb into main Feb 27, 2026
7 checks passed

dimavrem22 merged 46 commits into main from batch-discovery-pipeline

2 participants
