Merged
Create DOMDataLoader for parsing CDP DOMSnapshot.captureSnapshot data with element extraction (forms, inputs, buttons, links, tables, headings, clickable), plus new methods for meta tags, script tags (with inline content for __NEXT_DATA__ etc.), and hidden inputs for token/key discovery. Add DOMSpecialist agent with 15 tools wrapping the loader, TUI run script, HTTP adapter integration, and exploration scripts for storage/network/UI domains. Includes 89 unit tests and API indexing data models for the exploration phase. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ation - Rename UISpecialist → InteractionSpecialist (DOM tools optional, interactions primary) - Delete old InteractionSpecialist (subset of new one) - Clean run_dom_exploration.py to DOM-only (no interaction auto-upgrade) - Create run_ui_exploration.py using InteractionSpecialist for user intent - Add UIExplorationSummary model, remove user_inputs/inferred_intent from DOMExplorationSummary - Update InterestLevel enum in network exploration prompt - Add exploration_output/ with all 4 Premier League capture results - Add planning docs Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…on dam tokens????
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ements - Fix pipeline diagram to show workers report to PI, not directly to inspector - Clarify iteration definition (one LLM API call, not one tool call) - Remove fabricated anti-bot bullet from DOM exploration; add to spec_v2 as improvement - Clarify PI-side quality gates are pure Python static checks (no LLM) - Clarify auth-first ordering is prompt-only, not enforced in code - Fix UI exploration data source: InteractionSpecialist also gets DOM loader - Add --max-pi-attempts CLI flag (was hardcoded MAX_PI_ATTEMPTS = 3) - Add docs/api_indexing_spec_v2/potential_improvements.md with 3 improvements: 1. WindowProperty exploration specialist 2. True pipeline resumability and agent thread replay 3. Anti-bot detection as first-class exploration output Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rkers only Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ts improvement Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…provement (#5) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…orrect test params Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…, tool observability, output schemas, PI execution visibility Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
All system prompts from every agent in the API indexing pipeline in one file for easy auditing: exploration specialists, PI, worker, inspector, plus dynamic sections and schemas. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…dd impact/effort ratings Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…oilerplate DRY up the ensure-browser / timeout / error envelope duplicated across all browser tools in ExperimentWorker and related specialists. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
alex-w-99
reviewed
Feb 27, 2026
…ompt Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…g pipeline Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
alex-w-99
approved these changes
Feb 27, 2026
BlueBoxAgent exposed two Python execution tools to the LLM, which was confusing. Move the file-tracking logic (output/ snapshot/diff) from BlueBoxAgent._run_python_code into the base AbstractAgent._execute_python so all workspace-backed agents benefit and BlueBoxAgent only exposes one Python tool. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tput/ Files written to context/ or scratch/ via execute_python were never snapshotted or diffed, so S3Workspace never uploaded them. Add WRITABLE_ROOTS class attribute to AgentWorkspace ABC and use it in _execute_python so all writable directories are tracked consistently. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove the _after_chat_added method that silently swallowed all exceptions and ignored the Chat argument. The on_chat_added callback is now called directly in _add_chat with the Chat object passed through. Also fix the callback type from Callable[[], None] to Callable[[Chat], None]. Also add WRITABLE_ROOTS to AgentWorkspace ABC so the agent doesn't hardcode which directories are writable. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove WRITABLE_ROOTS and snapshot the whole workspace root so every file created or modified by execute_python gets diffed and uploaded. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
## Agent Refactor

- … `AbstractSpecialist` into `AbstractAgent`: all specialists now inherit directly from `AbstractAgent`, eliminating the intermediate layer
- … (`run_autonomous`, finalize tools, iteration tracking, urgency notices, output schema injection) lives entirely in `AbstractAgent`
- `@agent_tool` gained `persist` (NEVER/ALWAYS/OVERFLOW), `max_characters`, and `token_optimized` parameters:
  - `token_optimized=True` encodes results via `toon` encoder for cheaper token usage
  - `persist=ALWAYS` saves every tool result as a workspace artifact
  - `persist=OVERFLOW` auto-saves results exceeding `max_characters` to `raw/` as artifacts, returning an 800-char preview + artifact ID pointer to the LLM instead of blowing up context
- … `AGENT_CARD` (enforced by `__init_subclass__`)
- `_collect_tools` is now `@lru_cache` per subclass to avoid repeated `dir(cls)` traversal on every LLM call
- … `"output"`, the base class rewraps automatically

## Workspace Refactor
- `raw/` (read-only): tool result artifacts and mounted external files
- `output/`: agent-generated deliverables (written by tools, read by humans)
- `context/`: reusable notes/context saved for later use in the same run
- `meta/`: system-managed metadata (`manifest.jsonl`, `input_mounts.jsonl`), not editable
- `scratch/`: ephemeral scratch space
- `save_artifact()` is the core write API: it records provenance in `meta/manifest.jsonl` with SHA-256, size, content type, timestamp, and optional tool/code-run metadata
- … (`snapshot_paths`, `diff_snapshot`) for tracking which output files changed during a tool call

## API Indexing Pipeline

### Overview
A fully autonomous, end-to-end pipeline that turns raw CDP captures into a catalog of executable routines.
Phase 1: Exploration (4 specialists run in parallel via `ThreadPoolExecutor`):

- `NetworkSpecialist` → `NetworkExplorationSummary`
- `ValueTraceResolverSpecialist` → `StorageExplorationSummary`
- `DOMSpecialist` → `DOMExplorationSummary`
- `InteractionSpecialist` → `UIExplorationSummary`

Each filters thousands of raw events down to the 5–15 endpoints/tokens/forms that actually matter.
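The parallel fan-out in Phase 1 can be sketched roughly as follows. This is only an illustration of the `ThreadPoolExecutor` pattern: `run_specialist` and `run_exploration` are hypothetical stand-ins, and the real specialists' constructors and run methods are not shown in this description.

```python
from concurrent.futures import ThreadPoolExecutor

# Specialist name -> name of the summary it produces (taken from the list above).
SPECIALISTS = {
    "NetworkSpecialist": "NetworkExplorationSummary",
    "ValueTraceResolverSpecialist": "StorageExplorationSummary",
    "DOMSpecialist": "DOMExplorationSummary",
    "InteractionSpecialist": "UIExplorationSummary",
}


def run_specialist(name: str, captures_dir: str) -> dict:
    """Hypothetical stand-in for one specialist's autonomous run."""
    return {"specialist": name, "captures_dir": captures_dir, "findings": []}


def run_exploration(captures_dir: str) -> dict:
    """Run all four specialists concurrently and collect their summaries."""
    with ThreadPoolExecutor(max_workers=len(SPECIALISTS)) as pool:
        # Submit everything first so the specialists actually overlap.
        futures = {
            name: pool.submit(run_specialist, name, captures_dir)
            for name in SPECIALISTS
        }
        # .result() blocks until that specialist finishes and re-raises its errors.
        return {SPECIALISTS[name]: fut.result() for name, fut in futures.items()}


summaries = run_exploration("./cdp_captures")
```

Threads fit here on the assumption that each specialist spends most of its time waiting on LLM API calls rather than on CPU-bound work.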
Phase 2: Routine Construction (PI orchestrator loop):

- `PrincipalInvestigator` reads all 4 exploration summaries, plans a routine catalog, dispatches experiments to concurrent `ExperimentWorker` agents
- … `RoutineInspector` for quality gating
- … `DiscoveryLedger` and continues where the previous one left off (up to 3 attempts)

### New Agents
- `PrincipalInvestigator`: orchestrator with no browser access; plans routines, dispatches workers, reviews results, assembles and ships routines
- `ExperimentWorker`: browser-capable execution agent with live `browser_*` tools (navigate, eval JS, raw CDP) and recorded capture lookup tools; executes experiments, does NOT make strategic decisions
- `RoutineInspector`: independent quality gate; scores routines on 6 dimensions (task completion, data quality, parameter coverage, robustness, structural correctness, documentation), hard-fails on 4xx/5xx responses or unresolved placeholders

### How to Run
```bash
bluebox-api-index \
  --cdp-captures-dir ./cdp_captures \
  --task "Recover and validate routines from this captured session. Get all routines that deliver useful data to the user!" \
  --output-dir ./api_indexing_output \
  --model gpt-5.2 \
  --post-run-analysis
```

## Other Changes
- `DOMDataLoader` for `dom/events.jsonl`: parses full DOM string-interning tables, classifies elements by tag family
- … (`BLUEBOX_SANDBOX_MODE=lambda`), `auto` mode (Lambda > Docker > blocklist), `read_only_paths` support for workspace safety, expanded blocked-module workaround hints
- … `DiscoveryLedger`, `ExperimentEntry`, `RoutineSpec`, `RoutineAttempt`, `RoutineCatalog`, `RoutineInspectionResult` in `orchestration/`; exploration summaries in `api_indexing/`
- … (`agent_docs/`) for auth token resolution, naming conventions, CORS workarounds
- … `AbstractSpecialist`, `RoutineDiscoveryAgentBeta`, and all old `docs/` planning files

## Post-Review Edits
### Merged `run_python_code` into `execute_python`

`BlueBoxAgent` exposed two Python execution tools to the LLM (`execute_python` from `AbstractAgent` and `run_python_code` defined locally), which was confusing. Merged them:

- Moved the file-tracking logic from `BlueBoxAgent._run_python_code` into the base `AbstractAgent._execute_python`, so all workspace-backed agents now get file-tracking automatically
- … `_run_python_code` and its imports from `BlueBoxAgent`
- … `BlueBoxAgent.SYSTEM_PROMPT` to reference `execute_python` everywhere
- … `test_blocklist_hints.py` to use `execute_python`

### Workspace file-tracking covers entire workspace
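As a rough illustration of file-tracking around a Python execution, the sketch below snapshots a whole workspace root before and after a code run and diffs the two. `snapshot_root` and `diff_snapshots` are hypothetical names chosen for this example; they are not the signatures of the workspace's actual `snapshot_paths` / `diff_snapshot` helpers, which are not shown here.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot_root(root: Path) -> dict:
    """Map every file under root (relative POSIX path) to a SHA-256 of its bytes."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_snapshots(before: dict, after: dict) -> dict:
    """Classify paths as created, modified, or deleted between two snapshots."""
    return {
        "created": sorted(after.keys() - before.keys()),
        "modified": sorted(
            k for k in before.keys() & after.keys() if before[k] != after[k]
        ),
        "deleted": sorted(before.keys() - after.keys()),
    }


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("output", "context", "scratch"):
        (root / name).mkdir()
    (root / "output" / "report.txt").write_text("v1")

    before = snapshot_root(root)
    # Stand-in for the agent's Python code touching several directories.
    (root / "output" / "report.txt").write_text("v2")
    (root / "context" / "notes.md").write_text("remember this")
    after = snapshot_root(root)

    changes = diff_snapshots(before, after)
```

Because the snapshot walks the root rather than a fixed list of directories, a file written under `context/` shows up in the diff the same way one under `output/` does.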
Previously `execute_python` only snapshotted `output/`; files written to `context/` or `scratch/` via Python code were never diffed or uploaded to S3. Fixed by snapshotting the entire workspace root before and after execution, so every file change is captured regardless of directory.

### Removed `_after_chat_added` wrapper

Removed the `_after_chat_added` method from `AbstractAgent` that silently swallowed all exceptions and ignored the `Chat` argument. The `on_chat_added` callback is now called directly in `_add_chat` with the `Chat` object passed through. Updated PI's lambda wiring to accept the `_chat` param.
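A minimal sketch of the resulting callback shape, assuming a simplified agent: the `Chat` and `Agent` classes below are hypothetical stand-ins, and only the `Callable[[Chat], None]` type and the direct call inside `_add_chat` come from the change described above.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Chat:
    """Hypothetical minimal stand-in for the real Chat model."""
    role: str
    content: str


@dataclass
class Agent:
    """Hypothetical stand-in showing only the callback wiring."""
    on_chat_added: Optional[Callable[[Chat], None]] = None
    chats: List[Chat] = field(default_factory=list)

    def _add_chat(self, chat: Chat) -> None:
        self.chats.append(chat)
        if self.on_chat_added is not None:
            # Called directly with the Chat; no wrapper, so errors propagate.
            self.on_chat_added(chat)


seen: List[str] = []
# The lambda accepts the chat argument, mirroring the updated PI wiring.
agent = Agent(on_chat_added=lambda _chat: seen.append(_chat.content))
agent._add_chat(Chat(role="user", content="hello"))
```

With no exception-swallowing wrapper in between, a failing callback surfaces at the `_add_chat` call site instead of disappearing silently.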