Dev Browser Studio

Version 1.2.0

A browser automation toolkit with video recording and an autonomous AI perception-action loop for UI testing, data extraction, and browser agent development.

What is Dev Browser Studio?

Dev Browser Studio is a tool that lets you automate web browser actions (like clicking buttons, filling forms, and navigating websites) while recording everything on video. It also includes an autonomous perception-action loop that uses Claude's Vision API to see, reason about, and interact with web pages — no selectors or scripting required.

This is especially useful for:

Autonomous browser agents - Give the AI a task and watch it navigate, click, and extract data
Testing websites - Make sure buttons work, forms submit correctly, and pages load properly
Data extraction - Scrape structured data from pages using natural language instructions
Debugging issues - Record exactly what happens when something goes wrong
Quality assurance (QA) - Verify that your website looks and works correctly

Why Choose Dev Browser Studio?

The Problem with Other Tools

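Most browser automation tools (like Playwright or Selenium) can take screenshots, but video recording is limited or missing. Playwright, for example, records only whole sessions automatically, and the video file is not available until the page closes. When something goes wrong, you're left guessing what happened between screenshots.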

The Solution

Dev Browser Studio gives you on-demand video recording. You control exactly when recording starts and stops, and you get the video file immediately. No waiting, no complicated setup.

Feature	Dev Browser Studio	Standard Playwright	Playwright MCP
Video Recording	Start/stop anytime	Only automatic, full session	No
Video Available	Immediately	After page closes	N/A
Console Log Capture	Yes, with timestamps	Manual setup required	No
AI-Parseable Output	Key frames + JSON	No	No
Autonomous Agent Loop	Yes (Claude Vision)	No	No
Persistent Pages	Yes	No	No
Recording Control	Full control	No control	N/A

Key Advantages

Perception Loop - Autonomous AI agent that sees, reasons, and acts on web pages
On-Demand Recording - Start and stop recording whenever you want
Instant Access - Get the video file immediately after stopping
Console Log Capture - Automatically captures console.log/warn/error during recordings
AI-Parseable Output - Extracts key frames as images + JSON summary for AI analysis
Persistent Pages - Pages stay open between scripts, so you don't lose your place
Budget Controls - Limit cycles, tokens, cost, and duration for agent runs
Simple API - Easy to use, even if you're new to programming

Prerequisites

Before you can use Dev Browser Studio, you need to install a few things on your computer.

Required Software

1. Node.js (version 18 or later)

Node.js is a program that runs JavaScript code on your computer.

How to check if you have it:

node --version

If you see a number like v18.0.0 or higher, you're good! If not, install it:

Mac: brew install node (requires Homebrew)
Windows: Download from nodejs.org
Linux: sudo apt install nodejs npm (Ubuntu/Debian)
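
The same version check can be done from inside a script before anything else runs. This is a minimal sketch; the helper names nodeMajor and meetsMinimum are illustrative and are not part of Dev Browser Studio.

```typescript
// Hypothetical helpers (not part of Dev Browser Studio): parse a Node.js
// version string such as "v18.0.0" and compare it with the required major.
function nodeMajor(version: string): number {
  const match = /^v?(\d+)\./.exec(version);
  if (!match) throw new Error(`Unrecognized Node.js version: ${version}`);
  return Number(match[1]);
}

function meetsMinimum(version: string, minimumMajor = 18): boolean {
  return nodeMajor(version) >= minimumMajor;
}

// process.version is the version of the running interpreter.
if (!meetsMinimum(process.version)) {
  console.error(`Node.js 18 or later is required, found ${process.version}`);
}
```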

2. npm (comes with Node.js)

npm is a package manager that installs JavaScript libraries.

How to check:

npm --version

3. ffmpeg (for video encoding)

ffmpeg converts the recorded frames into video files.

How to install:

Mac: brew install ffmpeg
Windows: Download from ffmpeg.org and add to PATH
Linux: sudo apt install ffmpeg

How to check:

ffmpeg -version

Note: If you don't have ffmpeg, the tool will save individual image frames instead of a video file. The video feature works best with ffmpeg installed.
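
If you want a script to warn early when ffmpeg is missing, one possible approach is to run the same command from Node.js. The helper name hasFfmpeg is an assumption for this sketch and does not come from the toolkit.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical helper: returns true when `ffmpeg -version` runs and exits with code 0.
function hasFfmpeg(): boolean {
  const result = spawnSync("ffmpeg", ["-version"], { stdio: "ignore" });
  // When the binary is missing, spawnSync sets `error` and `status` is null.
  return result.error === undefined && result.status === 0;
}

if (!hasFfmpeg()) {
  console.warn("ffmpeg not found on PATH: recordings will be saved as image frames only.");
}
```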

4. Anthropic API Key (for perception loop)

The autonomous perception loop requires a Claude API key:

export ANTHROPIC_API_KEY="sk-ant-..."

Get one at console.anthropic.com. Not needed for video recording or manual scripting.
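
A script that uses the perception loop can fail fast when the key is missing. The sketch below assumes only the ANTHROPIC_API_KEY variable named above; the function name requireAnthropicKey is hypothetical.

```typescript
// Hypothetical preflight check: the function name and message are illustrative.
function requireAnthropicKey(env: Record<string, string | undefined>): string {
  const key = env.ANTHROPIC_API_KEY;
  if (!key || key.trim() === "") {
    throw new Error("ANTHROPIC_API_KEY is not set; the perception loop needs it.");
  }
  return key;
}

// Usage: call requireAnthropicKey(process.env) before constructing the loop.
```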

5. Claude Code (optional but recommended)

If you're using this as a Claude Code skill:

npm install -g @anthropic-ai/claude-code

Installation

Option 1: As a Claude Code Skill (Recommended)

If you use Claude Code, this is the easiest way to get started.

Step 1: Clone the repository

git clone https://github.com/tripleyak/dev-browser-studio.git ~/.claude/skills/dev-browser-studio

Step 2: Install dependencies

cd ~/.claude/skills/dev-browser-studio
npm install

Step 3: Restart Claude Code

Close and reopen Claude Code. The skill will be automatically available.

Option 2: Standalone Installation

If you want to use this without Claude Code:

Step 1: Clone the repository

git clone https://github.com/tripleyak/dev-browser-studio.git
cd dev-browser-studio

Step 2: Install dependencies

npm install

Step 3: Start the server

npm run start-server

You should see:

Using persistent browser profile: /path/to/.browser-data
Recordings directory: /path/to/recordings
Browser launched with persistent profile...
HTTP API server running on port 9222

Quick Start Guide

Here's a simple example to get you started. This script will:

Open a website
Start recording
Click around
Stop recording and save the video

Example: Record a Website Visit

import { connect, waitForPageLoad } from "./src/client.js";

// Connect to the browser server
const client = await connect();

// Create a new page called "demo"
const page = await client.page("demo");

// Go to a website
await page.goto("https://example.com");
await waitForPageLoad(page);

// Start recording
await client.startRecording("demo");
console.log("Recording started!");

// Do some actions (these will be recorded)
await page.click("a");  // Click a link
await waitForPageLoad(page);

// Take a screenshot too (optional)
await page.screenshot({ path: "screenshot.png" });

// Stop recording and get AI-parseable results
const result = await client.stopRecording("demo");
console.log(`Video saved to: ${result.videoPath}`);
console.log(`Duration: ${result.durationMs}ms`);
console.log(`Frames captured: ${result.frameCount}`);
console.log(`Console logs: ${result.consoleLogs?.length ?? 0}`);
console.log(`Key frames for AI: ${result.keyFramePaths?.join(", ")}`);
console.log(`Summary JSON: ${result.summaryPath}`);

// Disconnect (the page stays open for later)
await client.disconnect();

Running the Example

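Save the code above to a file called demo.ts in the dev-browser-studio directory (the example imports ./src/client.js relative to that folder). With the server running, run: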

npx tsx demo.ts

Your video will be saved in the recordings/ folder!

How to Use

Starting the Server

Before you can automate browsers, start the server:

# From the dev-browser-studio directory
npm run start-server

Options:

Add --headless to run without showing the browser window (npm run start-server -- --headless)
Leave the flag off while debugging so you can watch what the browser is doing

Basic Operations

Creating and Using Pages

import { connect } from "./src/client.js";

const client = await connect();

// Create or get a page by name
const page = await client.page("my-page");

// Pages persist! If you run this script again,
// you'll get the same page in the same state

Navigating Websites

// Go to a URL
await page.goto("https://google.com");

// Wait for the page to fully load
await waitForPageLoad(page);

// Get current URL
console.log(page.url());

// Get page title
console.log(await page.title());

Interacting with Elements

// Click a button or link
await page.click("button.submit");

// Type text into an input field
await page.fill("input[name='email']", "user@example.com");

// Select from a dropdown
await page.selectOption("select#country", "USA");

// Check a checkbox
await page.check("input[type='checkbox']");

Taking Screenshots

// Screenshot of visible area
await page.screenshot({ path: "screenshot.png" });

// Screenshot of entire page (including scrolled content)
await page.screenshot({ path: "full-page.png", fullPage: true });

Video Recording

This is what makes Dev Browser Studio special!

Start Recording

await client.startRecording("page-name", {
  maxWidth: 1280,    // Video width (default: 1280)
  maxHeight: 720,    // Video height (default: 720)
  quality: 80,       // JPEG quality 0-100 (default: 80)
});

Stop Recording and Get Video

const result = await client.stopRecording("page-name");

console.log(result.videoPath);   // Path to the video file
console.log(result.durationMs);  // Recording duration in milliseconds
console.log(result.frameCount);  // Number of frames captured
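
The duration and frame count can be combined into an average frame rate, which is a quick sanity check that frames were captured throughout the recording. This is a sketch: the RecordingStats interface only mirrors the two fields it uses, averageFps is a hypothetical helper, and the numbers are sample values.

```typescript
// Shape assumed from the fields shown above; the real type lives in the toolkit.
interface RecordingStats {
  durationMs: number;
  frameCount: number;
}

// Hypothetical helper: average captured frames per second, or 0 for an empty recording.
function averageFps({ durationMs, frameCount }: RecordingStats): number {
  if (durationMs <= 0) return 0;
  return frameCount / (durationMs / 1000);
}

// 157 frames over 5230 ms is roughly 30 frames per second.
console.log(averageFps({ durationMs: 5230, frameCount: 157 }).toFixed(1)); // "30.0"
```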

Check Recording Status

const status = await client.getRecordingStatus("page-name");

if (status.isRecording) {
  console.log(`Recording since: ${status.startedAt}`);
  console.log(`Frames so far: ${status.frameCount}`);
}

AI-Parseable Recording Output

Each recording produces three outputs designed for AI consumption:

Video File (WebM) - The full recording at recordings/<timestamp>.webm
Key Frame Images (JPEG) - Evenly-spaced frames that AI assistants can view directly
Summary JSON - Structured metadata including console logs

Example summary JSON:

{
  "recording": {
    "videoPath": "recordings/1705432100000.webm",
    "durationMs": 5230,
    "frameCount": 157,
    "startedAt": "2024-01-16T20:15:00.000Z",
    "stoppedAt": "2024-01-16T20:15:05.230Z"
  },
  "consoleLogs": [
    {
      "timestamp": "2024-01-16T20:15:01.234Z",
      "level": "log",
      "text": "Button clicked",
      "url": "https://example.com/app"
    },
    {
      "timestamp": "2024-01-16T20:15:02.567Z",
      "level": "error",
      "text": "Failed to fetch data",
      "url": "https://example.com/app"
    }
  ],
  "keyFrames": [
    "recordings/1705432100000_frame_0.jpg",
    "recordings/1705432100000_frame_1.jpg",
    "recordings/1705432100000_frame_2.jpg"
  ],
  "page": {
    "url": "https://example.com/app",
    "title": "My Application"
  }
}

The key frames allow Claude and other AI assistants to "see" what happened during the recording by viewing the extracted images.
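
Because the summary is plain JSON, a script can also filter it directly. The sketch below pulls out error-level console entries; the types mirror the example summary above, and errorLogs is a hypothetical helper rather than part of the client API.

```typescript
// Types mirror the example summary above; fields not used here are omitted.
interface ConsoleEntry {
  timestamp: string;
  level: string;
  text: string;
  url: string;
}

interface RecordingSummary {
  recording: { videoPath: string; durationMs: number; frameCount: number };
  consoleLogs: ConsoleEntry[];
  keyFrames: string[];
}

// Hypothetical helper: keep only console entries logged at "error" level.
function errorLogs(summaryJson: string): ConsoleEntry[] {
  const summary = JSON.parse(summaryJson) as RecordingSummary;
  return summary.consoleLogs.filter((entry) => entry.level === "error");
}

// Inline sample so the sketch runs without a recording on disk.
const sample = JSON.stringify({
  recording: { videoPath: "recordings/1705432100000.webm", durationMs: 5230, frameCount: 157 },
  consoleLogs: [
    { timestamp: "2024-01-16T20:15:01.234Z", level: "log", text: "Button clicked", url: "https://example.com/app" },
    { timestamp: "2024-01-16T20:15:02.567Z", level: "error", text: "Failed to fetch data", url: "https://example.com/app" },
  ],
  keyFrames: ["recordings/1705432100000_frame_0.jpg"],
});

for (const entry of errorLogs(sample)) {
  console.log(`${entry.timestamp} ${entry.text}`);
}
```

In a real script, the JSON string would come from reading the file at result.summaryPath, for example with readFileSync from node:fs.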

AI-Friendly Page Inspection

Dev Browser Studio can describe what's on a page in a format that's easy for AI assistants to understand.

// Get a structured description of the page
const snapshot = await client.getAISnapshot("page-name");
console.log(snapshot);

This returns something like:

- banner:
  - link "Home" [ref=e1]
  - link "About" [ref=e2]
- main:
  - heading "Welcome" [level=1]
  - button "Get Started" [ref=e3]
  - textbox [ref=e4]
    - /placeholder: "Enter your email"

You can then interact with elements by their reference:

const button = await client.selectSnapshotRef("page-name", "e3");
await button.click();
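
To find which references are available, the snapshot text can be scanned for [ref=...] markers. This sketch assumes the snapshot is a string in the line format shown in the sample output above; listRefs and its regular expression are illustrative and may need adjusting for real snapshots.

```typescript
interface SnapshotRef {
  role: string;
  name?: string; // accessible name, when the line includes one in quotes
  ref: string;
}

// Hypothetical helper: collect every line that carries a [ref=...] marker.
function listRefs(snapshot: string): SnapshotRef[] {
  const pattern = /^\s*-\s+(\w+)(?:\s+"([^"]*)")?.*\[ref=(\w+)\]/;
  const refs: SnapshotRef[] = [];
  for (const line of snapshot.split("\n")) {
    const match = pattern.exec(line);
    if (match) refs.push({ role: match[1], name: match[2], ref: match[3] });
  }
  return refs;
}

// The sample output from above, as a string.
const sampleSnapshot = [
  "- banner:",
  '  - link "Home" [ref=e1]',
  '  - link "About" [ref=e2]',
  "- main:",
  '  - heading "Welcome" [level=1]',
  '  - button "Get Started" [ref=e3]',
  "  - textbox [ref=e4]",
  '    - /placeholder: "Enter your email"',
].join("\n");

console.log(listRefs(sampleSnapshot).map((r) => r.ref)); // [ "e1", "e2", "e3", "e4" ]
```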

Perception Loop (Autonomous Agent)

The perception loop lets an AI agent autonomously interact with web pages. It captures screenshots and ARIA snapshots, sends them to Claude Vision, receives structured actions, and executes them in a loop until the task is complete.

Basic Usage

import { connect } from "./src/client.js";
import { PerceptionLoop } from "./src/perception-loop.js";

const client = await connect();
const page = await client.page("agent");
await page.goto("https://books.toscrape.com");

const loop = new PerceptionLoop({
  maxCycles: 10,
  budget: { maxCostUSD: 0.50 },  // Same option name as in Configuration below
});

const result = await loop.run(
  client,
  "agent",
  "Find the cheapest book on this page. Return the title and price in extracted_data.",
);

console.log(result.success);        // true
console.log(result.extractedData);  // { title: "...", price: "£13.99" }
console.log(result.budgetUsed);     // { cycles: 3, estimatedTokens: 39359, ... }

How It Works

Each cycle:

Capture — Takes a JPEG screenshot + ARIA accessibility snapshot
Perceive — Sends both to Claude Vision with the task and action history
Act — Claude returns a structured tool_use action (click, type, scroll, navigate, etc.)
Execute — The action runs on the page via Playwright
Repeat — Until the agent calls done or fail, or a budget limit is hit

The agent has 10 actions available: click, type, scroll, navigate, keyboard, wait, hover, select, done, and fail.
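
The control flow of the cycle can be sketched with stubs in place of the browser and the model. This is not the toolkit's implementation: it is a simplified, synchronous illustration in which perceive stands in for the capture and Claude steps, execute stands in for the Playwright step, and every name is hypothetical.

```typescript
// Illustrative only: a synchronous, stubbed version of the control flow.
type AgentAction =
  | { type: "click" | "type" | "scroll" | "navigate" | "keyboard" | "wait" | "hover" | "select" }
  | { type: "done"; summary: string }
  | { type: "fail"; reason: string };

interface SketchResult {
  success: boolean;
  cycles: number;
  summary: string;
}

function runSketchLoop(
  perceive: (history: AgentAction[]) => AgentAction, // stands in for capture + model call
  execute: (action: AgentAction) => void,            // stands in for running the action
  maxCycles: number,
): SketchResult {
  const history: AgentAction[] = [];
  for (let cycle = 1; cycle <= maxCycles; cycle++) {
    const action = perceive(history);
    // The two terminal actions end the loop without touching the page.
    if (action.type === "done") return { success: true, cycles: cycle, summary: action.summary };
    if (action.type === "fail") return { success: false, cycles: cycle, summary: action.reason };
    execute(action);
    history.push(action);
  }
  return { success: false, cycles: maxCycles, summary: "cycle budget exhausted" };
}

// Stub agent: click twice, then report completion on the third cycle.
const executed: string[] = [];
const outcome = runSketchLoop(
  (history) => (history.length < 2 ? { type: "click" } : { type: "done", summary: "finished" }),
  (action) => executed.push(action.type),
  10,
);
console.log(outcome.success, outcome.cycles, executed);
```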

Configuration

const loop = new PerceptionLoop({
  model: "claude-sonnet-4-5-20250929",  // Claude model (default: Sonnet)
  maxCycles: 50,                         // Max perception-action cycles
  maxConsecutiveErrors: 5,               // Stop after N errors in a row
  maxSnapshotChars: 40000,               // Truncate large ARIA snapshots
  settleTimeMs: 300,                     // Wait after each action for page to settle
  budget: {
    maxCycles: 100,                      // Hard cycle limit
    maxTokens: 500000,                   // Total input+output tokens
    maxCostUSD: 5.00,                    // Estimated cost cap
    maxDurationMs: 600000,               // 10 minute timeout
  },
  safety: {
    readOnlyMode: false,                 // Block clicks/typing (allow only scroll/navigate)
    blockedURLPatterns: [".*admin.*"],    // Regex patterns to block navigation
  },
});
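
To make the budget settings concrete, here is one way such limits could be evaluated between cycles. It is a sketch under assumptions: the field names follow the budget options above and the budgetUsed fields of the result object, but exceededLimit is hypothetical, and whether the toolkit treats a limit as reached at "equal" or only at "greater than" is not stated in this README.

```typescript
interface BudgetLimits {
  maxCycles: number;
  maxTokens: number;
  maxCostUSD: number;
  maxDurationMs: number;
}

interface BudgetUsed {
  cycles: number;
  estimatedTokens: number;
  estimatedCostUSD: number;
  durationMs: number;
}

// Hypothetical check: returns the first limit that has been reached, or null.
// Using >= here is an assumption, not documented behavior.
function exceededLimit(used: BudgetUsed, limits: BudgetLimits): keyof BudgetLimits | null {
  if (used.cycles >= limits.maxCycles) return "maxCycles";
  if (used.estimatedTokens >= limits.maxTokens) return "maxTokens";
  if (used.estimatedCostUSD >= limits.maxCostUSD) return "maxCostUSD";
  if (used.durationMs >= limits.maxDurationMs) return "maxDurationMs";
  return null;
}

const limits: BudgetLimits = { maxCycles: 100, maxTokens: 500000, maxCostUSD: 5.0, maxDurationMs: 600000 };
const used: BudgetUsed = { cycles: 3, estimatedTokens: 39359, estimatedCostUSD: 0.12, durationMs: 18400 };
console.log(exceededLimit(used, limits)); // null
```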

Audit Logging

Every run produces an audit trail in recordings/perception-<timestamp>/:

perception-1705432100000/
├── cycles.jsonl    # One JSON line per cycle (action, result, tokens, timing)
├── summary.json    # Final result + budget usage
└── frames/
    ├── cycle-0.jpg # Screenshot at each cycle
    ├── cycle-1.jpg
    └── ...
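
Since cycles.jsonl holds one JSON object per line, it can be loaded with a few lines of code. The sketch below is generic JSONL parsing; parseJsonl is a hypothetical helper, and the field names in the sample are made up, so check a real cycles.jsonl for the actual schema.

```typescript
// Hypothetical helper: parse JSON Lines text into an array of records.
function parseJsonl(text: string): Record<string, unknown>[] {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0) // skip blank lines, including a trailing newline
    .map((line) => JSON.parse(line) as Record<string, unknown>);
}

// Made-up sample records for illustration only.
const sampleLog = '{"cycle":0,"action":"click"}\n{"cycle":1,"action":"done"}\n';
const records = parseJsonl(sampleLog);
console.log(records.length);
```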

Result Object

interface LoopResult {
  success: boolean;                    // Did the agent complete the task?
  summary: string;                     // Agent's summary of what happened
  cycles: number;                      // Total cycles used
  extractedData?: Record<string, unknown>;  // Data the agent extracted
  budgetUsed: {
    cycles: number;
    estimatedTokens: number;
    estimatedCostUSD: number;
    durationMs: number;
  };
}
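
A short example of consuming this object: the sketch below turns a result into a one-line report. The interface repeats the definition above so the block is self-contained; describeResult and the sample values are illustrative.

```typescript
interface LoopResult {
  success: boolean;
  summary: string;
  cycles: number;
  extractedData?: Record<string, unknown>;
  budgetUsed: {
    cycles: number;
    estimatedTokens: number;
    estimatedCostUSD: number;
    durationMs: number;
  };
}

// Hypothetical helper: one-line, human-readable report for logs or CI output.
function describeResult(result: LoopResult): string {
  const { cycles, estimatedCostUSD, durationMs } = result.budgetUsed;
  const status = result.success ? "succeeded" : "failed";
  const seconds = (durationMs / 1000).toFixed(1);
  return `${status} after ${cycles} cycles, ~$${estimatedCostUSD.toFixed(2)}, ${seconds}s: ${result.summary}`;
}

// Sample values, not real measurements.
const sampleResult: LoopResult = {
  success: true,
  summary: "Found the cheapest book",
  cycles: 3,
  budgetUsed: { cycles: 3, estimatedTokens: 39359, estimatedCostUSD: 0.12, durationMs: 18400 },
};
console.log(describeResult(sampleResult));
```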

API Reference

Client Methods

Method	Description
`connect(url?)`	Connect to the server (default: `http://localhost:9222`)
`client.page(name, options?)`	Get or create a page by name
`client.list()`	List all page names
`client.close(name)`	Close a page
`client.disconnect()`	Disconnect from server (pages stay open)
`client.startRecording(name, options?)`	Start video recording
`client.stopRecording(name)`	Stop recording and get video + console logs + key frames
`client.getRecordingStatus(name)`	Check if recording is active
`client.getConsoleLogs(name)`	Get captured console logs
`client.clearConsoleLogs(name)`	Clear captured console logs
`client.getAISnapshot(name)`	Get AI-friendly page description
`client.selectSnapshotRef(name, ref)`	Get element by reference ID

Recording Options

Option	Type	Default	Description
`maxWidth`	number	1280	Maximum video width in pixels
`maxHeight`	number	720	Maximum video height in pixels
`quality`	number	80	JPEG quality (0-100, higher = better quality, larger files)
`everyNthFrame`	number	1	Capture every Nth frame (1 = all frames)
`captureConsoleLogs`	boolean	true	Capture console.log/warn/error during recording
`extractKeyFrames`	boolean	true	Extract key frames as separate images for AI viewing
`keyFrameCount`	number	5	Number of key frames to extract

Perception Loop Options

Option	Type	Default	Description
`model`	string	`claude-sonnet-4-5-20250929`	Claude model for vision
`maxCycles`	number	50	Max perception-action cycles
`maxConsecutiveErrors`	number	5	Stop after N consecutive errors
`maxSnapshotChars`	number	40000	Truncate ARIA snapshots beyond this
`settleTimeMs`	number	300	Wait time after actions (ms)
`apiTimeoutMs`	number	30000	Claude API call timeout (ms)
`budget.maxCycles`	number	100	Hard cycle budget limit
`budget.maxTokens`	number	500000	Max total tokens
`budget.maxCostUSD`	number	5.00	Max estimated cost
`budget.maxDurationMs`	number	600000	Max duration (ms)
`safety.readOnlyMode`	boolean	false	Block mutating actions
`safety.blockedURLPatterns`	string[]	[]	Regex patterns to block
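
To see what a pattern in safety.blockedURLPatterns would match, you can test it against sample URLs. This sketch assumes each pattern is compiled with JavaScript's RegExp and tested against the full URL without anchoring; the toolkit's actual matching rules may differ, and isBlockedUrl is a hypothetical helper.

```typescript
// Hypothetical helper: true when any pattern matches somewhere in the URL.
function isBlockedUrl(url: string, patterns: string[]): boolean {
  return patterns.some((pattern) => new RegExp(pattern).test(url));
}

const patterns = [".*admin.*"];
console.log(isBlockedUrl("https://example.com/admin/users", patterns)); // true
console.log(isBlockedUrl("https://example.com/shop", patterns));        // false
```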

Page Options

Option	Type	Description
`viewport.width`	number	Browser window width
`viewport.height`	number	Browser window height

Common Use Cases

Testing a Login Flow

import { connect, waitForPageLoad } from "./src/client.js";

const client = await connect();
const page = await client.page("login-test");

// Start recording to capture the entire flow
await client.startRecording("login-test");

// Navigate to login page
await page.goto("https://myapp.com/login");
await waitForPageLoad(page);

// Fill in credentials
await page.fill("input[name='email']", "test@example.com");
await page.fill("input[name='password']", "testpassword");

// Click login button
await page.click("button[type='submit']");
await waitForPageLoad(page);

// Verify we're logged in
const welcomeText = await page.textContent("h1");
console.log(`Page says: ${welcomeText}`);

// Stop recording
const { videoPath } = await client.stopRecording("login-test");
console.log(`Login flow recorded to: ${videoPath}`);

await client.disconnect();

Debugging a Bug

import { connect } from "./src/client.js";

const client = await connect();
const page = await client.page("debug");

// Go to the page with the bug
await page.goto("https://myapp.com/buggy-page");

// Start recording before the problematic action
await client.startRecording("debug");

// Perform the action that causes the bug
await page.click("#problematic-button");

// Wait a moment to capture the result
await new Promise(resolve => setTimeout(resolve, 2000));

// Stop recording
const { videoPath } = await client.stopRecording("debug");
console.log(`Bug reproduction recorded to: ${videoPath}`);

// Also take a screenshot of the final state
await page.screenshot({ path: "bug-state.png" });

await client.disconnect();

Visual Regression Testing

import { connect, waitForPageLoad } from "./src/client.js";

const client = await connect();
const page = await client.page("visual-test");

const pagesToTest = [
  "https://myapp.com/",
  "https://myapp.com/about",
  "https://myapp.com/contact",
];

await client.startRecording("visual-test");

for (const url of pagesToTest) {
  await page.goto(url);
  await waitForPageLoad(page);

  // Take a screenshot of each page
  const filename = url.replace(/[^a-z0-9]/gi, "_") + ".png";
  await page.screenshot({ path: `screenshots/${filename}`, fullPage: true });
}

const { videoPath } = await client.stopRecording("visual-test");
console.log(`Visual test recorded to: ${videoPath}`);

await client.disconnect();

Troubleshooting

"Cannot connect to server"

Problem: The script can't connect to http://localhost:9222

Solution: Make sure the server is running:

npm run start-server

"ffmpeg not found"

Problem: Videos aren't being created, only image frames

Solution: Install ffmpeg:

Mac: brew install ffmpeg
Windows: Download from ffmpeg.org
Linux: sudo apt install ffmpeg

"Page not found"

Problem: client.page("name") throws an error

Solution: Make sure the server is running and the page name is correct. Page names are case-sensitive.

Browser window doesn't appear

Problem: You can't see the browser

Solution: Make sure you didn't start with --headless. Run:

npm run start-server

Without any flags, the browser window should be visible.

Recording has no frames

Problem: stopRecording returns frameCount: 0

Solution: Make sure something is happening on the page during recording. Try adding delays or actions between start and stop:

await client.startRecording("test");
await page.goto("https://example.com");
await new Promise(r => setTimeout(r, 1000)); // Wait 1 second
const result = await client.stopRecording("test");

Frequently Asked Questions

What browsers does this support?

Dev Browser Studio uses Chromium (the open-source version of Chrome). It's bundled with Playwright, so you don't need to install it separately.

Can I use my existing Chrome profile?

Yes! Use "Extension Mode" by installing the Chrome extension. This lets you automate your existing Chrome browser with all your logged-in sessions, bookmarks, and extensions.

How long can recordings be?

There's no hard limit, but longer recordings create larger files. A typical 1-minute recording at 720p is around 5-10MB.
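
For rough disk planning, the figure above can be scaled by duration. This is back-of-the-envelope arithmetic that assumes size grows linearly with length at 720p; real sizes depend on how much the page changes, and estimateSizeMB is a hypothetical helper.

```typescript
// Hypothetical estimate: scales the 5-10 MB per minute figure linearly.
function estimateSizeMB(durationMs: number): [number, number] {
  const minutes = durationMs / 60_000;
  return [5 * minutes, 10 * minutes];
}

// A 3-minute recording: roughly 15 to 30 MB under this assumption.
const [low, high] = estimateSizeMB(180_000);
console.log(`${low}-${high} MB`);
```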

What video format is used?

Videos are saved as WebM files using the VP9 codec. This format is widely supported and provides good compression.

Can I record mobile layouts?

Yes! Set a mobile viewport when creating the page:

const page = await client.page("mobile-test", {
  viewport: { width: 375, height: 812 }  // iPhone X size
});

Does this work on CI/CD pipelines?

Yes! Use headless mode:

npm run start-server -- --headless

Project Structure

dev-browser-studio/
├── README.md              # This file
├── SKILL.md               # Skill instructions for AI assistants
├── package.json           # Project dependencies
├── tsconfig.json          # TypeScript configuration
├── server.sh              # Server startup script
├── src/
│   ├── index.ts           # Server code
│   ├── client.ts          # Client API
│   ├── perception-loop.ts # Autonomous agent loop
│   ├── vlm-client.ts      # Claude Vision API wrapper
│   ├── tools.ts           # Agent action vocabulary + executor
│   ├── frame-sampler.ts   # Perceptual hash change detection
│   ├── budget.ts          # Cycle/token/cost/duration limits
│   ├── audit-logger.ts    # JSONL audit trail + frame saving
│   ├── video-encoder.ts   # Video encoding
│   ├── types.ts           # TypeScript types
│   └── snapshot/          # Page inspection code
├── scripts/
│   └── start-server.ts    # Server entry point
└── recordings/            # Videos + perception loop audit logs

Contributing

Contributions are welcome! If you find a bug or have a feature request:

Open an issue on GitHub
Fork the repository
Create a branch for your changes
Submit a pull request

License

MIT License - feel free to use this in your own projects!

Credits

Dev Browser Studio is built on top of:

Playwright - Browser automation library
Chrome DevTools Protocol - For video recording
Original concept from dev-browser by Sawyer Hood

Version History

v1.2.0

Autonomous perception-action loop using Claude Vision API
10 agent actions: click, type, scroll, navigate, keyboard, wait, hover, select, done, fail
Budget controls: cycle, token, cost, and duration limits
ARIA snapshot truncation for large pages (configurable maxSnapshotChars)
Navigation recovery: page handle re-acquired after link clicks
Perceptual hash frame sampling (skip unchanged frames)
JSONL audit logging with per-cycle screenshots
Safety guardrails: read-only mode, URL pattern blocking, stuck detection
New dependencies: @anthropic-ai/sdk, sharp

v1.1.0

Console log capture via CDP Runtime API
Key frame extraction as JPEG images for AI viewing
Recording summary JSON with metadata and logs
Enhanced stopRecording response with consoleLogs, keyFramePaths, summaryPath
New client methods: getConsoleLogs(), clearConsoleLogs()
Updated documentation with AI-parseable output examples

v1.0.0 (Initial Release)

On-demand video recording with CDP Screencast
Persistent page management
AI-friendly page snapshots
WebM video encoding with ffmpeg
Comprehensive documentation for beginners
