Koda is an intelligent autonomous agent for the web. Unlike traditional scrapers that break easily, Koda uses Computer Vision and Multi-LLM Intelligence to understand and interact with any web application, just like a human user.
Public Beta: Koda is in active development. Report issues at GitHub Issues.
- Smart Understanding: Uses AI (Gemini, OpenAI, Claude) to understand web pages naturally
- Self-Healing: Adapts when sites change, with no more broken selectors
- Multi-Browser: Works with Chrome, Firefox, Safari, Edge
- Mobile Ready: Automate iOS and Android apps with the same code
- Stagehand Compatible: Drop-in replacement with familiar API
- Easy to Use: Simple JavaScript API for complex tasks
Recent improvements now live in the codebase:
- REST API now includes built-in request rate limiting and safer default host binding (127.0.0.1 by default).
- Tool hardening:
  - APITool blocks private/local network targets by default and supports explicit host allowlists.
  - FileTool enforces stronger path-boundary checks and rejects symlink targets.
- Packaging reliability:
  - npm tarballs now include required root-level runtime modules used by the public entrypoint.
- Release/CI reliability:
  - test, lint, and build scripts no longer swallow failures.
  - GitHub workflows use npm ci --omit=optional --legacy-peer-deps for cleaner deterministic installs.
See detailed audit and roadmap in docs/reports/PROJECT_AUDIT_RECOMMENDATIONS.md.
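To make the "blocks private/local network targets by default and supports explicit host allowlists" behavior concrete, here is a minimal sketch of that kind of check. It is not Koda's APITool code: `isPrivateHost` and `isAllowedTarget` are hypothetical names, and the sketch assumes the decision is made from the URL's hostname alone.

```javascript
const net = require('net');

// Hypothetical helper, not part of Koda's API.
// Returns true for loopback, private, and link-local hostnames.
function isPrivateHost(hostname) {
  // URL.hostname wraps IPv6 literals in brackets; strip them first.
  const host = hostname.toLowerCase().replace(/^\[|\]$/g, '');
  if (host === 'localhost' || host.endsWith('.localhost')) return true;

  if (net.isIPv4(host)) {
    const [a, b] = host.split('.').map(Number);
    return (
      a === 0 ||                           // 0.0.0.0/8
      a === 10 ||                          // 10.0.0.0/8
      a === 127 ||                         // 127.0.0.0/8 (loopback)
      (a === 169 && b === 254) ||          // 169.254.0.0/16 (link-local)
      (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
      (a === 192 && b === 168)             // 192.168.0.0/16
    );
  }

  if (net.isIPv6(host)) {
    return (
      host === '::' ||                     // unspecified address
      host === '::1' ||                    // loopback
      /^f[cd][0-9a-f]{2}:/.test(host) ||   // fc00::/7 (unique local)
      /^fe[89ab][0-9a-f]:/.test(host)      // fe80::/10 (link-local)
    );
  }

  return false;
}

// Hypothetical helper: allow http(s) URLs unless the host is private,
// with an explicit allowlist that overrides the private-host block.
function isAllowedTarget(rawUrl, allowlist = []) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (url.protocol !== 'http:' && url.protocol !== 'https:') return false;
  if (allowlist.includes(url.hostname)) return true;
  return !isPrivateHost(url.hostname);
}
```

A check like this only inspects the literal hostname. A public name that resolves to a private address, or an IPv4-mapped IPv6 literal, would pass it, so a hardened tool would likely also validate the resolved address.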
npm install @trentpierce/koda

const { createAgent } = require('@trentpierce/koda');

async function main() {
  const agent = await createAgent({
    provider: 'gemini', // or 'openai', 'anthropic'
    apiKey: process.env.GEMINI_API_KEY,
    headless: false
  });

  // Navigate and interact naturally
  await agent.goto('https://example.com');
  await agent.act('Click the login button');
  await agent.type('#username', 'myuser');
  await agent.type('#password', 'mypass');
  await agent.act('Click submit');

  // Extract information
  const data = await agent.extract('Get all product prices');
  console.log(data);

  await agent.close();
}

main().catch(console.error);

That's it! Koda handles the rest: finding elements, handling dynamic content, and adapting to changes.
- Node.js 18+
- An LLM API key (Gemini, OpenAI, or Anthropic)
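Since any one of the three provider keys satisfies the requirement, a startup check can pick whichever key is present. The sketch below is an illustration only: `pickProvider` is a hypothetical helper, not part of Koda, and it assumes the environment variable names used elsewhere in this README (GEMINI_API_KEY, OPENAI_API_KEY, ANTHROPIC_API_KEY).

```javascript
// Provider names and variable names are taken from this README;
// the helper itself is hypothetical and not part of Koda.
const PROVIDER_ENV_VARS = [
  ['gemini', 'GEMINI_API_KEY'],
  ['openai', 'OPENAI_API_KEY'],
  ['anthropic', 'ANTHROPIC_API_KEY'],
];

// Returns { provider, apiKey } for the first provider whose key is set.
// Throws when no key is configured, so a misconfiguration fails early.
function pickProvider(env = process.env) {
  for (const [provider, variable] of PROVIDER_ENV_VARS) {
    const apiKey = env[variable];
    if (typeof apiKey === 'string' && apiKey.trim() !== '') {
      return { provider, apiKey };
    }
  }
  const names = PROVIDER_ENV_VARS.map(([, variable]) => variable).join(', ');
  throw new Error(`No LLM API key found. Set one of: ${names}`);
}
```

The returned object has the same `provider` and `apiKey` fields that `createAgent` takes, so it could be spread into the options, for example `createAgent({ ...pickProvider(), headless: false })`.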
# Core library
npm install @trentpierce/koda

# Optional: Puppeteer (recommended)
npm install puppeteer

# Optional: Mobile automation
npm install webdriverio
npm install -g appium

# Optional: Computer vision
npm install sharp opencv4nodejs

# Optional: LLM SDKs
npm install openai @anthropic-ai/sdk

- Multi-LLM Support: Switch between Gemini, OpenAI, and Claude
- Visual Understanding: Sees the page like a human, not just the DOM
- Natural Language Commands: Describe what you want, Koda figures out how
Automatically adapts when sites change:
- Falls back to alternative selectors when primary ones fail
- Uses visual matching when selectors aren't available
- Reduces maintenance and debugging time
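The selector fallback described above can be pictured as a loop over candidate selectors. This is a rough sketch, not Koda's implementation: `findWithFallback` and `createStubPage` are hypothetical names, and the only assumption about `page` is a Puppeteer-style `$(selector)` method that resolves to an element handle or `null`.

```javascript
// Hypothetical sketch of selector fallback; not Koda's implementation.
// Tries each selector in order and returns the first one that matches.
async function findWithFallback(page, selectors) {
  const tried = [];
  for (const selector of selectors) {
    try {
      const element = await page.$(selector);
      if (element) return { element, selector, tried };
    } catch (err) {
      // An invalid or unsupported selector counts as a miss.
    }
    tried.push(selector);
  }
  // Nothing matched; a real agent might fall back to visual matching here.
  return { element: null, selector: null, tried };
}

// Stub standing in for a browser page so the sketch runs without a browser.
function createStubPage(existingSelectors) {
  return {
    async $(selector) {
      return existingSelectors.includes(selector) ? { selector } : null;
    },
  };
}
```

Returning the list of selectors that failed makes it possible to log which primary selector went stale, which is where much of the maintenance saving would come from.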
- Web: Chrome, Firefox, Safari, Edge
- Mobile: iOS and Android apps with Appium
- Same code works everywhere
- Session Management: Persistent authentication across runs
- Network Interception: Mock APIs and intercept requests
- Computer Vision: OCR, object detection, visual element finding
- Reinforcement Learning: Improve over time with experience
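As an illustration of what "persistent authentication across runs" can involve, one common approach is to save the browser's cookies to disk and load them back in a later run. The sketch below is not Koda's session code: `saveCookies` and `loadCookies` are hypothetical helpers, and `page` is assumed to expose Puppeteer-style `cookies()` and `setCookie(...cookies)` methods. State kept in localStorage is not covered.

```javascript
const fs = require('fs/promises');
const path = require('path');

// Hypothetical helper: write the page's cookies to <dir>/<name>.json.
async function saveCookies(page, dir, name) {
  const cookies = await page.cookies();
  await fs.mkdir(dir, { recursive: true });
  // basename() keeps the session name from escaping the storage directory.
  const file = path.join(dir, `${path.basename(name)}.json`);
  await fs.writeFile(file, JSON.stringify(cookies, null, 2), 'utf8');
  return file;
}

// Hypothetical helper: read saved cookies and apply them to the page.
// Returns the number of cookies restored.
async function loadCookies(page, dir, name) {
  const file = path.join(dir, `${path.basename(name)}.json`);
  const cookies = JSON.parse(await fs.readFile(file, 'utf8'));
  if (cookies.length > 0) {
    await page.setCookie(...cookies);
  }
  return cookies.length;
}
```

Saved cookie files contain live credentials, so the storage directory should be kept out of version control.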
const { SelfHealingSelector } = require('@trentpierce/koda');

const selector = new SelfHealingSelector({
  enableHealing: true,
  maxHealingAttempts: 5
});

// If #login-btn fails, automatically tries:
// [data-testid="login"], [aria-label="Login"], button:has-text("Login"), etc.
const element = await selector.findWithHealing(page, '#login-btn');

const { MobileAgent } = require('@trentpierce/koda/mobile');
const agent = new MobileAgent({
  platform: 'android',
  deviceName: 'Pixel_6_API_33',
  appPackage: 'com.example.app'
});
await agent.initialize();

// Same natural language commands as web
await agent.tap('Login');
await agent.type('#username', 'testuser');
await agent.swipe({ direction: 'up' });

const { NetworkInterceptor } = require('@trentpierce/koda');
const interceptor = new NetworkInterceptor();
await interceptor.init(page);

// Mock API responses
interceptor.mock('**/api/users', {
  status: 200,
  body: [{ id: 1, name: 'Mock User' }]
});

// Modify requests
interceptor.route('**/*', (request) => {
  if (request.url().includes('api')) {
    request.continue({
      headers: { ...request.headers(), 'X-Custom-Header': 'value' }
    });
  } else {
    request.continue();
  }
});

const { SessionManager } = require('@trentpierce/koda');
const sessions = new SessionManager({ storagePath: './sessions' });

// Capture state after login
await sessions.captureState(page, 'user-session');

// Restore later without re-login
await sessions.restoreState(page, 'user-session');

const { createAgent } = require('@trentpierce/koda');
const agent = await createAgent({ /* ... */ });

// Register custom tool
agent.registerTool('myTool', async (params) => {
  return { success: true, data: params };
}, {
  name: 'myTool',
  description: 'My custom tool',
  parameters: {
    type: 'object',
    properties: { key: { type: 'string' } }
  }
});

// Use it
const result = await agent.useTool('myTool', { key: 'value' });

Create a .env file:
# Required: Choose at least one LLM provider
GEMINI_API_KEY=your_gemini_key
OPENAI_API_KEY=your_openai_key
ANTHROPIC_API_KEY=your_anthropic_key
# Optional
GEMINI_MODEL=gemini-1.5-flash
OPENAI_MODEL=gpt-4
ANTHROPIC_MODEL=claude-3-opus-20240229

- Quick Start Guide - All installation and setup options
- Mobile Automation - iOS and Android automation
- Reinforcement Learning - RL algorithms and usage
- Contributing - How to contribute
- API Reference - Complete API documentation
Import as a JavaScript module. Perfect for:
- Integration into existing projects
- CI/CD pipelines
- Node.js applications
Desktop application with UI. Good for:
- Manual testing and debugging
- Visual workflow creation
- Password-protected memory
Run as a service. Ideal for:
- Cloud deployment
- Multi-user access
- API-driven automation
# Run as server
npm run server
# Run standalone (Electron)
npm start

# Run tests
npm test
# Run with coverage
npm run test:coverage
# Lint
npm run lint

- Version: 2.2.0
- License: Non-Commercial with Attribution
- CI/CD: GitHub Actions
- Test Coverage: Comprehensive
- Browsers: Chrome, Firefox, Safari, Edge
- Mobile: iOS, Android (Appium)
Koda Non-Commercial License with Attribution

You may:
- Use for personal projects
- Use for educational purposes
- Use in non-profit organizations
- Create open-source derivatives
- Contribute improvements back

You must:
- Provide attribution to Trent Pierce in source code, documentation, and UI
- Include the license file when distributing
- State any changes you make

You may not:
- Use for commercial purposes without a separate license
- Sell this software or derivatives
- Use in business operations for profit
- Remove attribution
See LICENSE for full terms.
Commercial licensing available - Contact Trent Pierce for inquiries.
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request
See CONTRIBUTING.md for guidelines.
- GitHub: https://github.com/TrentPierce/Koda
- Issues: https://github.com/TrentPierce/Koda/issues
- Community: Join the discussion
Built with intelligence, designed for scale.

