Agent Brain User Guide

This guide explains how to use Agent Brain for document indexing and semantic search through the Claude Code plugin.

Table of Contents

Overview
Plugin Commands
Plugin Agents
Search Modes
Two-Stage Retrieval with Reranking
Indexing
Job Queue
Provider Configuration
Multi-Project Support
Runtime Autodiscovery
CLI Reference
Local Integration Check
Troubleshooting

Overview

Agent Brain is a RAG (Retrieval-Augmented Generation) system that indexes and searches documentation and source code. The primary interface is the Claude Code plugin, which provides:

Component	Count	Description
Commands	24	Slash commands for all operations
Agents	3	Intelligent assistants for complex tasks
Skills	2	Context for optimal search and configuration

How It Works

Indexing: Reads documents/code, splits into semantic chunks, generates embeddings
Storage: Stores chunks in ChromaDB with metadata for filtering
Retrieval: Finds similar chunks using hybrid search (semantic + keyword)
GraphRAG: Extracts entities and relationships for dependency queries

Plugin Commands

Search Commands

Command	Description	Best For
`/agent-brain-search`	Smart hybrid search	General questions
`/agent-brain-semantic`	Pure vector search	Conceptual queries
`/agent-brain-keyword`	BM25 keyword search	Exact terms, function names
`/agent-brain-bm25`	Alias for keyword search	Error messages, symbols
`/agent-brain-vector`	Alias for semantic search	"How does X work?"
`/agent-brain-hybrid`	Hybrid with alpha control	Fine-tuned searches
`/agent-brain-graph`	Knowledge graph search	Dependencies, relationships
`/agent-brain-multi`	All modes with RRF fusion	Maximum recall

Server Commands

Command	Description
`/agent-brain-start`	Start server (auto-port allocation)
`/agent-brain-stop`	Stop the running server
`/agent-brain-status`	Check health and document count
`/agent-brain-list`	List all running instances
`/agent-brain-index`	Index documents or code
`/agent-brain-reset`	Clear the index

Setup Commands

Command	Description
`/agent-brain-setup`	Complete guided setup wizard
`/agent-brain-install`	Install pip packages
`/agent-brain-init`	Initialize project directory
`/agent-brain-config`	View/edit configuration
`/agent-brain-verify`	Verify configuration
`/agent-brain-help`	Show help information
`/agent-brain-version`	Show version information

Provider Commands

Command	Description
`/agent-brain-providers`	List and configure providers
`/agent-brain-embeddings`	Configure embedding provider
`/agent-brain-summarizer`	Configure summarization provider

Plugin Agents

Agent Brain includes three intelligent agents that handle complex, multi-step tasks:

Search Assistant

Performs multi-step searches across different modes and synthesizes answers.

Triggers: "Find all references to...", "Search for...", "What files contain..."

Example:

You: "Find all references to the authentication module"

Search Assistant:
1. Searches documentation for auth concepts
2. Searches code for auth imports and usage
3. Uses graph mode to find dependencies
4. Returns comprehensive list with file locations

Research Assistant

Deep exploration with follow-up queries and cross-referencing.

Triggers: "Research how...", "Investigate...", "Analyze the architecture of..."

Example:

You: "Research how error handling is implemented"

Research Assistant:
1. Identifies error handling patterns in docs
2. Finds exception classes and try/catch blocks
3. Traces error propagation through call graph
4. Synthesizes findings with code references

Setup Assistant

Guided installation, configuration, and troubleshooting.

Triggers: "Help me set up Agent Brain", "Configure...", "Why isn't... working"

Example:

You: "Help me set up Agent Brain with Ollama"

Setup Assistant:
1. Checks if Ollama is installed
2. Verifies embedding model is pulled
3. Configures provider settings
4. Tests the configuration
5. Reports success or guides through fixes

Search Modes

HYBRID (Default)

Combines semantic similarity with keyword matching. Best for general questions.

/agent-brain-search "how does the caching system work"

Adjust the balance with --alpha:

--alpha 0.7 - More semantic (conceptual queries)
--alpha 0.3 - More keyword (specific terms)

/agent-brain-hybrid "authentication flow" --alpha 0.7
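One way to picture what --alpha does is a weighted blend of the two scores. The sketch below assumes the blend is a simple convex combination of normalized semantic and keyword scores; the function name `blend_scores` and the sample scores are hypothetical, and Agent Brain's actual weighting may differ.

```python
def blend_scores(semantic: float, keyword: float, alpha: float) -> float:
    """Weight the semantic score by alpha and the keyword score by (1 - alpha).

    Assumption: both scores are already normalized to the same 0..1 range.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * semantic + (1.0 - alpha) * keyword


# Hypothetical scores for one chunk: strong semantic match, weak keyword match.
semantic_score, keyword_score = 0.9, 0.2
for alpha in (0.3, 0.7):
    print(alpha, round(blend_scores(semantic_score, keyword_score, alpha), 2))
```

Under this assumption, a chunk that matches conceptually but shares few exact terms scores higher at --alpha 0.7 than at --alpha 0.3.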

VECTOR (Semantic)

Pure embedding-based search. Best for conceptual understanding.

/agent-brain-semantic "explain the overall architecture"
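For intuition, embedding-based search ranks chunks by how close their vectors are to the query vector. The sketch below assumes cosine similarity as the metric and uses tiny hand-written vectors in place of real embeddings; `cosine_similarity`, `top_k`, and the chunk IDs are illustrative names, not Agent Brain APIs.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k):
    """Return the IDs of the k chunks most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), chunk_id)
              for chunk_id, vec in chunk_vecs.items()]
    scored.sort(reverse=True)
    return [chunk_id for _, chunk_id in scored[:k]]


# Toy 2-dimensional "embeddings"; real models produce hundreds of dimensions.
chunks = {"architecture.md": [1.0, 0.0], "changelog.md": [0.0, 1.0], "design.md": [1.0, 1.0]}
print(top_k([1.0, 0.1], chunks, k=2))
```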

BM25 (Keyword)

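BM25 keyword ranking, a TF-IDF-style scoring function with term-frequency saturation and document-length normalization. Best for exact terms, function names, and error codes.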

/agent-brain-keyword "NullPointerException"
/agent-brain-bm25 "getUserById"
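To show why keyword mode favors exact terms, here is a minimal sketch of the standard Okapi BM25 formula over pre-tokenized documents. The parameter defaults (k1=1.5, b=0.75) are common textbook values and the tokenization is an assumption; this is not Agent Brain's own implementation.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue  # a term absent from the document contributes nothing
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            length_norm = 1 - b + b * len(d) / avg_len
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


docs = [
    ["getUserById", "returns", "user"],
    ["list", "all", "users"],
    ["getUserById", "getUserById", "cache"],
]
print(bm25_scores(["getUserById"], docs))
```

Documents that never contain the exact token score zero, which is why this mode suits symbols and error messages better than paraphrased questions.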

GRAPH (Knowledge Graph)

Traverses entity relationships. Best for dependency and relationship queries.

/agent-brain-graph "what classes use AuthService"
/agent-brain-graph "what calls the validate function"

MULTI (Fusion)

Combines all modes using Reciprocal Rank Fusion. Best for maximum recall.

/agent-brain-multi "everything about data validation"
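Reciprocal Rank Fusion merges several ranked lists by summing 1 / (k + rank) for each document across lists. The sketch below uses the commonly cited constant k=60, which is an assumption here; the document IDs and the function name are illustrative only.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document IDs; ranks are 1-indexed."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from vector, BM25, and graph searches.
vector_hits = ["a", "b", "c"]
bm25_hits = ["b", "a", "d"]
graph_hits = ["b", "c", "a"]
print(reciprocal_rank_fusion([vector_hits, bm25_hits, graph_hits]))
```

Because only ranks are used, the modes do not need comparable score scales, and a document found by only one mode still appears in the fused list.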

Two-Stage Retrieval with Reranking

Agent Brain can optionally use two-stage retrieval, in which a cross-encoder reranks an enlarged candidate set, to improve search precision by roughly 15-20%.

How It Works

Without Reranking (Default):

1. Query is embedded using the embedding model
2. Vector similarity search finds the top_k most similar documents
3. Results are returned

With Reranking Enabled:

1. Query is embedded using the embedding model
2. Vector + BM25 hybrid search retrieves 10x as many candidates as top_k
3. Cross-encoder model scores each candidate for relevance to the query
4. Results are reordered by cross-encoder score
5. The top_k results are returned

Why Reranking Helps

Embedding models (bi-encoders) are fast but approximate. They encode the query and documents separately, then compare vectors. This can miss nuanced relevance.

Cross-encoders process the query AND document together, allowing the model to attend across both texts. This is slower but more accurate.
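The retrieve-then-rerank flow can be sketched with two stand-in scoring functions. Both scorers below are toy word-overlap functions, not real embedding or cross-encoder models, and `two_stage_search` is a hypothetical name; only the field names `rerank_score` and `original_rank` come from this guide.

```python
def two_stage_search(query, candidates, first_stage_score, rerank_score, top_k, oversample=10):
    """Shortlist with a cheap scorer, then reorder the shortlist with a costlier one."""
    # Stage 1: cheap scoring over everything, keeping top_k * oversample candidates.
    ranked = sorted(candidates, key=lambda doc: first_stage_score(query, doc), reverse=True)
    shortlist = ranked[: top_k * oversample]
    # Stage 2: score each shortlisted candidate together with the query.
    results = [
        {"text": doc, "original_rank": rank, "rerank_score": rerank_score(query, doc)}
        for rank, doc in enumerate(shortlist, start=1)
    ]
    results.sort(key=lambda r: r["rerank_score"], reverse=True)
    return results[:top_k]


def overlap_count(query, doc):
    """Toy first stage: how many document tokens appear in the query."""
    query_terms = set(query.split())
    return sum(1 for token in doc.split() if token in query_terms)


def query_coverage(query, doc):
    """Toy second stage: fraction of distinct query terms present in the document."""
    query_terms = set(query.split())
    return len(query_terms & set(doc.split())) / len(query_terms)


docs = [
    "cache settings and cache size and cache logs",
    "cache invalidation strategy",
    "logging setup",
]
print(two_stage_search("cache invalidation", docs, overlap_count, query_coverage, top_k=1, oversample=2))
```

In this toy run the first stage prefers the document that repeats "cache", while the second stage promotes the one covering both query terms, which is the kind of correction reranking is meant to provide.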

When to Enable Reranking

Enable reranking when:

Precision matters more than latency
Queries are complex or nuanced
Initial results seem "close but not quite right"

Keep reranking disabled when:

Latency is critical (real-time search)
Running on resource-constrained hardware
Search quality is already acceptable

Configuration

Enable with an environment variable:

export ENABLE_RERANKING=true

Or in config.yaml:

reranker:
  provider: sentence-transformers
  model: cross-encoder/ms-marco-MiniLM-L-6-v2

Provider Choices

sentence-transformers (Recommended):

Uses HuggingFace CrossEncoder models
Downloads model on first use (~50MB)
Fast inference (~50ms for 100 candidates)

ollama (Fully Local):

Uses Ollama chat completions for scoring
No external downloads
Slower (~500ms for 100 candidates)
Requires Ollama running locally

Response Fields

When reranking is enabled, results include additional metadata:

rerank_score: Cross-encoder relevance score
original_rank: Position before reranking (1-indexed)

Indexing

Index Documentation

/agent-brain-index ./docs

Index Code and Documentation

/agent-brain-index . --include-code

Index Specific Languages

/agent-brain-index ./src --include-code --languages python,typescript

Generate Code Summaries

Improves semantic search for code by generating LLM descriptions:

/agent-brain-index ./src --include-code --generate-summaries

Supported Languages

Agent Brain supports AST-aware chunking for:

Python (.py)
TypeScript (.ts, .tsx)
JavaScript (.js, .jsx)
Java (.java)
Go (.go)
Rust (.rs)
C (.c, .h)
C++ (.cpp, .hpp, .cc)
C# (.cs, .csx)
Swift (.swift)

Other languages use intelligent text-based chunking.
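To illustrate what AST-aware chunking means, the sketch below uses Python's standard `ast` module to split a Python file into one chunk per top-level function or class, rather than cutting at arbitrary character counts. This is only an illustration of the idea; the parser Agent Brain actually uses and the shape of its chunks are not specified here, and `chunk_python_source` is a hypothetical name.

```python
import ast


def chunk_python_source(source: str):
    """Return one chunk per top-level function or class definition."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # lineno/end_lineno are 1-indexed and inclusive (Python 3.8+).
            text = "\n".join(lines[node.lineno - 1 : node.end_lineno])
            chunks.append({"name": node.name, "kind": type(node).__name__, "text": text})
    return chunks


sample = """import os

def load(path):
    return open(path).read()

class Store:
    def get(self, key):
        return key
"""
for chunk in chunk_python_source(sample):
    print(chunk["kind"], chunk["name"])
```

Keeping each definition whole means a search hit returns a complete function or class instead of a fragment that starts mid-body.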

Check Index Status

/agent-brain-status

Clear and Rebuild Index

/agent-brain-reset
/agent-brain-index . --include-code

Job Queue

As of v3.0.0, indexing operations are queued and processed asynchronously.

How It Works

Submit: POST /index returns immediately with a job ID
Queue: Jobs are stored in .claude/agent-brain/jobs/index_queue.jsonl
Process: Background worker processes jobs sequentially
Track: Poll job status or use CLI --watch option

CLI Jobs Commands

# List all jobs
agent-brain jobs

# Watch queue with live updates
agent-brain jobs --watch

# Get job details
agent-brain jobs job_abc123def456

# Cancel a job
agent-brain jobs job_abc123def456 --cancel

Job States

Status	Description
`pending`	Queued, waiting to run
`running`	Currently processing
`done`	Completed successfully
`failed`	Failed with error
`cancelled`	Cancelled by user

Deduplication

The queue automatically deduplicates identical requests. If you submit the same folder with the same options while a job is pending or running, you get back the existing job ID.
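The deduplication rule can be sketched as a lookup keyed on the folder plus its options. The in-memory `JobQueue` class, the key format, and the job ID scheme below are all hypothetical; the real queue persists jobs to the JSONL file described above and may compute its key differently.

```python
import json

ACTIVE_STATES = {"pending", "running"}


class JobQueue:
    """In-memory stand-in for the queue, showing only the dedup rule."""

    def __init__(self):
        self.jobs = {}  # job_id -> {"key": str, "status": str}
        self._next_id = 1

    @staticmethod
    def dedupe_key(folder, options):
        # sort_keys makes the key independent of option ordering.
        return json.dumps({"folder": folder, "options": options}, sort_keys=True)

    def submit(self, folder, options):
        key = self.dedupe_key(folder, options)
        for job_id, job in self.jobs.items():
            if job["key"] == key and job["status"] in ACTIVE_STATES:
                return job_id  # identical request already queued or running
        job_id = f"job_{self._next_id:04d}"
        self._next_id += 1
        self.jobs[job_id] = {"key": key, "status": "pending"}
        return job_id


queue = JobQueue()
first = queue.submit("./docs", {"include_code": True})
second = queue.submit("./docs", {"include_code": True})
print(first == second)
```

Once a job reaches `done`, `failed`, or `cancelled`, the same request creates a new job, which matches the "pending or running" condition stated above.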

Polling for Completion

# Check if indexing is done
agent-brain status --json | jq '.indexing.indexing_in_progress'

# Or poll specific job
agent-brain jobs job_abc123 | grep status
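For automation, the same check can be wrapped in a small polling loop. The sketch below shells out to `agent-brain status --json` and reads the `indexing.indexing_in_progress` field used in the jq example above; the function names, the interval, and the timeout are assumptions, and the fetcher is injectable so the loop can be exercised without a running server.

```python
import json
import subprocess
import time


def fetch_status():
    """Run `agent-brain status --json` and parse its JSON output."""
    completed = subprocess.run(
        ["agent-brain", "status", "--json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(completed.stdout)


def wait_for_indexing(fetch=fetch_status, interval=2.0, timeout=600.0, sleep=time.sleep):
    """Poll until indexing finishes; return False if the timeout elapses first."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch()
        if not status["indexing"]["indexing_in_progress"]:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

A script could call `wait_for_indexing()` after submitting an index job and only run queries once it returns True.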

Provider Configuration

Agent Brain supports pluggable providers for embeddings and summarization.

Configure Providers Interactively

/agent-brain-providers

Embedding Providers

Provider	Models	Local
OpenAI	text-embedding-3-large, text-embedding-3-small	No
Ollama	nomic-embed-text, mxbai-embed-large	Yes
Cohere	embed-english-v3.0, embed-multilingual-v3.0	No

Summarization Providers

Provider	Models	Local
Anthropic	claude-haiku-4-5-20251001, claude-sonnet-4-5-20250514	No
OpenAI	gpt-5, gpt-5-mini	No
Gemini	gemini-3-flash, gemini-3-pro	No
Grok	grok-4, grok-4-fast	No
Ollama	llama4:scout, mistral-small3.2, qwen3-coder	Yes

Fully Local Mode

Run completely offline with Ollama:

/agent-brain-providers
# Select Ollama for embeddings
# Select Ollama for summarization

Multi-Project Support

Agent Brain supports multiple isolated instances for different projects.

Initialize a Project

/agent-brain-init

Creates .claude/agent-brain/ with project-specific configuration.

Start Project Server

/agent-brain-start

Automatically allocates a unique port (no conflicts).

List Running Instances

/agent-brain-list

Shows all running Agent Brain servers across projects.

Work from Subdirectories

Commands automatically resolve the project root:

cd src/deep/nested/directory
/agent-brain-status  # Finds the parent project's server

Runtime Autodiscovery

The CLI automatically discovers the server URL without manual configuration.

How It Works

When you run agent-brain start, the server writes a runtime.json file:

.claude/agent-brain/runtime.json

Contents:

{
  "base_url": "http://127.0.0.1:49321",
  "port": 49321,
  "bind_host": "127.0.0.1",
  "pid": 12345,
  "started_at": "2026-02-03T10:00:00Z",
  "foreground": false
}

CLI Resolution Order

The CLI resolves the server URL in this order of priority (first match wins):

1. Environment variable: AGENT_BRAIN_URL
2. Runtime file: .claude/agent-brain/runtime.json (searches cwd upward)
3. Config file: config.yaml (if it contains a URL)
4. Default: http://127.0.0.1:8000
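The resolution order above can be sketched as a chain of fallbacks. The function names are hypothetical, the config lookup is reduced to an optional `config_url` argument because the config key that holds the URL is not specified here, and only the `base_url` field of runtime.json is read.

```python
import json
import os
from pathlib import Path

DEFAULT_URL = "http://127.0.0.1:8000"


def find_runtime_file(start: Path):
    """Walk from start up to the filesystem root looking for runtime.json."""
    for directory in [start, *start.parents]:
        candidate = directory / ".claude" / "agent-brain" / "runtime.json"
        if candidate.is_file():
            return candidate
    return None


def resolve_server_url(cwd: Path, env=os.environ, config_url=None):
    """Return the first URL found: env var, runtime file, config, then default."""
    if env.get("AGENT_BRAIN_URL"):
        return env["AGENT_BRAIN_URL"]
    runtime_file = find_runtime_file(cwd)
    if runtime_file is not None:
        return json.loads(runtime_file.read_text())["base_url"]
    if config_url:
        return config_url
    return DEFAULT_URL


print(resolve_server_url(Path.cwd()))
```

The upward walk is also what lets commands run from a nested subdirectory find the parent project's server.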

Config Discovery Order

Config files are searched in this order:

1. .claude/agent-brain/config.yaml (cwd, then walk upward)
2. ~/.agent-brain/config.yaml
3. ~/.config/agent-brain/config.yaml
4. Environment variable: AGENT_BRAIN_CONFIG

Example Workflow

# Start server (writes runtime.json automatically)
agent-brain start

# CLI auto-discovers server URL - no --url flag needed
agent-brain status
agent-brain index ./docs
agent-brain query "search term"

CLI Reference

For advanced users or automation, the CLI provides direct access:

Installation

pip install agent-brain-rag agent-brain-cli

Common Commands

# Initialize project
agent-brain init

# Start/stop server
agent-brain start          # Runs in the background by default
agent-brain start --foreground  # Run in foreground
agent-brain stop

# Index documents
agent-brain index ./docs --include-code

# Query
agent-brain query "your question" --mode hybrid

# Job management (v3.0+)
agent-brain jobs           # List all jobs
agent-brain jobs --watch   # Watch with live updates
agent-brain jobs JOB_ID    # Job details
agent-brain jobs JOB_ID --cancel  # Cancel job

# Status
agent-brain status
agent-brain list

Query Options

# Search modes
agent-brain query "term" --mode vector
agent-brain query "term" --mode bm25
agent-brain query "term" --mode hybrid --alpha 0.7
agent-brain query "term" --mode graph
agent-brain query "term" --mode multi

# Result tuning
agent-brain query "term" --top-k 10 --threshold 0.3

# Filtering
agent-brain query "term" --source-types code
agent-brain query "term" --languages python,typescript

# Output formats
agent-brain query "term" --json
agent-brain query "term" --scores

Local Integration Check

Before releasing or after major changes, run the local integration check to validate end-to-end functionality.

Running the Check

./scripts/local_integration_check.sh

Or using Task:

task local-integration

What It Validates

Server startup: Verifies server starts and writes runtime.json
Runtime autodiscovery: CLI finds server URL from runtime.json
Job queue: Indexing job completes without 409/500 errors
Query: Returns valid HTTP 200 response
CLI commands: agent-brain jobs works correctly

Expected Output

=== Agent Brain Local Integration Check ===
Step 1: Cleaning up stray processes...
Step 2: Cleaning up old state...
Step 3: Starting server in foreground...
Step 4: Checking runtime.json...
  Found runtime.json
  Server URL: http://127.0.0.1:49321
Step 5: Waiting for health endpoint...
  Server is healthy!
...
=== Integration Check PASSED ===

Troubleshooting Failed Checks

If the check fails:

runtime.json not found: Server failed to start - check for port conflicts
Job failed: Check server logs in .claude/agent-brain/logs/
Query failed: Index may be empty - verify test data was created

Troubleshooting

Server Not Running

/agent-brain-status

If not running:

/agent-brain-start

No Results Found

Check document count: /agent-brain-status
If 0 documents, re-index: /agent-brain-index ./docs
Try lowering threshold: /agent-brain-search "term" --threshold 0.3
Try different search mode: /agent-brain-keyword "exact term"

Configuration Issues

/agent-brain-verify

This checks:

Package installation
API key configuration
Server connectivity
Provider setup

Provider Errors

/agent-brain-providers

Verify your API keys are set correctly for the selected provider.

Reset Everything

/agent-brain-reset
/agent-brain-init
/agent-brain-start
/agent-brain-index . --include-code

Next Steps

Quick Start - Get running in minutes
Plugin Guide - All 24 commands in detail
API Reference - REST API documentation
GraphRAG Guide - Knowledge graph features
Provider Configuration - Provider setup
