Welcome to the SlopCodeBench documentation. This guide will help you navigate the documentation and find what you need.
New to SlopCodeBench? Start with FAQ.md for an overview and answers to common questions.
What do you want to do?
- Run an agent on problems → agents/ → commands/run.md
- Create a new problem → contributing-problems/ → problems/tutorial.md
- Evaluate results → evaluation/ → commands/eval.md
- Understand code quality metrics → metrics/
- Troubleshoot issues → FAQ.md → KNOWN_ISSUES.md
The documentation is organized into focused groups, each covering a specific aspect of SlopCodeBench.
agents/ - Configure and run AI agents with API credentials, models, and providers
Learn how to set up agents, configure API credentials, select models, and run benchmarks. Includes detailed documentation for six agent implementations (Claude Code, Codex, Gemini, MiniSWE, OpenCode, OpenHands).
Key files:
- agents/README.md - Quick start and configuration flow
- agents/providers.md - Provider definitions and credential sources
- agents/models.md - Model catalog and pricing
- agents/agents/ - Individual agent implementation guides
commands/ - Reference documentation for all CLI commands
Complete reference for SlopCodeBench's command-line interface, including agent execution, evaluation, metrics, utilities, Docker image building, and visualization tools.
Key files:
- commands/README.md - Command summary and index
- commands/run.md - Agent execution with the unified config system
- commands/eval.md - Evaluate run directories
- commands/metrics.md - Metrics subcommands (static, judge, variance)
- commands/viz.md - Visualization tools (diff viewer)
problems/ - Guide to problem authoring, structure, and configuration
Everything you need to create effective benchmark problems, from basic structure to advanced features like checkpoints, test case groups, and regression testing.
Key files:
- problems/README.md - Main problem authoring guide
- problems/tutorial.md - Step-by-step tutorial for your first problem
- problems/config-schema.md - Complete field-by-field config reference
- problems/examples/ - Worked examples with walkthroughs
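To give a feel for what a problem config covers, here is a sketch. The field names below are invented for illustration only; the authoritative field-by-field reference is problems/config-schema.md.

```yaml
# Hypothetical sketch - these field names are illustrative, not the real schema.
# See problems/config-schema.md for the authoritative reference.
name: calculator
description: Implement a four-function calculator
checkpoints:
  - id: basic-ops          # first milestone
    tests: tests/test_basic.py
  - id: error-handling     # later milestone; earlier tests can rerun as regressions
    tests: tests/test_errors.py
```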
contributing-problems/ - Design philosophy and validation for problem contributions
Understand best practices for creating high-quality benchmark problems, including design philosophy, worked examples, and a pre-submission checklist.
Key files:
- contributing-problems/README.md - Problem design philosophy
- contributing-problems/example_process.md - Worked example (calculator problem)
- contributing-problems/checklist.md - Pre-submission validation checklist
evaluation/ - Pytest-based test execution and results reporting
Deep dive into how SlopCodeBench executes pytest tests against submissions, categorizes results by type, and generates structured reports.
Key files:
- evaluation/README.md - Quick start and overview
- evaluation/architecture.md - Pytest runner and execution flow
- evaluation/configuration.md - Problem and checkpoint configuration
- evaluation/reporting.md - Results, pass policies, and export formats
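As a rough mental model of the reporting step: the runner collects one outcome per test and aggregates the outcomes into a structured summary. The sketch below is illustrative only; the result shape here is an assumption, and the real pass policies and export formats are documented in evaluation/reporting.md.

```python
from collections import Counter

def summarize(results):
    """Aggregate (test_name, outcome) pairs into a structured report.

    `results` is a hypothetical shape for illustration; SlopCodeBench's
    actual report format is described in evaluation/reporting.md.
    """
    counts = Counter(outcome for _, outcome in results)
    total = sum(counts.values())
    return {
        "counts": dict(counts),
        "pass_rate": counts.get("passed", 0) / total if total else 0.0,
    }

report = summarize([
    ("test_add", "passed"),
    ("test_sub", "passed"),
    ("test_div", "failed"),
])
```

The point of categorizing by outcome (rather than keeping a flat pass/fail count) is that policies can then treat, say, errors and failures differently.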
execution/ - Workspace management, runtime execution, and snapshots
Understand how SlopCodeBench manages workspaces, executes code in different environments (local, Docker), handles assets, and captures snapshots.
Key files:
- execution/README.md - Overview of Workspace, Session, and SubmissionRuntime
- execution/manager.md - Session lifecycle and workspace orchestration
- execution/docker.md - Container execution and networking
- execution/snapshots.md - Capturing and comparing workspace state
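Conceptually, a workspace snapshot can be as simple as a map from file path to content hash, with changes computed set-wise between two snapshots. The following is a minimal sketch of that idea, not SlopCodeBench's actual snapshot format (see execution/snapshots.md for that).

```python
import hashlib
import tempfile
from pathlib import Path

def snapshot(root):
    """Map each file under `root` to a SHA-256 digest of its contents."""
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(Path(root).rglob("*"))
        if p.is_file()
    }

def diff(before, after):
    """Report paths added, removed, or changed between two snapshots."""
    changed = {p for p in before.keys() & after.keys() if before[p] != after[p]}
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "changed": sorted(changed),
    }

# Tiny demo: edit one file, add another, and diff the snapshots.
with tempfile.TemporaryDirectory() as work:
    Path(work, "a.txt").write_text("hello")
    before = snapshot(work)
    Path(work, "a.txt").write_text("world")
    Path(work, "b.txt").write_text("new file")
    after = snapshot(work)
    delta = diff(before, after)
```

Hashing contents (rather than comparing mtimes) makes the comparison robust to touch-without-change, at the cost of reading every file.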
metrics/ - Code quality measurement and analysis
Learn about SlopCodeBench's 7-category metric system for measuring code quality, including interpretation, configuration, and output formats.
Key files:
- metrics/README.md - Overview and metric categories
- metrics/interpreting-results.md - Explanation of each metric
- metrics/configuration.md - Thresholds and AST-grep rules
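To make the idea of a static metric concrete, here is a toy check, not one of SlopCodeBench's actual seven categories, that flags over-long functions using Python's `ast` module. The threshold is arbitrary; real thresholds and AST-grep rules are configured as described in metrics/configuration.md.

```python
import ast

def long_functions(source, limit=10):
    """Return names of functions spanning more than `limit` source lines.

    `limit` is an arbitrary threshold chosen for this toy example.
    """
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # end_lineno/lineno give the inclusive source span of the def.
            if node.end_lineno - node.lineno + 1 > limit:
                flagged.append(node.name)
    return flagged

# A short function and a deliberately padded one.
sample = "def tiny():\n    return 1\n\ndef big():\n" + "\n".join(
    f"    step_{i} = {i}" for i in range(12)
)
```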
FAQ.md - Frequently asked questions covering getting started, running agents, evaluation, metrics, troubleshooting, and contributing. Start here if you're new to SlopCodeBench.
KNOWN_ISSUES.md - Current limitations and known bugs, organized by problem. Check this if you encounter unexpected behavior.
To create a new problem:
- Read contributing-problems/README.md for design philosophy
- Follow problems/tutorial.md for step-by-step guidance
- Reference problems/config-schema.md for configuration details
- Review contributing-problems/checklist.md before submission
To run agents and evaluate the results:
- Review FAQ.md for a system overview
- Set up credentials following agents/README.md
- Pick a model from the catalog in agents/models.md
- Run with commands/run.md
- Evaluate runs using commands/eval.md
- Understand the evaluation system in evaluation/
- Analyze code quality with metrics/
- Interpret results using metrics/interpreting-results.md
For advanced customization:
- Write custom pytest tests following problems/pytest/
- Add new execution environments with execution/extending.md
- Implement custom agents following agents/agent-class.md
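As an illustration of the custom-agent idea only: the real base-class interface and required methods are defined in agents/agent-class.md, and every name below is an invented stand-in.

```python
class EchoAgent:
    """Toy stand-in for a custom agent; not the real interface."""

    name = "echo"

    def run(self, prompt: str) -> str:
        # A real agent would call a model provider and edit the workspace here.
        return f"# TODO: solve this task\n# {prompt}"

result = EchoAgent().run("Implement a calculator")
```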
A few conventions hold throughout:
- README.md files in each directory provide quick-start guides and navigation
- Detailed guides cover specific topics in depth
- Cross-references link related documentation
- Examples are provided throughout with real-world scenarios
- Troubleshooting sections appear in most modules for common issues
If something isn't working:
- Check FAQ.md for common questions
- Review KNOWN_ISSUES.md for current limitations
- Look for troubleshooting sections in relevant documentation groups
- Check evaluation/troubleshooting.md for pytest-related issues