| version | last_updated |
|---|---|
| 1.0 | 2025-12-22 |

CLI Commands Reference

This section documents all commands available in the slop-code CLI.

Quick Reference

| Command | Description |
|---|---|
| `run` | Run agents on benchmarks using the unified configuration system |
| `eval` | Evaluate a directory of agent inference results |
| `eval-problem` | Evaluate a single problem directory |
| `eval-snapshot` | Evaluate a single snapshot directory |
| `infer-problem` | Run inference on a single problem |
| `metrics` | Calculate metrics (static, judge, carry-forward, variance) |
| `utils` | Utility commands for maintenance and data processing |
| `docker` | Docker image building utilities |
| `problems` | Problem inspection and registry commands |
| `tools` | Interactive tools and case runners |
| `viz` | Visualization tools (diff viewer) |

Global Options

These options are available on all commands:

| Option | Type | Default | Description |
|---|---|---|---|
| `-v, --verbose` | flag | 0 | Increase verbosity (repeatable) |
| `--seed` | int | 42 | Random seed |
| `-p, --problem-path` | path | `problems/` | Path to problem directory |
| `--overwrite` | flag | false | Overwrite existing output directory |
| `--debug` | flag | false | Enable debugging mode |
| `--snapshot-dir-name` | string | `snapshot` | Name of the snapshot directory |

Command Categories

Core Workflow

Running agents:

```bash
# Run with a config file
slop-code run --config my_run.yaml --problem file_backup

# Run with CLI flags
slop-code run --model anthropic/sonnet-4.5 --problem file_backup
```
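To apply one configuration to several problems, a small shell loop is enough. The sketch below is not a feature of the CLI: it calls `slop-code run` once per problem using only the `--config` and `--problem` flags shown above, and stops at the first failing run. The function name `run_problems` and the `SLOP_CODE` override variable are hypothetical helpers introduced for illustration.

```bash
# Hypothetical helper (source this file): run one shared config against
# several problems, stopping at the first command that fails.
# SLOP_CODE is left unquoted on purpose so it can be overridden,
# e.g. SLOP_CODE=echo prints the commands instead of executing them.
run_problems() {
  local config="$1"
  shift
  local problem
  for problem in "$@"; do
    ${SLOP_CODE:-slop-code} run --config "$config" --problem "$problem" || return 1
  done
}
```

Usage: `run_problems my_run.yaml file_backup`, with further problem names passed as extra arguments; prefix the call with `SLOP_CODE=echo` to preview the commands first.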

Evaluating results:

```bash
# Evaluate all problems in a run
slop-code eval outputs/my_run

# Evaluate a single problem
slop-code eval-problem outputs/my_run/file_backup

# Evaluate a single snapshot
slop-code eval-snapshot outputs/my_run/file_backup/checkpoint_1/snapshot \
  -o outputs/eval -p file_backup -c 1 -e configs/environments/docker-python3.12-uv.yaml
```
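`eval-problem` already covers a whole problem directory; when you want per-checkpoint control instead, `eval-snapshot` can be called once per checkpoint. The sketch below assumes the layout from the example above, `<problem_dir>/checkpoint_<N>/snapshot`, takes the problem name from the directory name and the checkpoint number from `checkpoint_<N>`, and writes each result under its own `checkpoint_<N>` subdirectory of the output root. That output layout is a choice made for this illustration, not documented CLI behavior, and `eval_snapshots` and `SLOP_CODE` are hypothetical names.

```bash
# Hypothetical helper (source this file): evaluate every checkpoint_<N>/snapshot
# under a problem directory with `slop-code eval-snapshot`, reusing the flags
# from the example above. Set SLOP_CODE=echo to print the commands instead.
eval_snapshots() {
  local problem_dir="$1" env_config="$2" out_root="$3"
  local problem snapshot checkpoint
  problem="$(basename "$problem_dir")"               # e.g. file_backup
  for snapshot in "$problem_dir"/checkpoint_*/snapshot; do
    [ -d "$snapshot" ] || continue                   # no match: glob stays literal
    checkpoint="$(basename "$(dirname "$snapshot")")" # e.g. checkpoint_1
    checkpoint="${checkpoint#checkpoint_}"           # e.g. 1
    ${SLOP_CODE:-slop-code} eval-snapshot "$snapshot" \
      -o "$out_root/checkpoint_$checkpoint" \
      -p "$problem" -c "$checkpoint" -e "$env_config" || return 1
  done
}
```

Usage: `eval_snapshots outputs/my_run/file_backup configs/environments/docker-python3.12-uv.yaml outputs/eval`. The glob expands in lexical order, so `checkpoint_10` is visited before `checkpoint_2`; pipe a list through `sort -V` instead if the processing order matters to you.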

Metrics and Analysis

```bash
# Calculate static code quality metrics
slop-code metrics static outputs/my_run

# Run LLM judge evaluation
slop-code metrics judge outputs/my_run -r configs/rubrics/slop.jsonl -m anthropic/sonnet-4.5

# Compute variance across runs
slop-code metrics variance base outputs/runs -o outputs/variance
```

Utilities

```bash
# Backfill reports for existing runs
slop-code utils backfill-reports outputs/my_run

# Combine results from multiple runs
slop-code utils combine-results outputs/all_runs -o outputs/combined.jsonl
```

Docker Management

```bash
# Build base image
slop-code docker build-base configs/environments/docker-python3.12-uv.yaml

# Build agent image
slop-code docker build-agent configs/agents/claude_code-2.0.51.yaml \
  configs/environments/docker-python3.12-uv.yaml
```
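A common setup builds the base image for one environment and then an agent image for each agent config. The sketch below assumes, without confirmation from this page, that agent images depend on the base image, so it builds the base first and stops at the first failure. It uses only the two subcommands shown above with their positional arguments; `build_images` and `SLOP_CODE` are hypothetical names.

```bash
# Hypothetical helper (source this file): build the base image for one
# environment config, then one agent image per agent config argument.
# Assumes agent images build on the base image, hence the ordering.
# Set SLOP_CODE=echo to print the commands instead of executing them.
build_images() {
  local env_config="$1"
  shift
  ${SLOP_CODE:-slop-code} docker build-base "$env_config" || return 1
  local agent_config
  for agent_config in "$@"; do
    ${SLOP_CODE:-slop-code} docker build-agent "$agent_config" "$env_config" || return 1
  done
}
```

Usage: `build_images configs/environments/docker-python3.12-uv.yaml configs/agents/claude_code-2.0.51.yaml`, adding more agent configs as extra arguments.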

Problem Inspection

```bash
# List all problems
slop-code problems ls

# Check problem conversion status
slop-code problems status file_backup
```

Test Case Runner

```bash
# Run pytest tests for a snapshot
slop-code tools run-case -s outputs/snapshot -p file_backup -c 1 \
  -e configs/environments/docker-python3.12-uv.yaml
```

Visualization

```bash
# Launch diff viewer for a run
slop-code viz diff outputs/my_run
```

Documentation Index

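| Document | Description |
|---|---|
| [run.md](run.md) | Comprehensive guide to `slop-code run` and its configuration system |
| [eval.md](eval.md) | Evaluating run directories |
| [eval-problem.md](eval-problem.md) | Evaluating single problems |
| [eval-snapshot.md](eval-snapshot.md) | Evaluating single snapshots |
| [infer-problem.md](infer-problem.md) | Running inference on single problems |
| [metrics.md](metrics.md) | All metrics subcommands |
| [utils.md](utils.md) | All utility subcommands |
| [docker.md](docker.md) | Docker image management |
| [problems.md](problems.md) | Problem inspection commands |
| [tools.md](tools.md) | Interactive tools |
| [viz.md](viz.md) | Visualization tools |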
