AgentOpt · guru-code-expert · Feb 10, 2026 · Feb 10, 2026 · Feb 11, 2026 · Feb 11, 2026
diff --git a/.gitignore b/.gitignore
@@ -4,4 +4,11 @@ __pycache__/
 external/*
 **/uv.lock
 *.egg-info/
-**/.venv/
+**/.venv/
+.env
+runs/
+runs_test/
+notebooks/01_smoke_runner_with_output.ipynb
+notebooks/01_m1_minimal_api_with_output.ipynb
+/.tmp_runs_run
+/.tmp_runs_validate
diff --git a/README.md b/README.md
@@ -5,6 +5,70 @@ Currently, we are adding problems/domains one folder at a time.
 
 The instructions to run each task are located inside the task folder.
 
+## Quick Start (Runner/CLI)
+
+```bash
+# M1 review checklist (recommended order)
+# 1) List tasks (LLM4AD + example stubs)
+trace-bench list-tasks --root LLM4AD/benchmark_tasks
+
+# 2) Validate a config
+trace-bench validate --config configs/smoke.yaml
+
+# 3) Run Stub smoke (deterministic, no keys)
+trace-bench run --config configs/smoke.yaml --runs-dir runs
+
+# 4) Run Real smoke (requires OPENAI_API_KEY)
+trace-bench run --config configs/smoke_real.yaml --runs-dir runs
+
+# 5) Run tests (disable external plugin autoload)
+PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 pytest -q
+
+# List tasks (LLM4AD + example stubs)
+trace-bench list-tasks --root LLM4AD/benchmark_tasks
+
+# Validate a config
+trace-bench validate --config configs/smoke.yaml
+
+# Run a smoke benchmark
+trace-bench run --config configs/smoke.yaml
+
+# Launch UI (stub)
+trace-bench ui --runs-dir runs
+```
+
+Expected run artifacts:
+- `runs/<run_id>/config.snapshot.yaml`
+- `runs/<run_id>/env.json`
+- `runs/<run_id>/results.csv`
+- `runs/<run_id>/events.jsonl`
+- `runs/<run_id>/summary.json`
+- `runs/<run_id>/tb/`
+
+## M1 Dependencies (Required for Full Pass)
+
+System:
+- Graphviz (system package)
+
+Python:
+- `graphviz`, `pyyaml`, `pytest`, `numpy`, `matplotlib`, `litellm==1.75.0`
+
+OpenTrace examples strict smoke (for 100% pass):
+- `datasets`, `textgrad`, `dspy`, `autogen`, `python-dotenv`
+
+## OpenTrace Examples Smoke (100% Pass Mode)
+
+To require every example smoke test to pass in CI, run:
+```bash
+TRACE_BENCH_STRICT_EXAMPLES=1 PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 pytest -q
+```
+Without strict mode, an example smoke test is skipped only when its optional dependencies are missing.
+
+## VeriBench Status (In Scope, Pending Input)
+
+VeriBench is in scope but requires the Trace team to provide the task entrypoint/task list.
+CLI flags are ready (`--bench veribench`); when the entrypoint is unavailable, tasks are skipped with a structured reason rather than raising.
+
 ## Problem Sets
 
 ### General Problem Sets
@@ -27,9 +91,9 @@ Current implementation of graph is a single node.
 
 **Supported Algorithms:** PrioritySearch, GEPA-Base, GEPA-UCB, GEPA-Beam
 
-📖 **[See detailed usage guide →](LM4AD/readme.md)**
+**See detailed usage guide:** `LLM4AD/readme.md`
 
 ## Agent Architecture
 - ReAct agent
 
-All the libraries from other repos are stored and managed in the `external` folder -- this folder will be created if one of the `install.sh` script is run inside the task folder.
+All the libraries from other repos are stored and managed in the `external` folder; this folder is created when one of the `install.sh` scripts is run inside a task folder.
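Review note on the README hunk: the "Expected run artifacts" list can be checked mechanically after a run. The following is a minimal sketch, not part of this PR; the helper name `missing_artifacts` is hypothetical, and only the artifact names themselves come from the README.

```python
from pathlib import Path

# Artifact names copied from the README's "Expected run artifacts" list.
EXPECTED_FILES = [
    "config.snapshot.yaml",
    "env.json",
    "results.csv",
    "events.jsonl",
    "summary.json",
]
EXPECTED_DIRS = ["tb"]


def missing_artifacts(run_dir):
    """Return the expected artifact names that are absent from run_dir.

    Hypothetical helper for illustration; trace-bench may verify runs differently.
    """
    run_dir = Path(run_dir)
    missing = [name for name in EXPECTED_FILES if not (run_dir / name).is_file()]
    missing += [name + "/" for name in EXPECTED_DIRS if not (run_dir / name).is_dir()]
    return missing
```

Calling `missing_artifacts("runs/<run_id>")` with a real run id should return an empty list if the run wrote everything the README promises.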
diff --git a/configs/m1_matrix_smoke.yaml b/configs/m1_matrix_smoke.yaml
@@ -0,0 +1,24 @@
+runs_dir: runs
+mode: stub
+seeds: [123]
+max_workers: 1
+fail_fast: false
+
+tasks:
+  - id: internal:numeric_param
+  - id: llm4ad:circle_packing
+    eval_kwargs:
+      timeout_seconds: 10
+
+trainers:
+  - id: PrioritySearch
+    params_variants:
+      - ps_steps: 1
+        ps_batches: 1
+
+  - id: GEPA-Base
+    params_variants:
+      - gepa_iters: 1
+        gepa_train_bs: 2
+        gepa_merge_every: 2
+        gepa_pareto_subset: 2
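Review note on `configs/m1_matrix_smoke.yaml`: the file reads as a task x trainer matrix. The sketch below shows one plausible expansion into jobs, assuming one job per (task, trainer, params variant, seed). That rule and the function name `expand_jobs` are assumptions for illustration; the runner's actual expansion logic is not shown in this diff.

```python
from itertools import product

# Mirror of configs/m1_matrix_smoke.yaml as a plain dict, so no YAML parser is needed.
CONFIG = {
    "seeds": [123],
    "tasks": [
        {"id": "internal:numeric_param"},
        {"id": "llm4ad:circle_packing", "eval_kwargs": {"timeout_seconds": 10}},
    ],
    "trainers": [
        {"id": "PrioritySearch", "params_variants": [{"ps_steps": 1, "ps_batches": 1}]},
        {
            "id": "GEPA-Base",
            "params_variants": [
                {
                    "gepa_iters": 1,
                    "gepa_train_bs": 2,
                    "gepa_merge_every": 2,
                    "gepa_pareto_subset": 2,
                }
            ],
        },
    ],
}


def expand_jobs(config):
    """Assumed expansion: one job per (task, trainer, params variant, seed)."""
    jobs = []
    for task, trainer, seed in product(config["tasks"], config["trainers"], config["seeds"]):
        # A trainer without params_variants is treated as a single default variant.
        for params in trainer.get("params_variants") or [{}]:
            jobs.append(
                {"task": task["id"], "trainer": trainer["id"], "params": params, "seed": seed}
            )
    return jobs


jobs = expand_jobs(CONFIG)
print(len(jobs))  # 4 under this assumed expansion: 2 tasks x 2 trainers x 1 variant x 1 seed
```

If the runner expands differently (for example, by pairing seeds with variants), the job count would change, so it may be worth stating the rule in the README.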
diff --git a/configs/m1_validation.yaml b/configs/m1_validation.yaml
@@ -0,0 +1,55 @@
+runs_dir: runs
+mode: stub
+seeds: [123]
+max_workers: 1
+fail_fast: false
+
+tasks:
+  - id: internal:code_param
+  - id: internal:numeric_param
+  - id: internal:multi_param
+  - id: internal:non_trainable
+  - id: trace_examples:greeting_stub
+  - id: llm4ad:circle_packing
+    eval_kwargs:
+      timeout_seconds: 10
+  - id: veribench:smoke_placeholder
+
+trainers:
+  - id: PrioritySearch
+    params_variants:
+      - threads: 2
+        ps_steps: 1
+        ps_batches: 1
+        ps_candidates: 2
+        ps_proposals: 2
+        ps_mem_update: 1
+
+  - id: GEPA-Base
+    params_variants:
+      - threads: 2
+        gepa_iters: 1
+        gepa_train_bs: 2
+        gepa_merge_every: 2
+        gepa_pareto_subset: 2
+    optimizer: OPROv2
+    optimizer_kwargs: {}
+
+  - id: GEPA-UCB
+    params_variants:
+      - threads: 2
+        gepa_iters: 1
+        gepa_train_bs: 2
+        gepa_merge_every: 2
+        gepa_pareto_subset: 2
+
+  - id: GEPA-Beam
+    params_variants:
+      - threads: 2
+        gepa_iters: 1
+        gepa_train_bs: 2
+        gepa_merge_every: 2
+        gepa_pareto_subset: 2
+
+eval_kwargs:
+  timeout_seconds: 10
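Review note on `configs/m1_validation.yaml`: `eval_kwargs` appears both at the top level and under `llm4ad:circle_packing`, and the diff does not say which one wins. A minimal sketch of one reasonable rule follows, assuming task-level keys override top-level keys; `effective_eval_kwargs` is a hypothetical name, not a trace-bench function.

```python
def effective_eval_kwargs(config, task):
    """Assumed precedence: task-level eval_kwargs override top-level ones."""
    merged = dict(config.get("eval_kwargs") or {})  # copy so the config is not mutated
    merged.update(task.get("eval_kwargs") or {})
    return merged


config = {"eval_kwargs": {"timeout_seconds": 10}}
plain_task = {"id": "internal:code_param"}
# Hypothetical override value; the real config uses 10 at both levels.
slow_task = {"id": "llm4ad:circle_packing", "eval_kwargs": {"timeout_seconds": 30}}

print(effective_eval_kwargs(config, plain_task))
print(effective_eval_kwargs(config, slow_task))
```

Because both levels currently set `timeout_seconds: 10`, the existing config cannot reveal the precedence by itself; a test with two different values would pin it down.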
diff --git a/configs/smoke.yaml b/configs/smoke.yaml
@@ -0,0 +1,12 @@
+runs_dir: runs
+mode: stub
+seeds: [123]
+
+tasks:
+  - id: internal:numeric_param
+
+trainers:
+  - id: PrioritySearch
+    params_variants:
+      - ps_steps: 1
+        ps_batches: 1
diff --git a/configs/smoke_real.yaml b/configs/smoke_real.yaml
@@ -0,0 +1,12 @@
+runs_dir: runs
+mode: real
+seeds: [123]
+
+tasks:
+  - id: trace_examples:greeting_stub
+
+trainers:
+  - id: PrioritySearch
+    params_variants:
+      - ps_steps: 1
+        ps_batches: 1
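Review note on `configs/smoke_real.yaml`: the README states that the real smoke run requires `OPENAI_API_KEY`. A preflight check along the following lines could surface a missing key before any job starts. This is a sketch under the assumption that `mode: real` is the only trigger for the requirement; the function name `preflight` is hypothetical.

```python
import os


def preflight(config, environ=None):
    """Return a list of problems that would block the run; an empty list means OK.

    Hypothetical helper: assumes only mode 'real' needs OPENAI_API_KEY.
    """
    environ = os.environ if environ is None else environ
    problems = []
    if config.get("mode") == "real" and not environ.get("OPENAI_API_KEY"):
        problems.append("mode 'real' requires OPENAI_API_KEY to be set")
    return problems
```

Passing `environ` explicitly keeps the check testable without touching the real process environment.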