
feat(bench): add csearch benchmark harness #68

Draft

Liam-Deacon wants to merge 5 commits into feat/search-de from feat/search-bench

Conversation

@Liam-Deacon (Owner) commented Dec 31, 2025

Problem

  • csearch lacks a reproducible benchmark harness for comparing runs across datasets, seeds, and optimizers.

Solution

  • Introduce a JSON dataset manifest plus runner script that stages inputs, runs csearch, and captures traces/summary outputs.
  • Add a plotting helper script and a report template for publishing results.
  • Document usage and ignore generated benchmark outputs.

Testing

  • python3 tools/benchmarks/run_benchmarks.py --dry-run --leed /usr/bin/true --rfac /usr/bin/true --optimizers si --seeds 1 --max-evals 1 --max-iters 1 --output-dir benchmarks/out
  • python3 -m pre_commit run --all-files

Links

Follow-ups

  • Extend manifest to cover real datasets and recorded baselines.

Summary by Sourcery

Add a benchmark harness for running and analyzing csearch sweeps across datasets, optimizers, and seeds.

New Features:

  • Introduce a manifest-driven benchmark runner that stages inputs, executes csearch runs, and records per-run summaries and traces.
  • Add a plotting script to visualize convergence traces per dataset from benchmark outputs.
  • Provide a JSON benchmark manifest, usage documentation, and a markdown report template for organizing benchmark results.

Documentation:

  • Document the benchmark harness manifest format, usage, and plotting workflow in benchmarks/README.md.

Tests:

  • Add a stub benchmark manifest wired to existing test fixtures for lightweight, reproducible benchmark runs.

Chores:

  • Ignore generated benchmark output directories in version control.

Commits

Add particle swarm optimization with a new optimizer entry, config limits, and registry aliases.
Update csearch docs/man page for PSO usage and budgets.
Add a PSO regression test to validate convergence.

Tests: cmake --build build --target test_search_pso; ctest --test-dir build --output-on-failure -R search.pso; python3 -m pre_commit run --all-files

Allow PSO swarm size, coefficients, and vmax to be configured via CLI/env.
Propagate settings into the PSO run config and log output.
Update csearch help/docs and the config env test.

Tests: cmake --build build --target test_search_optimizer test_search_pso; ctest --test-dir build --output-on-failure -R 'search.optimizer|search.pso'; python3 -m pre_commit run --all-files

Implement the DE optimizer and register it in the optimizer registry under the 'de' alias.
Add DE config defaults, limits, and logging; update csearch help/docs/man page.
Add a DE regression test and extend optimizer lookup coverage.

Tests: cmake -S . -B build -DCMAKE_BUILD_TYPE=Release; cmake --build build --target test_search_de test_search_optimizer; ctest --test-dir build --output-on-failure -R 'search.de|search.optimizer'; python3 -m pre_commit run --all-files

Add a JSON manifest format, runner script, and plot helper for csearch benchmarks.
Include a report template and README, plus gitignore entries for generated output.

Tests: python3 tools/benchmarks/run_benchmarks.py --dry-run --leed /usr/bin/true --rfac /usr/bin/true --optimizers si --seeds 1 --max-evals 1 --max-iters 1 --output-dir benchmarks/out; python3 -m pre_commit run --all-files
sourcery-ai bot (Contributor) commented Dec 31, 2025

Reviewer's Guide

Adds a reproducible CSEARCH benchmarking harness: a manifest-driven runner that stages datasets and executes sweeps, a plotting script for convergence curves, and documentation and templates for recording and sharing benchmark results. Generated outputs are excluded from version control.

Sequence diagram for the CSEARCH benchmark sweep runner

sequenceDiagram
    actor User
    participant RunBenchmarksCLI as run_benchmarks_cli
    participant Runner as run_benchmarks_py
    participant Csearch as csearch_process
    participant FS as filesystem

    User->>RunBenchmarksCLI: invoke with manifest, seeds, optimizers
    RunBenchmarksCLI->>Runner: main() parses args and calls run_benchmarks(args)

    Runner->>FS: load_manifest(manifest.json)
    FS-->>Runner: datasets with resolved paths

    Runner->>Runner: parse_seeds(args)
    Runner->>Runner: parse optimizers list
    Runner->>Runner: ensure_program for CSEARCH_LEED and CSEARCH_RFAC

    Runner->>FS: create output_root directory with timestamped run_id
    Runner->>FS: copy manifest.json into output_root

    loop for each dataset
        Runner->>FS: validate dataset input file exists
        loop for each optimizer
            loop for each seed
                Runner->>FS: create run_dir dataset/optimizer/seed
                Runner->>FS: copy input, bulk, control, extra_files into run_dir

                Runner->>Runner: build csearch command line
                Runner->>Runner: build environment with CSEARCH_LEED, CSEARCH_RFAC

                alt dry_run enabled
                    Runner->>Runner: simulate exit_code, stdout, stderr
                else execute csearch
                    Runner->>Csearch: subprocess.run(cmd, cwd=run_dir, env=env)
                    Csearch-->>Runner: exit_code, stdout, stderr
                end

                Runner->>FS: write run.stdout and run.stderr in run_dir

                Runner->>FS: read project.log in run_dir via parse_log()
                FS-->>Runner: rmin, reported_iters, trace

                alt trace not empty
                    Runner->>FS: write trace.csv via write_trace_csv(trace)
                else
                    Runner->>Runner: set trace_path to None
                end

                Runner->>Runner: derive evals from last trace point
                Runner->>Runner: append result row to results list
            end
        end
    end

    Runner->>FS: write summary.json with results
    Runner->>FS: write summary.csv with results

    Runner-->>RunBenchmarksCLI: return output_root
    RunBenchmarksCLI-->>User: print benchmark results location

Entity relationship diagram for the benchmark manifest schema

erDiagram
    Manifest {
        string path
    }

    Dataset {
        string name
        string description
        string input
        string bulk
        string control
        float delta
        string extra_files
    }

    Manifest ||--o{ Dataset : contains

    %% Notes on fields (as attributes):
    %% - input: required path to .inp file (resolved relative to manifest)
    %% - bulk: optional path to .bul file
    %% - control: optional path to .ctr file
    %% - delta: optional displacement value passed as -d
    %% - extra_files: optional list of additional paths copied into run_dir
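For illustration, a minimal manifest matching this schema could be generated as in the sketch below. Only the per-dataset fields (name, description, input, bulk, control, delta, extra_files) come from the diagram above; the top-level "datasets" key, the dataset name, and all file paths are assumptions made for the example.

import json
from pathlib import Path

# Illustrative sketch only: the top-level "datasets" key and every file path
# here are assumptions; the per-dataset fields mirror the ER diagram above.
manifest = {
    "datasets": [
        {
            "name": "example_dataset",             # identifier used for run directories
            "description": "Illustrative entry",   # free-text description
            "input": "data/example.inp",           # required, resolved relative to the manifest
            "bulk": "data/example.bul",            # optional bulk file
            "control": "data/example.ctr",         # optional control file
            "delta": 0.1,                          # optional displacement passed as -d
            "extra_files": ["data/example.fsm"],   # optional extra paths copied into run_dir
        }
    ]
}

Path("manifest.json").write_text(json.dumps(manifest, indent=2))

Relative paths such as the ones above are resolved against the manifest's own location when the runner loads it.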

Flow diagram for the CSEARCH benchmarking and reporting pipeline

flowchart LR
    subgraph BenchConfig["Benchmark configuration"]
        M["benchmarks/manifest.json"]
        RBPY["tools/benchmarks/run_benchmarks.py"]
    end

    subgraph BenchOutputs["Benchmark run outputs"]
        OUTROOT["benchmarks/out/<run_id>/"]
        SUMJSON["summary.json"]
        SUMCSV["summary.csv"]
        TRACECSV["trace.csv (per run)"]
        LOGS["*.log, run.stdout, run.stderr"]
    end

    subgraph Plotting["Plotting and reporting"]
        PBPY["tools/benchmarks/plot_benchmarks.py"]
        PLOTS["benchmarks/plots/<dataset>_convergence.png"]
        REPORTTPL["benchmarks/report_template.md"]
    end

    M --> RBPY
    RBPY --> OUTROOT
    OUTROOT --> SUMJSON
    OUTROOT --> SUMCSV
    OUTROOT --> TRACECSV
    OUTROOT --> LOGS

    SUMJSON --> PBPY
    PBPY --> PLOTS

    SUMCSV --> REPORTTPL
    PLOTS --> REPORTTPL

    classDef config fill:#e7f0ff,stroke:#4a78c2
    classDef outputs fill:#e8ffe7,stroke:#3c9a3c
    classDef plotting fill:#fff3cd,stroke:#b7950b

    class M,RBPY config
    class OUTROOT,SUMJSON,SUMCSV,TRACECSV,LOGS outputs
    class PBPY,PLOTS,REPORTTPL plotting

File-Level Changes

Change Details Files
Introduce a manifest-driven benchmark runner that stages datasets, runs CSEARCH sweeps over optimizers and seeds, parses logs, and writes structured summaries and traces.
  • Load and normalize dataset definitions from a JSON manifest, resolving relative paths and validating presence of datasets and files.
  • Parse CSEARCH log files to extract rmin, iteration counts, and evaluation-by-evaluation rt values, writing them to a per-run trace.csv.
  • Sweep over datasets, optimizers, and seeds while staging required input files into per-run directories and constructing the CSEARCH command-line including optional delta, max-evals, and max-iters.
  • Execute CSEARCH (or simulate in dry-run mode) with LEED/RFAC paths taken from CLI or environment, capturing stdout/stderr, timing runs, and recording status and metadata.
  • Aggregate all runs into summary.json and summary.csv under a timestamped output root and print the final location to stdout.
tools/benchmarks/run_benchmarks.py
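As a rough sketch of the trace and summary writing described above (not the actual run_benchmarks.py implementation; the column names and result-row fields are assumptions):

import csv
from pathlib import Path

def write_trace_csv(run_dir: Path, trace: list[tuple[int, float]]) -> Path:
    """Write per-run (evaluation, rt) pairs to trace.csv inside run_dir."""
    path = run_dir / "trace.csv"
    with path.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["eval", "rt"])   # assumed column names
        writer.writerows(trace)
    return path

def write_summary_csv(output_root: Path, results: list[dict]) -> Path:
    """Aggregate one row per (dataset, optimizer, seed) run into summary.csv."""
    path = output_root / "summary.csv"
    fields = ["dataset", "optimizer", "seed", "exit_code", "rmin", "evals", "trace_path"]
    with path.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fields)
        writer.writeheader()
        for row in results:
            writer.writerow({key: row.get(key) for key in fields})
    return path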
Add a plotting utility to visualize convergence traces per dataset from benchmark summaries.
  • Load summary.json and group runs by dataset, skipping entries with missing trace files.
  • Load per-run trace.csv, compute best-so-far R sequences, and plot convergence curves for each optimizer/seed combination using matplotlib.
  • Emit per-dataset PNG plots into a configurable output directory and print the location of generated plots.
tools/benchmarks/plot_benchmarks.py
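A minimal sketch of the best-so-far convergence computation and per-dataset plotting described above (illustrative only; the real plot_benchmarks.py may structure its inputs differently, and the run-label format is an assumption):

import matplotlib.pyplot as plt

def best_so_far(rt_values):
    """Running minimum of the R-factor trace: best R seen up to each evaluation."""
    best, out = float("inf"), []
    for r in rt_values:
        best = min(best, r)
        out.append(best)
    return out

def plot_dataset(name, runs, out_path):
    """runs: mapping of "optimizer/seed" label -> list of rt values per evaluation."""
    fig, ax = plt.subplots()
    for label, rt_values in runs.items():
        ax.plot(range(1, len(rt_values) + 1), best_so_far(rt_values), label=label)
    ax.set_xlabel("evaluation")
    ax.set_ylabel("best R so far")
    ax.set_title(f"{name} convergence")
    ax.legend()
    fig.savefig(out_path, dpi=150)
    plt.close(fig)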
Document the benchmark harness, its manifest format, and provide a report template and example manifest.
  • Describe the benchmark harness workflow, manifest schema, run_benchmarks usage, and plotting commands in a new benchmarks README.
  • Provide a markdown report template outlining sections for benchmark objectives, configuration, datasets, results, and follow-ups.
  • Add an example manifest.json pointing at an existing csearch_stub test fixture dataset to serve as a minimal runnable benchmark setup.
benchmarks/README.md
benchmarks/report_template.md
benchmarks/manifest.json

coderabbitai bot commented Dec 31, 2025

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Comment @coderabbitai help to get the list of available commands and usage tips.

@codacy-production

Codacy's Analysis Summary

  • 19 new issues (threshold: ≤ 0)
  • 3 new security issues
  • 66 complexity
  • 0 duplications

Review Pull Request in Codacy →

AI Reviewer available: add the codacy-review label to get contextual insights without leaving GitHub.

@Liam-Deacon (Owner, Author) commented:

Tried running the benchmark harness against the Ni111_Cu example.

  • Run 1 failed: missing CLEED_PHASE.
  • Run 2 (with CLEED_PHASE and all .fsm/.smo2 extras copied) still fails in crfac:
    *** error (cr_rdcleed): numbers of energies do not match (lines: 0/n_eng: 108)

Artifacts are under:
/Users/liam/repos/cleed-wt/search-bench/benchmarks/out/run_20251231_124530/Ni111_Cu/si/seed_1

I can retry with a different example or additional inputs if you can point me to a known-good dataset for csearch.

@Liam-Deacon (Owner, Author) commented:

Filed TODO issue for the example-run failure: #70

Liam-Deacon force-pushed the feat/search-de branch 9 times, most recently from d72e121 to e2e86a0 on January 9, 2026 at 17:45
