Automated Specification Generation for C Programs using LLMs and Frama-C verification.
AutoSpec is a tool that automatically generates and verifies ACSL (ANSI/ISO C Specification Language) specifications for C programs. It combines:
- Static Analysis: Decompose C programs into verifiable components
- LLM-based Generation: Generate ACSL contracts / loop annotations via an LLM
- Formal Verification: Verify specifications using Frama-C's WP (Weakest Precondition) plugin
- Iterative Refinement: Strengthen or weaken specifications based on verification feedback
AutoSpec currently supports verification of the frama-c-problems benchmark suite, with x509-parser support planned for future releases.
- Build the Docker image:
docker build -t autospec:dev .- Run the container:
docker run -dit --name autospec --gpus all --network host -v $(pwd):/workspace autospec:dev
docker exec -it autospec /bin/bashNote
- The README assumes the repo is mounted at
/workspaceinside the container.
- Verify the installation:
./scripts/run_frama_c_problems.shThis will run verification on benchmarks in benchmarks/frama-c-problems/ground-truth to verify that Frama-C and AutoSpec are working correctly.
This is the core end-to-end workflow: LLM → insert ACSL → verify.
In a terminal inside Docker (or on the host if you prefer), start the model server:
python3 -m vllm.entrypoints.openai.api_server \
--model Qwen/Qwen3-32B \
--port 8000 \
--dtype autoNote
- If you run vLLM inside Docker, make sure the container has GPU access (e.g., started with
--gpus all). scripts/gen_specs.pytalks to an OpenAI-compatible Chat Completions endpoint.- If your endpoint requires auth, set
OPENAI_API_KEY(it will be sent as a Bearer token).
In a second terminal inside Docker:
python3 scripts/gen_specs.py \
--input-dir benchmarks/frama-c-problems/test-inputs \
--output-dir outputs/annotated \
--model Qwen/Qwen3-32B \
--endpoint http://localhost:8000/v1/chat/completions \
--verifyKey details:
- The script recursively processes all
.cfiles under--input-dir. - It annotates one function/loop at a time by inserting a returned
/*@ ... */block immediately before the target node. - With
--verify, it calls the AutoSpec verifier on each produced file. - If verification fails, it automatically enters a final feedback correction loop (up to 3 attempts) where Frama-C/WP output is fed back to the LLM to repair ACSL only.
After generation, you can verify (or re-verify) everything under an output directory:
./scripts/run_frama_c_problems.sh -d outputs/annotated #(add -v to get verbose outputs)Below is a side-by-side comparison on benchmarks/frama-c-problems/test-inputs (51 programs):
| Metric | AutoSpec (initial) | AutoSpec + final feedback loop |
|---|---|---|
| Programs passed | 21 | 24 |
| Programs failed | 30 | 27 |
| Pass rate | 41.2% | 47.1% |
What is the “final feedback loop”? After a failed verification run, we feed the Frama-C/WP error output back into the LLM and ask it to repair ACSL only (no C code changes), then re-verify.
Click to expand
| Program | AutoSpec | + Final feedback |
|---|---|---|
arrays_and_loops/1.c |
PASS | FAIL |
arrays_and_loops/2.c |
FAIL | PASS |
arrays_and_loops/3.c |
PASS | PASS |
arrays_and_loops/4.c |
FAIL | FAIL |
arrays_and_loops/5.c |
FAIL | FAIL |
general_wp_problems/absolute_value.c |
PASS | PASS |
general_wp_problems/add.c |
PASS | PASS |
general_wp_problems/ani.c |
FAIL | FAIL |
general_wp_problems/diff.c |
PASS | PASS |
general_wp_problems/gcd.c |
PASS | PASS |
general_wp_problems/max_of_2.c |
PASS | PASS |
general_wp_problems/power.c |
FAIL | FAIL |
general_wp_problems/simple_interest.c |
PASS | PASS |
general_wp_problems/swap.c |
PASS | PASS |
general_wp_problems/triangle_angles.c |
FAIL | FAIL |
general_wp_problems/triangle_sides.c |
PASS | PASS |
general_wp_problems/wp1.c |
FAIL | FAIL |
immutable_arrays/array_sum.c |
FAIL | FAIL |
immutable_arrays/binary_search.c |
FAIL | FAIL |
immutable_arrays/check_evens_in_array.c |
PASS | PASS |
immutable_arrays/max.c |
FAIL | PASS |
immutable_arrays/occurences_of_x.c |
FAIL | FAIL |
immutable_arrays/sample.c |
FAIL | FAIL |
immutable_arrays/search.c |
PASS | PASS |
immutable_arrays/search_2.c |
PASS | PASS |
loops/1.c |
FAIL | FAIL |
loops/2.c |
FAIL | FAIL |
loops/3.c |
PASS | FAIL |
loops/4.c |
FAIL | PASS |
loops/fact.c |
FAIL | FAIL |
loops/mult.c |
FAIL | FAIL |
loops/sum_digits.c |
FAIL | FAIL |
loops/sum_even.c |
FAIL | FAIL |
miscellaneous/array_find.c |
PASS | PASS |
miscellaneous/array_max_advanced.c |
FAIL | FAIL |
miscellaneous/array_swap.c |
PASS | PASS |
miscellaneous/increment_arr.c |
FAIL | FAIL |
miscellaneous/max_of_2.c |
PASS | PASS |
more_arrays/equal_arrays.c |
FAIL | PASS |
more_arrays/replace_evens.c |
FAIL | FAIL |
more_arrays/reverse_array.c |
FAIL | FAIL |
mutable_arrays/array_double.c |
FAIL | FAIL |
mutable_arrays/bubble_sort.c |
FAIL | FAIL |
pointers/add_pointers.c |
FAIL | FAIL |
pointers/add_pointers_3_vars.c |
FAIL | FAIL |
pointers/div_rem.c |
PASS | PASS |
pointers/incr_a_by_b.c |
FAIL | PASS |
pointers/max_pointers.c |
PASS | PASS |
pointers/order_3.c |
FAIL | FAIL |
pointers/reset_1st.c |
PASS | PASS |
pointers/swap.c |
PASS | PASS |
# Run all benchmark categories
./scripts/run_frama_c_problems.shNote
If you are not already inside the container shell, enter it first:
docker exec -it autospec /bin/bashIf frama-c is not found inside the container, run:
opam init
eval $(opam env)Verify a single file (ground truth or your own C file):
python3 -m autospec.cli.main verify benchmarks/frama-c-problems/ground-truth/loops/1.c --verbosepython3 -m autospec.cli.main verify file.c --timeout 120python3 -m autospec.cli.main --help
python3 -m autospec.cli.main verify --helpBenchmark Suites:
AutoSpec includes comprehensive benchmark suites for evaluation:
# Run all benchmarks (frama-c-problems + x509-parser)
./scripts/run_all_benchmarks.sh
# Run only frama-c-problems (~51 programs)
./scripts/run_all_benchmarks.sh -o frama-c
# Skip x509-parser for faster testing
./scripts/run_all_benchmarks.sh -s
# Test specific category
./scripts/run_frama_c_problems.sh loops
./scripts/run_frama_c_problems.sh arrays_and_loops -v
# Test x509-parser only
./scripts/run_x509_parser.shSee benchmarks/README.md for detailed documentation.
- Create a C file under
benchmarks/frama-c-problems/(or anywhere you like). - Add ACSL annotations (preconditions, postconditions, loop invariants).
- Verify with AutoSpec:
python3 -m autospec.cli.main verify benchmarks/frama-c-problems/your_file.c/*@
@ requires n > 0;
@ requires \valid_read(arr + (0..n-1));
@ ensures \result >= arr[0];
@ ensures \forall integer i; 0 <= i < n ==> \result >= arr[i];
@*/
int array_max(int *arr, int n) {
// ... implementation
}Edit autospec/config.py to customize:
FRAMA_C_COMMAND: Path to Frama-C executableFRAMA_C_TIMEOUT: Overall verification timeout (default: 60s)FRAMA_C_WP_TIMEOUT: Per-proof timeout (default: 10s)LOG_LEVEL: Logging verbosity
Or use environment variables:
export FRAMA_C_TIMEOUT=120
export FRAMA_C_WP_TIMEOUT=20
export VERBOSE=true