AutoSpec

Automated Specification Generation for C Programs using LLMs and Frama-C verification.

Overview

AutoSpec is a tool that automatically generates and verifies ACSL (ANSI/ISO C Specification Language) specifications for C programs. It combines:

Static Analysis: Decompose C programs into verifiable components
LLM-based Generation: Generate ACSL contracts / loop annotations via an LLM
Formal Verification: Verify specifications using Frama-C's WP (Weakest Precondition) plugin
Iterative Refinement: Strengthen or weaken specifications based on verification feedback

AutoSpec currently supports verification of the frama-c-problems benchmark suite, with x509-parser support planned for future releases.

Installation

Build the Docker image:

docker build -t autospec:dev .

Run the container:

docker run -dit --name autospec --gpus all --network host -v $(pwd):/workspace autospec:dev

docker exec -it autospec /bin/bash

Note

The README assumes the repo is mounted at /workspace inside the container.

Verify the installation:

./scripts/run_frama_c_problems.sh

This will run verification on benchmarks in benchmarks/frama-c-problems/ground-truth to verify that Frama-C and AutoSpec are working correctly.

Automated Spec Generation with vLLM

This is the core end-to-end workflow: LLM → insert ACSL → verify.

1) Start an OpenAI-compatible vLLM server

In a terminal inside Docker (or on the host if you prefer), start the model server:

python3 -m vllm.entrypoints.openai.api_server \
  --model Qwen/Qwen3-32B \
  --port 8000 \
  --dtype auto

Note

If you run vLLM inside Docker, make sure the container has GPU access (e.g., started with --gpus all).
scripts/gen_specs.py talks to an OpenAI-compatible Chat Completions endpoint.
If your endpoint requires auth, set OPENAI_API_KEY (it will be sent as a Bearer token).

2) Run generation (and verify)

In a second terminal inside Docker:

python3 scripts/gen_specs.py \
  --input-dir benchmarks/frama-c-problems/test-inputs \
  --output-dir outputs/annotated \
  --model Qwen/Qwen3-32B \
  --endpoint http://localhost:8000/v1/chat/completions \
  --verify

Key details:

The script recursively processes all .c files under --input-dir.
It annotates one function/loop at a time by inserting a returned /*@ ... */ block immediately before the target node.
With --verify, it calls the AutoSpec verifier on each produced file.
If verification fails, it automatically enters a final feedback correction loop (up to 3 attempts) where Frama-C/WP output is fed back to the LLM to repair ACSL only.

3) Check results (verify an outputs directory)

After generation, you can verify (or re-verify) everything under an output directory:

./scripts/run_frama_c_problems.sh -d outputs/annotated #(add -v to get verbose outputs)

Results

Below is a side-by-side comparison on benchmarks/frama-c-problems/test-inputs (51 programs):

Metric	AutoSpec (initial)	AutoSpec + final feedback loop
Programs passed	21	24
Programs failed	30	27
Pass rate	41.2%	47.1%

What is the “final feedback loop”? After a failed verification run, we feed the Frama-C/WP error output back into the LLM and ask it to repair ACSL only (no C code changes), then re-verify.

Per-program comparison (51 programs)

Click to expand

Program	AutoSpec	+ Final feedback
`arrays_and_loops/1.c`	PASS	FAIL
`arrays_and_loops/2.c`	FAIL	PASS
`arrays_and_loops/3.c`	PASS	PASS
`arrays_and_loops/4.c`	FAIL	FAIL
`arrays_and_loops/5.c`	FAIL	FAIL
`general_wp_problems/absolute_value.c`	PASS	PASS
`general_wp_problems/add.c`	PASS	PASS
`general_wp_problems/ani.c`	FAIL	FAIL
`general_wp_problems/diff.c`	PASS	PASS
`general_wp_problems/gcd.c`	PASS	PASS
`general_wp_problems/max_of_2.c`	PASS	PASS
`general_wp_problems/power.c`	FAIL	FAIL
`general_wp_problems/simple_interest.c`	PASS	PASS
`general_wp_problems/swap.c`	PASS	PASS
`general_wp_problems/triangle_angles.c`	FAIL	FAIL
`general_wp_problems/triangle_sides.c`	PASS	PASS
`general_wp_problems/wp1.c`	FAIL	FAIL
`immutable_arrays/array_sum.c`	FAIL	FAIL
`immutable_arrays/binary_search.c`	FAIL	FAIL
`immutable_arrays/check_evens_in_array.c`	PASS	PASS
`immutable_arrays/max.c`	FAIL	PASS
`immutable_arrays/occurences_of_x.c`	FAIL	FAIL
`immutable_arrays/sample.c`	FAIL	FAIL
`immutable_arrays/search.c`	PASS	PASS
`immutable_arrays/search_2.c`	PASS	PASS
`loops/1.c`	FAIL	FAIL
`loops/2.c`	FAIL	FAIL
`loops/3.c`	PASS	FAIL
`loops/4.c`	FAIL	PASS
`loops/fact.c`	FAIL	FAIL
`loops/mult.c`	FAIL	FAIL
`loops/sum_digits.c`	FAIL	FAIL
`loops/sum_even.c`	FAIL	FAIL
`miscellaneous/array_find.c`	PASS	PASS
`miscellaneous/array_max_advanced.c`	FAIL	FAIL
`miscellaneous/array_swap.c`	PASS	PASS
`miscellaneous/increment_arr.c`	FAIL	FAIL
`miscellaneous/max_of_2.c`	PASS	PASS
`more_arrays/equal_arrays.c`	FAIL	PASS
`more_arrays/replace_evens.c`	FAIL	FAIL
`more_arrays/reverse_array.c`	FAIL	FAIL
`mutable_arrays/array_double.c`	FAIL	FAIL
`mutable_arrays/bubble_sort.c`	FAIL	FAIL
`pointers/add_pointers.c`	FAIL	FAIL
`pointers/add_pointers_3_vars.c`	FAIL	FAIL
`pointers/div_rem.c`	PASS	PASS
`pointers/incr_a_by_b.c`	FAIL	PASS
`pointers/max_pointers.c`	PASS	PASS
`pointers/order_3.c`	FAIL	FAIL
`pointers/reset_1st.c`	PASS	PASS
`pointers/swap.c`	PASS	PASS

How to Run Verification Manually (Frama-C / WP)

Running Ground Truth Benchmarks

# Run all benchmark categories
./scripts/run_frama_c_problems.sh

Note

If you are not already inside the container shell, enter it first:

docker exec -it autospec /bin/bash

If frama-c is not found inside the container, run:

opam init
eval $(opam env)

Verify a single file (ground truth or your own C file):

python3 -m autospec.cli.main verify benchmarks/frama-c-problems/ground-truth/loops/1.c --verbose

Custom Timeout

python3 -m autospec.cli.main verify file.c --timeout 120

CLI Help

python3 -m autospec.cli.main --help
python3 -m autospec.cli.main verify --help

Benchmark Suites:

AutoSpec includes comprehensive benchmark suites for evaluation:

# Run all benchmarks (frama-c-problems + x509-parser)
./scripts/run_all_benchmarks.sh

# Run only frama-c-problems (~51 programs)
./scripts/run_all_benchmarks.sh -o frama-c

# Skip x509-parser for faster testing
./scripts/run_all_benchmarks.sh -s

# Test specific category
./scripts/run_frama_c_problems.sh loops
./scripts/run_frama_c_problems.sh arrays_and_loops -v

# Test x509-parser only
./scripts/run_x509_parser.sh

See benchmarks/README.md for detailed documentation.

Adding New C Programs

Create a C file under benchmarks/frama-c-problems/ (or anywhere you like).
Add ACSL annotations (preconditions, postconditions, loop invariants).
Verify with AutoSpec:

python3 -m autospec.cli.main verify benchmarks/frama-c-problems/your_file.c

Example ACSL Annotation

/*@
  @ requires n > 0;
  @ requires \valid_read(arr + (0..n-1));
  @ ensures \result >= arr[0];
  @ ensures \forall integer i; 0 <= i < n ==> \result >= arr[i];
  @*/
int array_max(int *arr, int n) {
    // ... implementation
}

Configuration

Edit autospec/config.py to customize:

FRAMA_C_COMMAND: Path to Frama-C executable
FRAMA_C_TIMEOUT: Overall verification timeout (default: 60s)
FRAMA_C_WP_TIMEOUT: Per-proof timeout (default: 10s)
LOG_LEVEL: Logging verbosity

Or use environment variables:

export FRAMA_C_TIMEOUT=120
export FRAMA_C_WP_TIMEOUT=20
export VERBOSE=true

Name		Name	Last commit message	Last commit date
Latest commit History 16 Commits
autospec		autospec
benchmarks		benchmarks
scripts		scripts
.dockerignore		.dockerignore
.gitignore		.gitignore
Dockerfile		Dockerfile
README.md		README.md
autospec.pdf		autospec.pdf
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

AutoSpec

Overview

Installation

Automated Spec Generation with vLLM

1) Start an OpenAI-compatible vLLM server

2) Run generation (and verify)

3) Check results (verify an outputs directory)

Results

Per-program comparison (51 programs)

How to Run Verification Manually (Frama-C / WP)

Running Ground Truth Benchmarks

Custom Timeout

CLI Help

Adding New C Programs

Example ACSL Annotation

Configuration

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

AutoSpec

Overview

Installation

Automated Spec Generation with vLLM

1) Start an OpenAI-compatible vLLM server

2) Run generation (and verify)

3) Check results (verify an outputs directory)

Results

Per-program comparison (51 programs)

How to Run Verification Manually (Frama-C / WP)

Running Ground Truth Benchmarks

Custom Timeout

CLI Help

Adding New C Programs

Example ACSL Annotation

Configuration

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages