y0mingzhang/precision-extraction
Precision Extraction from LLM API Logprobs

This repository contains code and data for inferring the internal floating-point precision of language models from API-exposed logprobs.

Blog post: TODO

Quick Start

# Compile the attack
make

# Run on cached OpenAI data
python src/eval.py --openai

# Run on cached Gemini data
python src/eval.py --gemini

# Run on synthetic ground-truth data
python src/eval.py --synthetic

# Infer precision from raw logprobs
./inverted_search -1.5 -0.77 -3.02 -4.1 -5.2

Results

OpenAI Models

| Model         | Precision | Agreement |
|---------------|-----------|-----------|
| gpt-3.5-turbo | FP32      | 100%      |
| gpt-4         | FP32      | 100%      |
| gpt-4-turbo   | FP32      | 100%      |
| gpt-4o        | BF16      | 97%       |
| gpt-4o-mini   | BF16      | 100%      |
| gpt-4.1       | BF16      | 100%      |
| gpt-4.1-mini  | BF16      | 100%      |
| gpt-4.1-nano  | BF16      | 100%      |

Gemini Models

| Model                 | Precision | Agreement |
|-----------------------|-----------|-----------|
| gemini-2.0-flash      | FP32      | 100%      |
| gemini-2.0-flash-lite | FP32      | 100%      |

Note: Gemini 2.5+ models do not expose logprobs.

Synthetic Validation (GPT-Neo with quantized logits)

| Precision | Accuracy | Notes                      |
|-----------|----------|----------------------------|
| FP32      | 100%     |                            |
| FP16      | 100%     |                            |
| BF16      | 100%     |                            |
| FP8 E4M3  | 89%      | Sometimes detected as E5M2 |
| FP8 E5M2  | 100%     |                            |
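For intuition, ground-truth data of this kind can be produced by quantizing logits before the log-softmax. The following is a toy standalone sketch, not the repository's pipeline (src/cache_quantized.py runs GPT-Neo); the function name and inputs here are illustrative:

```python
import math
import struct

def round_to_bf16(x: float) -> float:
    """Round to the nearest BF16 value via the float32 bit pattern
    (round-half-to-even on the low 16 bits)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def quantized_logprobs(logits):
    """Quantize logits to BF16, then take the log-softmax at full precision,
    mimicking a model whose final logits live in BF16."""
    q = [round_to_bf16(l) for l in logits]
    lse = math.log(sum(math.exp(v) for v in q))
    return [v - lse for v in q]

print(quantized_logprobs([1.234, 0.1, -2.71]))
```

The resulting logprobs are ordinary float64 values, but each one is a BF16 grid point minus a shared log-sum-exp constant — exactly the structure the attack exploits.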

How It Works

When an LLM API reports logprobs, each one is a log-softmax of the model's output logits:

logprob[i] = logit[i] - log_sum_exp(logits)

If the logits are stored in a reduced-precision format (e.g., BF16), then when widened to float32 or float64 their mantissas end in a characteristic run of zero bits. Our attack recovers the log_sum_exp constant by searching over possible logit values.
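As a standalone illustration (plain Python, not code from this repo): a value that is exactly representable in BF16 has all-zero low 16 bits when viewed as a float32, and rounding to BF16 amounts to rounding away those 16 bits:

```python
import struct

def f32_bits(x: float) -> int:
    """Bit pattern of x as an IEEE-754 float32."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def round_to_bf16(x: float) -> float:
    """Round to the nearest BF16 value (round-half-to-even on the low 16 bits)."""
    bits = f32_bits(x)
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

x = -0.7734375                                    # = -99/128, exactly representable in BF16
print(f"{f32_bits(x):032b}")                      # low 16 bits are all zero
print(f"{f32_bits(round_to_bf16(-0.77)):032b}")   # rounding to BF16 restores that pattern
```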

Key insight: if the logits are BF16, there are only 2^16 = 65536 possible bit patterns. We enumerate candidate values for logit[0], compute w = candidate - logprob[0], and verify that logprob[i] + w lands on the BF16 grid for every remaining logprob.
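The enumeration can be sketched in a few lines of Python. This is a slow, simplified version (src/inverted_search.c does the real work); the tolerance and the pruning of very large BF16 values, whose coarse spacing would match trivially, are illustrative choices:

```python
import math
import struct

def round_to_bf16(x: float) -> float:
    """Round to the nearest BF16 value (round-half-to-even on the low 16 bits)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def candidate_logits(logprobs, tol=1e-6, max_abs=100.0):
    """Enumerate all BF16 candidates for logit[0]; keep those whose implied
    shift w = candidate - logprob[0] puts every other logprob back on the
    BF16 grid. max_abs prunes implausibly large logit values."""
    hits = []
    for hi in range(0x10000):
        cand = struct.unpack("<f", struct.pack("<I", hi << 16))[0]
        if not math.isfinite(cand) or abs(cand) > max_abs:
            continue
        w = cand - logprobs[0]
        if all(abs((v + w) - round_to_bf16(v + w)) <= tol for v in logprobs[1:]):
            hits.append(cand)
    return hits

# Synthetic check: BF16-exact logits, log-softmax in float64.
logits = [2.0, 1.5, -0.5, 0.25, -3.0]
lse = math.log(sum(math.exp(l) for l in logits))
logprobs = [l - lse for l in logits]
print(candidate_logits(logprobs))
```

With only a handful of logprobs, several shifts can remain consistent (the true logit[0] is always among them); more logprobs shrink the candidate set, which is why the attack works from around 20 of them.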

Repository Structure

src/
  inverted_search.c   # Fast attack
  brute_force.c       # Naive attack
  eval.py             # Evaluate on cached data
  cache_logprobs.py   # Collect logprobs from OpenAI API
  cache_gemini.py     # Collect logprobs from Gemini API
  cache_quantized.py  # Generate synthetic ground-truth data
data/
  openai/             # Cached OpenAI model logprobs
  gemini/             # Cached Gemini model logprobs
  synthetic/          # Ground-truth validation data

Compiling

# Fast inverted search (recommended)
make

# Or manually:
gcc -O3 -march=native src/inverted_search.c -o inverted_search -lm

# Brute-force (slow, for demonstration)
gcc -O3 -march=native src/brute_force.c -o brute_force -lm

Reproducing Results

Evaluate on cached data

make
python src/eval.py

Collect fresh OpenAI logprobs

export OPENAI_API_KEY=...
python src/cache_logprobs.py

Collect fresh Gemini logprobs

export GEMINI_API_KEY=...
python src/cache_gemini.py

Generate synthetic ground-truth data

Requires GPU and PyTorch:

pip install torch transformers accelerate
python src/cache_quantized.py

License

MIT

About

Extracting Model Precision from 20 Logprobs
