y0mingzhang/precision-extraction
Precision Extraction from LLM API Logprobs

This repository contains code and data for inferring the internal floating-point precision of language models from API-exposed logprobs.

Blog post: TODO

Quick Start

# Compile the attack
make

# Run on cached OpenAI data
python src/eval.py --openai

# Run on cached Gemini data
python src/eval.py --gemini

# Run on synthetic ground-truth data
python src/eval.py --synthetic

# Infer precision from raw logprobs
./inverted_search -1.5 -0.77 -3.02 -4.1 -5.2

Results

OpenAI Models

| Model         | Precision | Agreement |
|---------------|-----------|-----------|
| gpt-3.5-turbo | FP32      | 100%      |
| gpt-4         | FP32      | 100%      |
| gpt-4-turbo   | FP32      | 100%      |
| gpt-4o        | BF16      | 97%       |
| gpt-4o-mini   | BF16      | 100%      |
| gpt-4.1       | BF16      | 100%      |
| gpt-4.1-mini  | BF16      | 100%      |
| gpt-4.1-nano  | BF16      | 100%      |

Gemini Models

| Model                 | Precision | Agreement |
|-----------------------|-----------|-----------|
| gemini-2.0-flash      | FP32      | 100%      |
| gemini-2.0-flash-lite | FP32      | 100%      |

Note: Gemini 2.5+ models do not expose logprobs.

Synthetic Validation (GPT-Neo with quantized logits)

| Precision | Accuracy | Notes                      |
|-----------|----------|----------------------------|
| FP32      | 100%     |                            |
| FP16      | 100%     |                            |
| BF16      | 100%     |                            |
| FP8 E4M3  | 89%      | Sometimes detected as E5M2 |
| FP8 E5M2  | 100%     |                            |
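For intuition, ground-truth data of this kind can be produced by quantizing logits before the log-softmax. The following is a toy standalone sketch, not the repository's pipeline (src/cache_quantized.py runs GPT-Neo); the function name and inputs here are illustrative:

```python
import math
import struct

def round_to_bf16(x: float) -> float:
    """Round to the nearest BF16 value via the float32 bit pattern
    (round-half-to-even on the low 16 bits)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def quantized_logprobs(logits):
    """Quantize logits to BF16, then take the log-softmax at full precision,
    mimicking a model whose final logits live in BF16."""
    q = [round_to_bf16(l) for l in logits]
    lse = math.log(sum(math.exp(v) for v in q))
    return [v - lse for v in q]

print(quantized_logprobs([1.234, 0.1, -2.71]))
```

The resulting logprobs are ordinary float64 values, but each one is a BF16 grid point minus a shared log-sum-exp constant — exactly the structure the attack exploits.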

How It Works

When an LLM API reports logprobs, each one is a log-softmax of the model's output logits:

logprob[i] = logit[i] - log_sum_exp(logits)

If the logits are stored in a reduced-precision format (e.g., BF16), then when widened to float32 or float64 their mantissas end in a characteristic run of zero bits. Our attack recovers the log_sum_exp constant by searching over possible logit values.
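As a standalone illustration (plain Python, not code from this repo): a value that is exactly representable in BF16 has all-zero low 16 bits when viewed as a float32, and rounding to BF16 amounts to rounding away those 16 bits:

```python
import struct

def f32_bits(x: float) -> int:
    """Bit pattern of x as an IEEE-754 float32."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def round_to_bf16(x: float) -> float:
    """Round to the nearest BF16 value (round-half-to-even on the low 16 bits)."""
    bits = f32_bits(x)
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

x = -0.7734375                                    # = -99/128, exactly representable in BF16
print(f"{f32_bits(x):032b}")                      # low 16 bits are all zero
print(f"{f32_bits(round_to_bf16(-0.77)):032b}")   # rounding to BF16 restores that pattern
```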

Key insight: if the logits are BF16, there are only 2^16 = 65536 possible bit patterns. We enumerate candidate values for logit[0], compute w = candidate - logprob[0], and verify that logprob[i] + w lands on the BF16 grid for every remaining logprob.
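The enumeration can be sketched in a few lines of Python. This is a slow, simplified version (src/inverted_search.c does the real work); the tolerance and the pruning of very large BF16 values, whose coarse spacing would match trivially, are illustrative choices:

```python
import math
import struct

def round_to_bf16(x: float) -> float:
    """Round to the nearest BF16 value (round-half-to-even on the low 16 bits)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def candidate_logits(logprobs, tol=1e-6, max_abs=100.0):
    """Enumerate all BF16 candidates for logit[0]; keep those whose implied
    shift w = candidate - logprob[0] puts every other logprob back on the
    BF16 grid. max_abs prunes implausibly large logit values."""
    hits = []
    for hi in range(0x10000):
        cand = struct.unpack("<f", struct.pack("<I", hi << 16))[0]
        if not math.isfinite(cand) or abs(cand) > max_abs:
            continue
        w = cand - logprobs[0]
        if all(abs((v + w) - round_to_bf16(v + w)) <= tol for v in logprobs[1:]):
            hits.append(cand)
    return hits

# Synthetic check: BF16-exact logits, log-softmax in float64.
logits = [2.0, 1.5, -0.5, 0.25, -3.0]
lse = math.log(sum(math.exp(l) for l in logits))
logprobs = [l - lse for l in logits]
print(candidate_logits(logprobs))
```

With only a handful of logprobs, several shifts can remain consistent (the true logit[0] is always among them); more logprobs shrink the candidate set, which is why the attack works from around 20 of them.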

Repository Structure

src/
  inverted_search.c   # Fast attack
  brute_force.c       # Naive attack
  eval.py             # Evaluate on cached data
  cache_logprobs.py   # Collect logprobs from OpenAI API
  cache_gemini.py     # Collect logprobs from Gemini API
  cache_quantized.py  # Generate synthetic ground-truth data
data/
  openai/             # Cached OpenAI model logprobs
  gemini/             # Cached Gemini model logprobs
  synthetic/          # Ground-truth validation data

Compiling

# Fast inverted search (recommended)
make

# Or manually:
gcc -O3 -march=native src/inverted_search.c -o inverted_search -lm

# Brute-force (slow, for demonstration)
gcc -O3 -march=native src/brute_force.c -o brute_force -lm

Reproducing Results

Evaluate on cached data

make
python src/eval.py

Collect fresh OpenAI logprobs

export OPENAI_API_KEY=...
python src/cache_logprobs.py

Collect fresh Gemini logprobs

export GEMINI_API_KEY=...
python src/cache_gemini.py

Generate synthetic ground-truth data

Requires GPU and PyTorch:

pip install torch transformers accelerate
python src/cache_quantized.py

License

MIT

About

Extracting Model Precision from 20 Logprobs
