Prompt Extraction Attack

Contains data and code for the paper

Effective Prompt Extraction from Language Models
Yiming Zhang, Nicholas Carlini and Daphne Ippolito
COLM 2024

Setup

Python 3.10 + pip install -r requirements.txt

attacks/
├── real-services/          <- prompt extractions from Bard, Bing Chat, ChatGPT and Claude
├── 5-gram-attacks.json     <- extraction attack against a 5-gram output filtering defense
├── attacks.json            <- handwritten attack queries
├── generated.json          <- attack queries generated by GPT-4
└── selected.json           <- 15 most effective attack queries
data/
├── dev.jsonl               <- validation data
├── sharegpt-test.jsonl     <- sharegpt test set
└── unnatural-test.jsonl    <- unnatural instructions test set
src/
├── common.py                           <- common utils
├── evaluate-extraction.py              <- evaluate extraction outputs
├── finetune-similarity-predictor.py    <- fine-tune a DeBERTa model for confidence estimation
├── gpt-x-prompt-extraction.py          <- GPT-3.5 & GPT-4 extraction
├── guess-model/                        <- identify LLMs with generations                        
├── hf-prompt-extraction.py             <- HuggingFace LLMs extraction
├── llama-chat-prompt-extraction.py     <- Llama-2-chat extraction

LLM system prompts

All extracted system prompts for Bing Chat, Bard and ChatGPT can be found in attacks/real-services.

Prompt extraction

To reproduce our prompt extraction results on GPT-4 & awesome-chatgpt-prompts:

python src/gpt-x-prompt-extraction.py \
 --model gpt-4 \
 --data awesome \
 --attack attacks/generated.json

Notes:

Adding the argument --defense "5-gram" enables the 5-gram defense.
Specify source of prompt with --data. Valid choices are dev, awesome, sharegpt and unnatural.
See src/hf-prompt-extraction.py and src/llama-chat-prompt-extraction.py for HF/Llama-2-chat prompt extraction.

Fine-tuned prompt leakage model

Model weights are on Hugging Face. See src/evaluate-extraction.py on how to use the model to predict prompt leakage.

LICENSE

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
attacks		attacks
data		data
scripts		scripts
src		src
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
setup.cfg		setup.cfg

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Prompt Extraction Attack

Setup

LLM system prompts

Prompt extraction

Fine-tuned prompt leakage model

LICENSE

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors 1

Languages

Folders and files

Latest commit

History

Repository files navigation

Prompt Extraction Attack

Setup

LLM system prompts

Prompt extraction

Fine-tuned prompt leakage model

LICENSE

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors 1

Languages

Packages