Problem
Models are evaluated via direct API calls, which exposes benchmark prompts to providers and allows potential memorization or training-data leakage.
Basis of issue
- Isolated execution environment (sandboxed SDK or enclave)
- Network egress restrictions during inference
- Prevention of prompt visibility prior to scoring
- Secure prompt delivery and response capture
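One way to satisfy "prevention of prompt visibility prior to scoring" is a commit-reveal scheme: hashes of the prompts are published before the run, the plaintext prompts are revealed only inside the sealed environment, and the scorer verifies the commitment after execution. A minimal stdlib sketch, with all function names illustrative rather than taken from the codebase:

```python
import hashlib

def commit(prompts: list[str]) -> list[str]:
    """Publish SHA-256 hashes before execution; plaintext stays in the enclave."""
    return [hashlib.sha256(p.encode()).hexdigest() for p in prompts]

def verify(prompts: list[str], commitments: list[str]) -> bool:
    """At scoring time, check the executed prompts match the pre-run commitment."""
    return commit(prompts) == commitments

# Example flow: commit publicly, run inference inside the sandbox, verify later.
prompts = ["What is 2 + 2?", "Translate 'bonjour' to English."]
published = commit(prompts)        # shared before inference
assert verify(prompts, published)  # checked post-execution by the scorer
```

This does not by itself hide prompts from the provider at inference time (that requires the isolation described above), but it does let third parties confirm that the scored prompts are the ones committed to before the run.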
Importance
- Central security guarantee of the paper
- Prevents prompt harvesting by model providers
- Without this, contamination resistance is fundamentally broken
Current Implementation Gap
- Models access prompts via OpenRouter / direct APIs
- No isolation or prompt secrecy guarantees
Implementation checklist
- Prompts executed inside a sealed environment
- No outbound network access during inference
- Model providers cannot log or store prompts
- Scoring occurs post-execution, not inline
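The "scoring occurs post-execution, not inline" item can be enforced structurally rather than by convention: responses are buffered inside the sealed run and only released to the scorer once the run is closed. A hypothetical sketch (the `SealedRun` class is illustrative, not part of the codebase; on Linux the process could additionally be launched under something like `unshare --net` to drop outbound network access, though that is a deployment assumption):

```python
class SealedRun:
    """Buffers (prompt, response) pairs; exposes them only after sealing."""

    def __init__(self):
        self._records: list[tuple[str, str]] = []
        self._sealed = False

    def capture(self, prompt: str, response: str) -> None:
        # Called inside the isolated environment; nothing is scored inline.
        if self._sealed:
            raise RuntimeError("run already sealed")
        self._records.append((prompt, response))

    def seal(self) -> list[tuple[str, str]]:
        # After this point no new responses can be added; scoring may begin.
        self._sealed = True
        return list(self._records)

run = SealedRun()
run.capture("2 + 2 = ?", "4")
records = run.seal()
scores = [resp == "4" for _, resp in records]  # scoring strictly post-execution
```

Keeping the scorer outside the sealed boundary means a compromised or logging-happy inference path can never observe the grading logic, and the run transcript can be audited as a unit.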