Better Prompt Schema in peerbench #27
Replies: 8 comments 1 reply
-
A draft for the Prompt schema:

```jsonc
{
// The data that is going to be sent to the model in the same format as OpenAI's API.
// https://platform.openai.com/docs/api-reference/chat/create
// <Required>
"input": [
// System prompt if there is
// <Optional>
{
"role": "system",
"content": "You are a helpful assistant. Provide a concise answer to the user's question."
},
// User prompt
// <Required>
{
"role": "user",
"content": "What is the capital of France?"
}
],
// Expected answer in different formats
// <Required, can be empty>
"truths": [
"Paris",
"Paris is the capital of France",
"The capital of France is Paris"
],
// Scorer identifiers that can be used to score a Response that was given to this Prompt
// <Optional but must include at least one item if given>
"scorers": [""],
// Additional metadata
"metadata": {
"some-redundant-info": "some-value"
}
}
```

I feel like we should stay as compatible as possible with OpenAI's API schema, since most of the providers/LLMs follow it as well.
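To make the draft's rules concrete, here is a minimal validation sketch. The field names (`input`, `truths`, `scorers`) come from the draft above; the helper function itself is hypothetical and not part of peerbench:

```python
# Minimal validation sketch for the draft Prompt schema above.
# The checks mirror the <Required>/<Optional> annotations in the draft;
# validate_prompt() is an illustrative helper, not a peerbench API.

def validate_prompt(prompt: dict) -> list[str]:
    """Return a list of validation errors (empty list means valid)."""
    errors = []
    messages = prompt.get("input")
    if not isinstance(messages, list) or not messages:
        errors.append('"input" is required and must be a non-empty list')
    elif not any(m.get("role") == "user" for m in messages):
        errors.append('"input" must contain at least one user message')
    if "truths" not in prompt or not isinstance(prompt.get("truths"), list):
        errors.append('"truths" is required (it may be an empty list)')
    scorers = prompt.get("scorers")
    if scorers is not None and len(scorers) == 0:
        errors.append('"scorers" is optional, but must be non-empty if given')
    return errors

draft = {
    "input": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    "truths": ["Paris"],
    "scorers": ["exact-match"],
}
print(validate_prompt(draft))  # []
```

Because `input` already matches OpenAI's `messages` shape, it could be passed through to a provider unchanged.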
-
Our current schema was designed to be easy for a human to read, so you always have all the information you need right there, but this duplication makes the file size very large (see the current schema).
-
A real-world usage example of the current Prompt schema:

```json
{
"did": "01990a53-ccc0-762c-b2d6-92517c21062e",
"question": {
"data": "In cellular models of GBA1-linked Parkinson's disease, where dysfunctional lysosomes impair mitophagy and mitochondrial function, what intervention directly addresses the underlying lysosomal defect to restore mitochondrial health and bioenergetics, thereby presenting a potential therapeutic avenue?",
"cid": "bagaaiera67eis6d2j5pbtwfldxjo7g6tt4k2miq7rtj6xqn5nph3ysnbm5aa",
"sha256": "d90c38d93eb101437e9e23ae36d9965148272e826c0ddace2c973d6b2bfe954b"
},
"answer": "Directly modulating lysosomal pH levels",
"answerKey": "C",
"options": {
"A": "Enhancement of proteasome activity",
"B": "Boosting GCase enzyme production",
"C": "Directly modulating lysosomal pH levels",
"D": "Enhancing mitochondrial fission dynamics",
"E": "Inhibition of ER stress response",
"F": "Activation of autophagy initiation complex",
"G": "Enhancing mitochondrial fusion dynamics",
"H": "Upregulation of antioxidant enzymes"
},
"fullPrompt": {
"data": "In cellular models of GBA1-linked Parkinson's disease, where dysfunctional lysosomes impair mitophagy and mitochondrial function, what intervention directly addresses the underlying lysosomal defect to restore mitochondrial health and bioenergetics, thereby presenting a potential therapeutic avenue?\n\nA: Enhancement of proteasome activity\nB: Boosting GCase enzyme production\nC: Directly modulating lysosomal pH levels\nD: Enhancing mitochondrial fission dynamics\nE: Inhibition of ER stress response\nF: Activation of autophagy initiation complex\nG: Enhancing mitochondrial fusion dynamics\nH: Upregulation of antioxidant enzymes\n",
"cid": "bagaaieratdlo5j3cbuimqvxkgmtyfs7kajnbueo5fhjppzocwvjmubasorbq",
"sha256": "2d2bf2aa0816bfd669aa0f2b5570d4b8554ffae6a98be4b0e7933844332500a1"
},
"type": "multiple-choice",
"metadata": {
"tags": ["generator-mcq"]
},
"scorers": [
"multiple-choice",
"ref-answer-equality-llm-judge-scorer",
"exact-match"
]
}
```
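One way to see the duplication: `fullPrompt.data` in the example above is fully derivable from `question.data` plus `options`. A hypothetical helper (not part of peerbench) that rebuilds it:

```python
# Sketch: rebuild "fullPrompt.data" from "question" and "options", showing
# that the stored fullPrompt text duplicates derivable data. The output
# format (question, blank line, "X: option" lines) matches the example above.

def build_full_prompt(question: str, options: dict[str, str]) -> str:
    lines = [question, ""]
    lines += [f"{key}: {text}" for key, text in sorted(options.items())]
    return "\n".join(lines) + "\n"

options = {
    "A": "Enhancement of proteasome activity",
    "B": "Boosting GCase enzyme production",
}
print(build_full_prompt("Which intervention addresses the lysosomal defect?", options))
```

If the file only stored `question` and `options`, the full prompt (and its hash) could be recomputed on demand instead of being persisted twice.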
-
Related schemas from others: https://github.com/confident-ai/deepeval/blob/main/deepeval/test_case/llm_test_case.py (line 159). It is rather tightly coupled to how it is called; it does not save prompts individually or give them new IDs.
-
LM-eval creates IDs but also uses hashes: https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/evaluator.py (lm_eval/evaluator.py:634-659). Each sample saved to disk has this structure: {
Saved to: samples_{task_name}_{timestamp}.jsonl files
-
As a first small improvement step: can we remove the "duplicate keys" via unnesting?
- prompt.data → prompt
- fullPrompt.data → fullPrompt
- renaming "did" → "prompt_uuid"
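A sketch of that migration against the real-world example above (using its actual `question`/`fullPrompt` keys). Note that this version simply drops the nested `cid`/`sha256` sub-fields for illustration; a real migration would need to decide where they go:

```python
# Sketch of the suggested unnesting step: hoist the *.data fields up one
# level and rename "did" to "prompt_uuid". Dropping cid/sha256 here is a
# simplification for illustration only.

def flatten_prompt(old: dict) -> dict:
    new = dict(old)
    new["prompt_uuid"] = new.pop("did")
    new["question"] = old["question"]["data"]
    new["fullPrompt"] = old["fullPrompt"]["data"]
    return new

old = {
    "did": "01990a53-ccc0-762c-b2d6-92517c21062e",
    "question": {"data": "What is the capital of France?",
                 "cid": "bag...", "sha256": "d90..."},
    "fullPrompt": {"data": "What is the capital of France?\n",
                   "cid": "bag...", "sha256": "2d2..."},
}
print(flatten_prompt(old)["prompt_uuid"])
```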
-
I think one thing we agree on is not to try to use one schema to rule them all; we should have different schemas for different prompt types. But anyway, like I have mentioned before, I was thinking of relying on OpenAI's API schema for all types of prompts, and I still think this may solve the most difficult part, which is designing the schema. In that case, though, we would need to think about how to shape OpenAI's schema into a tabular format for the web application, and also for storing prompts in
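One possible way to do that tabular shaping (an assumption, not a settled design): pivot the message list into one column per role, joining multiple messages of the same role:

```python
# Sketch: flatten an OpenAI-style message list into a single tabular row,
# one column per role. Column names are illustrative.

def to_row(prompt_id: str, messages: list[dict]) -> dict:
    row = {"prompt_uuid": prompt_id, "system": "", "user": "", "assistant": ""}
    for m in messages:
        # Concatenate repeated roles so multi-turn prompts still fit one row.
        row[m["role"]] = (row[m["role"]] + "\n" + m["content"]).strip()
    return row

row = to_row("p1", [
    {"role": "system", "content": "Be concise."},
    {"role": "user", "content": "What is the capital of France?"},
])
print(row["user"])
```

The trade-off: this is lossy for multi-turn conversations (message ordering across roles is flattened away), which is exactly the tension between OpenAI's list-of-messages schema and a tabular view.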
-
I know it may not make any sense, but what if we just use SQLite as the file format? Since it is just regular SQL, we can store all the related things in different tables. Or maybe
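A sketch of what that could look like with Python's built-in `sqlite3`, splitting prompts, truths, and scorers into separate tables instead of one nested JSON document. Table and column names are illustrative, not an agreed peerbench format:

```python
# Sketch of the SQLite idea: one table per related entity, with truths and
# scorers joined to prompts by prompt_uuid. Schema names are assumptions.
import sqlite3

conn = sqlite3.connect(":memory:")  # a .db file would be the shared artifact
conn.executescript("""
    CREATE TABLE prompts (uuid TEXT PRIMARY KEY, question TEXT, type TEXT);
    CREATE TABLE truths  (prompt_uuid TEXT, value TEXT);
    CREATE TABLE scorers (prompt_uuid TEXT, name TEXT);
""")
uuid = "01990a53-ccc0-762c-b2d6-92517c21062e"
conn.execute("INSERT INTO prompts VALUES (?, ?, ?)",
             (uuid, "What is the capital of France?", "multiple-choice"))
conn.executemany("INSERT INTO truths VALUES (?, ?)",
                 [(uuid, "Paris"), (uuid, "Paris is the capital of France")])
conn.execute("INSERT INTO scorers VALUES (?, ?)", (uuid, "exact-match"))
truths = [r[0] for r in conn.execute(
    "SELECT value FROM truths WHERE prompt_uuid = ? ORDER BY rowid", (uuid,))]
print(truths)
```

This removes the duplicated keys by construction (repeated values become rows, not repeated field names), at the cost of the file no longer being human-readable plain text.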
-
Our current schema has too much repetition, which makes the data size larger.