Better Prompt Schema in peerbench #27
Replies: 8 comments 1 reply
-
A draft for the Prompt schema:

```jsonc
{
// The data that is going to be sent to the model in the same format as OpenAI's API.
// https://platform.openai.com/docs/api-reference/chat/create
// <Required>
"input": [
// System prompt if there is
// <Optional>
{
"role": "system",
"content": "You are a helpful assistant. Provide a concise answer to the user's question."
},
// User prompt
// <Required>
{
"role": "user",
"content": "What is the capital of France?"
}
],
// Expected answer in different formats
// <Required, can be empty>
"truths": [
"Paris",
"Paris is the capital of France",
"The capital of France is Paris"
],
// Scorer identifiers that can be used to score a Response that was given to this Prompt
// <Optional but must include at least one item if given>
"scorers": [""],
// Additional metadata
"metadata": {
"some-redundant-info": "some-value"
}
}
```

I feel like we should stay as compatible as possible with OpenAI's API schema, since most of the providers/LLMs follow it as well.
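To make the draft's rules concrete, here is a minimal validation sketch. The field names (`input`, `truths`, `scorers`) come from the draft above; the helper function itself is hypothetical and not part of peerbench:

```python
# Minimal validation sketch for the draft Prompt schema above.
# The checks mirror the <Required>/<Optional> annotations in the draft;
# validate_prompt() is an illustrative helper, not a peerbench API.

def validate_prompt(prompt: dict) -> list[str]:
    """Return a list of validation errors (empty list means valid)."""
    errors = []
    messages = prompt.get("input")
    if not isinstance(messages, list) or not messages:
        errors.append('"input" is required and must be a non-empty list')
    elif not any(m.get("role") == "user" for m in messages):
        errors.append('"input" must contain at least one user message')
    if "truths" not in prompt or not isinstance(prompt.get("truths"), list):
        errors.append('"truths" is required (it may be an empty list)')
    scorers = prompt.get("scorers")
    if scorers is not None and len(scorers) == 0:
        errors.append('"scorers" is optional, but must be non-empty if given')
    return errors

draft = {
    "input": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    "truths": ["Paris"],
    "scorers": ["exact-match"],
}
print(validate_prompt(draft))  # []
```

Because `input` already matches OpenAI's `messages` shape, it could be passed through to a provider unchanged.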
-
Our current schema was designed to be easy for a human to read, so you always have all the information you need right there, but this duplication makes the file size very large (see the current schema).
-
A real-world usage example of the current Prompt schema:

```json
{
"did": "01990a53-ccc0-762c-b2d6-92517c21062e",
"question": {
"data": "In cellular models of GBA1-linked Parkinson's disease, where dysfunctional lysosomes impair mitophagy and mitochondrial function, what intervention directly addresses the underlying lysosomal defect to restore mitochondrial health and bioenergetics, thereby presenting a potential therapeutic avenue?",
"cid": "bagaaiera67eis6d2j5pbtwfldxjo7g6tt4k2miq7rtj6xqn5nph3ysnbm5aa",
"sha256": "d90c38d93eb101437e9e23ae36d9965148272e826c0ddace2c973d6b2bfe954b"
},
"answer": "Directly modulating lysosomal pH levels",
"answerKey": "C",
"options": {
"A": "Enhancement of proteasome activity",
"B": "Boosting GCase enzyme production",
"C": "Directly modulating lysosomal pH levels",
"D": "Enhancing mitochondrial fission dynamics",
"E": "Inhibition of ER stress response",
"F": "Activation of autophagy initiation complex",
"G": "Enhancing mitochondrial fusion dynamics",
"H": "Upregulation of antioxidant enzymes"
},
"fullPrompt": {
"data": "In cellular models of GBA1-linked Parkinson's disease, where dysfunctional lysosomes impair mitophagy and mitochondrial function, what intervention directly addresses the underlying lysosomal defect to restore mitochondrial health and bioenergetics, thereby presenting a potential therapeutic avenue?\n\nA: Enhancement of proteasome activity\nB: Boosting GCase enzyme production\nC: Directly modulating lysosomal pH levels\nD: Enhancing mitochondrial fission dynamics\nE: Inhibition of ER stress response\nF: Activation of autophagy initiation complex\nG: Enhancing mitochondrial fusion dynamics\nH: Upregulation of antioxidant enzymes\n",
"cid": "bagaaieratdlo5j3cbuimqvxkgmtyfs7kajnbueo5fhjppzocwvjmubasorbq",
"sha256": "2d2bf2aa0816bfd669aa0f2b5570d4b8554ffae6a98be4b0e7933844332500a1"
},
"type": "multiple-choice",
"metadata": {
"tags": ["generator-mcq"]
},
"scorers": [
"multiple-choice",
"ref-answer-equality-llm-judge-scorer",
"exact-match"
]
}
```
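One way to see the duplication: `fullPrompt.data` in the example above is fully derivable from `question.data` plus `options`. A hypothetical helper (not part of peerbench) that rebuilds it:

```python
# Sketch: rebuild "fullPrompt.data" from "question" and "options", showing
# that the stored fullPrompt text duplicates derivable data. The output
# format (question, blank line, "X: option" lines) matches the example above.

def build_full_prompt(question: str, options: dict[str, str]) -> str:
    lines = [question, ""]
    lines += [f"{key}: {text}" for key, text in sorted(options.items())]
    return "\n".join(lines) + "\n"

options = {
    "A": "Enhancement of proteasome activity",
    "B": "Boosting GCase enzyme production",
}
print(build_full_prompt("Which intervention addresses the lysosomal defect?", options))
```

If the file only stored `question` and `options`, the full prompt (and its hash) could be recomputed on demand instead of being persisted twice.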
-
Related schemas from others: https://github.com/confident-ai/deepeval/blob/main/deepeval/test_case/llm_test_case.py (line 159). It is rather tightly coupled to how it is called; it does not save prompts individually or give them new IDs.
-
LM-eval creates IDs but also uses hashes: https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/evaluator.py (lm_eval/evaluator.py:634-659). Each sample saved to disk has this structure: {
Saved to: samples_{task_name}_{timestamp}.jsonl files
-
As a first small improvement step: can we remove the "duplicate keys" via unnesting?
- prompt.data → prompt
- fullPrompt.data → fullPrompt
- renaming "did" → "prompt_uuid"
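A sketch of that migration against the real-world example above (using its actual `question`/`fullPrompt` keys). Note that this version simply drops the nested `cid`/`sha256` sub-fields for illustration; a real migration would need to decide where they go:

```python
# Sketch of the suggested unnesting step: hoist the *.data fields up one
# level and rename "did" to "prompt_uuid". Dropping cid/sha256 here is a
# simplification for illustration only.

def flatten_prompt(old: dict) -> dict:
    new = dict(old)
    new["prompt_uuid"] = new.pop("did")
    new["question"] = old["question"]["data"]
    new["fullPrompt"] = old["fullPrompt"]["data"]
    return new

old = {
    "did": "01990a53-ccc0-762c-b2d6-92517c21062e",
    "question": {"data": "What is the capital of France?",
                 "cid": "bag...", "sha256": "d90..."},
    "fullPrompt": {"data": "What is the capital of France?\n",
                   "cid": "bag...", "sha256": "2d2..."},
}
print(flatten_prompt(old)["prompt_uuid"])
```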
-
I think one thing we agree on is not to try to use one schema to rule them all; we should have different schemas for different prompt types. But anyway, like I have mentioned before, I was thinking of relying on OpenAI's API schema for all types of prompts, and I still think this may solve the most difficult part, which is designing the schema. In that case, though, we would need to think about how to shape OpenAI's schema into a tabular format for the web application, and also for storing prompts in
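One possible way to do that tabular shaping (an assumption, not a settled design): pivot the message list into one column per role, joining multiple messages of the same role:

```python
# Sketch: flatten an OpenAI-style message list into a single tabular row,
# one column per role. Column names are illustrative.

def to_row(prompt_id: str, messages: list[dict]) -> dict:
    row = {"prompt_uuid": prompt_id, "system": "", "user": "", "assistant": ""}
    for m in messages:
        # Concatenate repeated roles so multi-turn prompts still fit one row.
        row[m["role"]] = (row[m["role"]] + "\n" + m["content"]).strip()
    return row

row = to_row("p1", [
    {"role": "system", "content": "Be concise."},
    {"role": "user", "content": "What is the capital of France?"},
])
print(row["user"])
```

The trade-off: this is lossy for multi-turn conversations (message ordering across roles is flattened away), which is exactly the tension between OpenAI's list-of-messages schema and a tabular view.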
-
I know it may not make any sense, but what if we just use SQLite as the file format? Since it is just regular SQL, we can store all the related things in different tables. Or maybe
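A sketch of what that could look like with Python's built-in `sqlite3`, splitting prompts, truths, and scorers into separate tables instead of one nested JSON document. Table and column names are illustrative, not an agreed peerbench format:

```python
# Sketch of the SQLite idea: one table per related entity, with truths and
# scorers joined to prompts by prompt_uuid. Schema names are assumptions.
import sqlite3

conn = sqlite3.connect(":memory:")  # a .db file would be the shared artifact
conn.executescript("""
    CREATE TABLE prompts (uuid TEXT PRIMARY KEY, question TEXT, type TEXT);
    CREATE TABLE truths  (prompt_uuid TEXT, value TEXT);
    CREATE TABLE scorers (prompt_uuid TEXT, name TEXT);
""")
uuid = "01990a53-ccc0-762c-b2d6-92517c21062e"
conn.execute("INSERT INTO prompts VALUES (?, ?, ?)",
             (uuid, "What is the capital of France?", "multiple-choice"))
conn.executemany("INSERT INTO truths VALUES (?, ?)",
                 [(uuid, "Paris"), (uuid, "Paris is the capital of France")])
conn.execute("INSERT INTO scorers VALUES (?, ?)", (uuid, "exact-match"))
truths = [r[0] for r in conn.execute(
    "SELECT value FROM truths WHERE prompt_uuid = ? ORDER BY rowid", (uuid,))]
print(truths)
```

This removes the duplicated keys by construction (repeated values become rows, not repeated field names), at the cost of the file no longer being human-readable plain text.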
-
Our current schema has too much repetition, which makes the data size larger.