useful utilities for prompt engineering
pip install promptools  # or any other dependency manager you like
Note that the validation features use pydantic>=2 as an optional dependency. You can install it with pip install promptools[validation].
parse JSON from raw LLM response text
Detect and parse the last JSON block from input string.
def extract_json(text: str, /, fallback: F) -> JSON | F:
It returns fallback if it fails to detect or parse JSON.
Note that the default value of fallback is None.
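To make the "last JSON block, else fallback" behaviour concrete, here is a minimal sketch built only on the standard library. It is not the promptools implementation: the function name extract_last_json_sketch is hypothetical, and unlike the library it makes no attempt to repair truncated JSON.

```python
import json


def extract_last_json_sketch(text, fallback=None):
    """Return the last JSON object/array found in text, else fallback."""
    decoder = json.JSONDecoder()
    result = fallback
    pos = 0
    while pos < len(text):
        if text[pos] in "{[":
            try:
                # raw_decode parses one JSON value starting at pos and
                # returns it together with the index where it ended.
                value, end = decoder.raw_decode(text, pos)
            except json.JSONDecodeError:
                pos += 1
                continue
            result = value  # keep overwriting so the last block wins
            pos = end
        else:
            pos += 1
    return result


print(extract_last_json_sketch('first {"a": 1} then {"b": 2}'))  # {'b': 2}
print(extract_last_json_sketch("no JSON here", fallback=[]))  # []
```

Scanning left to right and overwriting the result is a simple way to end up with the last block; a production parser would likely be more careful about braces that appear inside ordinary prose.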
def extract_json(text: str, /, fallback: F, expect: Type[M]) -> M | F:
You can pass a pydantic.BaseModel or a TypeAlias as the expect parameter, and pydantic will validate the parsed result against it.
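As a rough illustration of what validating against an expected type can look like, the sketch below uses pydantic's TypeAdapter directly. The helper validate_or_fallback is hypothetical, and returning the fallback when validation fails is an assumption made for this sketch, not documented promptools behaviour. It requires pydantic>=2.

```python
from pydantic import BaseModel, TypeAdapter, ValidationError


class Item(BaseModel):
    index: int
    label: str


def validate_or_fallback(data, expect, fallback=None):
    """Validate already-parsed data against expect, else return fallback."""
    try:
        # TypeAdapter accepts models as well as generic aliases like list[Item].
        return TypeAdapter(expect).validate_python(data)
    except ValidationError:
        return fallback  # assumption: invalid data falls back silently


items = validate_or_fallback([{"index": 0, "label": "A"}], list[Item], [])
print(items[0].label)  # A
print(validate_or_fallback([{"index": "x"}], list[Item], []))  # []
```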
Imagine that you are using an LLM for a classification task:
from promptools.extractors import extract_json
from typing import TypedDict
class Item(TypedDict):
    index: int
    label: str
original_text = """
The result is:
```json
[
{"index": 0, "label": "A"},
{"index": 1, "label": "B"}
]
```
"""
print(extract_json(original_text, [], list[Item]))
The output will be:
[{'index': 0, 'label': 'A'}, {'index': 1, 'label': 'B'}]
Imagine that you are trying to parse malformed JSON:
from promptools.extractors import extract_json
from pydantic import BaseModel
original_text = '{"results": [{"index": 1}, {'
print(extract_json(original_text))
The output will be:
{'results': [{'index': 1}, {}]}
count number of tokens in prompt
def count_token(prompt: str | list[str], enc: Encoding | None = None) -> int:
Provide a prompt or a list of prompts and get the token count. The second parameter is a tiktoken.Encoding instance; it defaults to get_encoding("cl100k_base") if not provided. The default tiktoken.Encoding instance is cached, so it is not re-created on every call.
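The cached-default pattern described above can be sketched with functools.lru_cache. This is an illustration under assumptions, not the library's code: get_default_encoding and count_token_sketch are hypothetical names, and summing per-string counts for a list is inferred from the examples in this document.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def get_default_encoding():
    # Imported lazily so tiktoken is only needed when no encoder is supplied.
    import tiktoken

    # lru_cache ensures the encoding is built once and reused afterwards.
    return tiktoken.get_encoding("cl100k_base")


def count_token_sketch(prompt, enc=None):
    """Count tokens in one string, or the total over a list of strings."""
    if enc is None:
        enc = get_default_encoding()
    if isinstance(prompt, str):
        return len(enc.encode(prompt))
    return sum(len(enc.encode(p)) for p in prompt)
```

Calling count_token_sketch("hi") without an encoder loads cl100k_base on first use; later calls reuse the cached instance. Any object with an encode method that returns a sequence of tokens can be passed as enc.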
def count_token(prompt: dict | list[dict], enc: Encoding | None = None) -> int:
The prompt can also be a single message or a list of messages. Every message should be a dict matching the schema below:
class Message(TypedDict):
    role: str
    content: str
    name: NotRequired[str]

from tiktoken import encoding_for_model
from promptools.openai import count_token
print(count_token("hi", encoding_for_model("gpt-3.5-turbo")))
The output will be:
1
from promptools.openai import count_token
print(count_token(["hi", "hello"]))
The output will be:
2
from promptools.openai import count_token
count_token({"role": "user", "content": "hi"})
The output will be:
5
from promptools.openai import count_token
count_token([
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "Hello! How can I assist you today?"},
])
The output will be:
21
from promplate.prompt.chat import U, A, S
from promptools.openai import count_token
count_token([
    S @ "background" > "You are a helpful assistant.",
    U @ "example_user" > "hi",
    A @ "example_assistant" > "Hello! How can I assist you today?",
])
The output will be:
40