An easy-to-use CLI to generate JSONL datasets from a TXT file using LLMs.
Install globally:

```sh
npm i -g @teichai/datagen
```

Or install locally and run via npx:
```sh
npm i -D @teichai/datagen
npx datagen --help
```

Run tests:
```sh
npm test
```

Set your OpenRouter API key:
```sh
export API_KEY="your_openrouter_key"
```

Create a prompts file where each line is a prompt:
```text
Explain the CAP theorem in simple terms.
Write a Python function to reverse a linked list.
```
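If you prefer to create the file from the shell, a heredoc does the same thing (the file name `prompts.txt` just matches the command below):

```sh
# Write the prompts file, one prompt per line.
cat > prompts.txt <<'EOF'
Explain the CAP theorem in simple terms.
Write a Python function to reverse a linked list.
EOF
```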
Run:
```sh
datagen --model openai/gpt-4o-mini --prompts prompts.txt
```
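The basic run can be extended with any of the flags documented in the options list further down; for example, to pick the output path and add a system prompt (the values here are only illustrative):

```sh
# Same run, but with an explicit output file and a system prompt.
# "Answer concisely." is a placeholder; use whatever system prompt fits your dataset.
datagen \
  --model openai/gpt-4o-mini \
  --prompts prompts.txt \
  --out ./dataset.jsonl \
  --system "Answer concisely."
```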
You can also use a YAML config file:

```yaml
model: openai/gpt-4o-mini
prompts: ./prompts.txt
out: ./dataset.jsonl
concurrent: 5
openrouter:
  providerSort: throughput
```

Run with:
```sh
datagen --config config.yaml
```

Note: On startup, datagen does a quick best-effort check for a newer npm version and prints an upgrade command if available. Disable with `DATAGEN_DISABLE_UPDATE_CHECK=1`.
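To disable the check for a single run, set the variable inline; exporting it in your shell profile disables it for every run:

```sh
# One-off run with the update check skipped.
DATAGEN_DISABLE_UPDATE_CHECK=1 datagen --config config.yaml
```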
Development (build + run once):
API_KEY="your_openrouter_key" npm run dev -- --model openai/gpt-4o-mini --prompts prompts.txt--help: show the help message and exit.--version: print the CLI version and exit.--config: set a config file--model <name>: required model name.--prompts <file>: required prompts file.--out <file>: output JSONL (defaultdataset.jsonl).--api <baseUrl>: API base (default OpenRouter).--system <text>: optional system prompt.--store-system true|false: store system message in output (defaulttrue).--concurrent <num>: number of in-flight requests (default1).--openrouter.provider <slugs>: comma-separated provider slugs to try in order (OpenRouter only).--openrouter.providerSort <price|throughput|latency>: provider routing sort (OpenRouter only).--reasoningEffort <none|minimal|low|medium|high|xhigh>: pass through asreasoning.effort.--no-progress: disable the progress bar.--timeout <ms>: request timeout in milliseconds.