DataGen - By TeichAI

An easy-to-use CLI for generating JSONL datasets from a plain-text (.txt) prompts file using LLMs.

Install

npm i -g @teichai/datagen

Or install locally and run via npx:

npm i -D @teichai/datagen
npx datagen --help

Run tests:

npm test

Usage

Set your OpenRouter API key:

export API_KEY="your_openrouter_key"

Create a prompts file where each line is a prompt:

Explain the CAP theorem in simple terms.
Write a Python function to reverse a linked list.

Run:

datagen --model openai/gpt-4o-mini --prompts prompts.txt
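The prompts file format (one prompt per line) can be illustrated with a small TypeScript parser. This is a sketch under assumptions: `parsePrompts` is a hypothetical helper, not part of datagen, and trimming whitespace and skipping blank lines is assumed rather than documented.

```typescript
// Hypothetical helper, not part of datagen's API.
// Assumes one prompt per line, trims surrounding whitespace, and skips blank lines.
function parsePrompts(text: string): string[] {
  return text
    .split(/\r?\n/) // accept both LF and CRLF line endings
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

// Usage with the two example prompts from above:
const sample =
  "Explain the CAP theorem in simple terms.\nWrite a Python function to reverse a linked list.\n";
console.log(parsePrompts(sample)); // two prompts; the trailing empty line is dropped
```

Under this line-per-prompt format, a single prompt cannot span multiple lines.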

Configuration File

You can also put these options in a YAML config file, for example config.yaml:

model: openai/gpt-4o-mini
prompts: ./prompts.txt
out: ./dataset.jsonl
concurrent: 5
openrouter:
  providerSort: throughput

Run with:

datagen --config config.yaml
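If a config file is combined with command-line flags, the merge might work roughly like the TypeScript sketch below. Everything here is illustrative: the `DatagenConfig` type and `resolveConfig` function are hypothetical, and CLI values taking precedence over the file is an assumption. Only the defaults (dataset.jsonl, concurrency 1) and the required model and prompts come from the Options list in this README.

```typescript
type ProviderSort = "price" | "throughput" | "latency";

// Hypothetical shape mirroring the YAML keys in the example config.
interface DatagenConfig {
  model?: string;
  prompts?: string;
  out?: string;
  concurrent?: number;
  openrouter?: { providerSort?: ProviderSort };
}

interface ResolvedConfig {
  model: string;
  prompts: string;
  out: string;
  concurrent: number;
  providerSort?: ProviderSort;
}

// Assumption: a value given on the command line overrides the same key in the file.
function resolveConfig(fromFile: DatagenConfig, fromCli: DatagenConfig): ResolvedConfig {
  const model = fromCli.model ?? fromFile.model;
  const prompts = fromCli.prompts ?? fromFile.prompts;
  if (!model) throw new Error("model is required (--model or `model:` in the config)");
  if (!prompts) throw new Error("prompts file is required (--prompts or `prompts:` in the config)");

  const concurrent = fromCli.concurrent ?? fromFile.concurrent ?? 1; // README default: 1
  if (!Number.isInteger(concurrent) || concurrent < 1) {
    throw new Error("concurrent must be a positive integer");
  }

  return {
    model,
    prompts,
    out: fromCli.out ?? fromFile.out ?? "dataset.jsonl", // README default
    concurrent,
    providerSort: fromCli.openrouter?.providerSort ?? fromFile.openrouter?.providerSort,
  };
}

// Usage: the example config.yaml values, with concurrency overridden from the CLI.
const cfg = resolveConfig(
  {
    model: "openai/gpt-4o-mini",
    prompts: "./prompts.txt",
    out: "./dataset.jsonl",
    concurrent: 5,
    openrouter: { providerSort: "throughput" },
  },
  { concurrent: 2 },
);
console.log(cfg.concurrent, cfg.out); // 2 ./dataset.jsonl
```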

Note: on startup, datagen performs a quick, best-effort check for a newer version on npm and, if one is available, prints an upgrade command. To disable this check, set DATAGEN_DISABLE_UPDATE_CHECK=1.
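For illustration only, a best-effort update check could look roughly like the TypeScript sketch below. The names `updateCheckEnabled` and `isNewerVersion` are hypothetical. The comparison assumes plain MAJOR.MINOR.PATCH version strings, and treating only the value `1` as "disabled" is also an assumption. datagen's real implementation may differ.

```typescript
// Hypothetical: returns false when the user opted out via DATAGEN_DISABLE_UPDATE_CHECK=1.
function updateCheckEnabled(env: Record<string, string | undefined>): boolean {
  return env.DATAGEN_DISABLE_UPDATE_CHECK !== "1";
}

// Hypothetical: true if `latest` is a higher MAJOR.MINOR.PATCH than `current`.
// Pre-release suffixes (after "-") are ignored; non-numeric parts count as 0.
function isNewerVersion(current: string, latest: string): boolean {
  const parse = (v: string) =>
    v.split("-")[0].split(".").map((n) => Number.parseInt(n, 10) || 0);
  const a = parse(current);
  const b = parse(latest);
  for (let i = 0; i < 3; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    if (y !== x) return y > x; // first differing component decides
  }
  return false; // equal versions are not "newer"
}
```

In Node, the first function would be called with `process.env`. Comparing components numerically matters: as strings, "1.10.0" would sort before "1.2.3".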

Development (build + run once):

API_KEY="your_openrouter_key" npm run dev -- --model openai/gpt-4o-mini --prompts prompts.txt

Options

--help: show the help message and exit.
--version: print the CLI version and exit.
--config <file>: load options from a YAML config file.
--model <name>: model name (required).
--prompts <file>: prompts file, one prompt per line (required).
--out <file>: output JSONL file (default: dataset.jsonl).
--api <baseUrl>: API base URL (default: OpenRouter).
--system <text>: optional system prompt.
--store-system true|false: whether to store the system message in the output (default: true).
--concurrent <num>: maximum number of requests in flight at once (default: 1).
--openrouter.provider <slugs>: comma-separated provider slugs to try in order (OpenRouter only).
--openrouter.providerSort <price|throughput|latency>: provider routing sort order (OpenRouter only).
--reasoningEffort <none|minimal|low|medium|high|xhigh>: passed through to the API as reasoning.effort.
--no-progress: disable the progress bar.
--timeout <ms>: request timeout in milliseconds.
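The OpenRouter-specific flags correspond to fields in OpenRouter's chat-completions request body. Provider routing takes an ordered `provider.order` list of provider slugs and a `provider.sort` value, and reasoning settings go under `reasoning.effort`. The TypeScript sketch below shows one plausible mapping from flags to a request body. `buildRequestBody` and `RequestFlags` are hypothetical names, and exactly how datagen assembles its requests is an assumption.

```typescript
type ReasoningEffort = "none" | "minimal" | "low" | "medium" | "high" | "xhigh";

// Hypothetical flag bag; names mirror the CLI options, not datagen internals.
interface RequestFlags {
  model: string;
  system?: string;       // --system
  providers?: string;    // --openrouter.provider, e.g. "provider-a,provider-b" (placeholder slugs)
  providerSort?: "price" | "throughput" | "latency"; // --openrouter.providerSort
  reasoningEffort?: ReasoningEffort;                 // --reasoningEffort
}

interface ChatMessage {
  role: "system" | "user";
  content: string;
}

function buildRequestBody(prompt: string, flags: RequestFlags): Record<string, unknown> {
  const messages: ChatMessage[] = [];
  if (flags.system) messages.push({ role: "system", content: flags.system });
  messages.push({ role: "user", content: prompt });

  const body: Record<string, unknown> = { model: flags.model, messages };

  // OpenRouter provider routing: ordered slugs to try, and/or a sort strategy.
  const provider: { order?: string[]; sort?: string } = {};
  if (flags.providers) {
    provider.order = flags.providers.split(",").map((s) => s.trim()).filter(Boolean);
  }
  if (flags.providerSort) provider.sort = flags.providerSort;
  if (Object.keys(provider).length > 0) body.provider = provider;

  // This README states --reasoningEffort is passed through as reasoning.effort.
  if (flags.reasoningEffort) body.reasoning = { effort: flags.reasoningEffort };

  return body;
}

console.log(
  JSON.stringify(
    buildRequestBody("Explain the CAP theorem in simple terms.", {
      model: "openai/gpt-4o-mini",
      providerSort: "throughput",
    }),
  ),
);
```

In practice such a body would be sent as a JSON POST to the chat-completions endpoint under the API base (for OpenRouter, https://openrouter.ai/api/v1/chat/completions) with an `Authorization: Bearer <API_KEY>` header.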

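To make what --concurrent controls concrete, here is a generic bounded worker-pool pattern in TypeScript. `limit` lanes pull the next item from a shared index, so at most `limit` requests are in flight at once. `mapWithConcurrency` is a hypothetical helper illustrating the idea, not datagen's code.

```typescript
// Generic bounded-concurrency map: runs `worker` over `items` with at most
// `limit` calls pending at any moment, returning results in input order.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results: R[] = new Array(items.length);
  let next = 0; // shared cursor; safe because JS callbacks run on a single thread

  // Each lane keeps taking the next unstarted item until none are left.
  async function runLane(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  }

  const lanes = Array.from({ length: Math.min(limit, items.length) }, () => runLane());
  await Promise.all(lanes);
  return results;
}
```

Because each lane picks up new work as soon as its current request finishes, one slow response does not stall the others. Writing results by index keeps them in input order even when requests complete out of order.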