A Zig implementation of the TOON (Token-Oriented Object Notation) format, version 2.0 compliant.
TOON is a human-readable data serialization format designed for token efficiency, with explicit structure and minimal quoting. It's particularly well-suited for arrays of uniform objects and is commonly used as a compact representation for LLM prompts.
- Full TOON encoding and decoding
- Primitive arrays with inline values ([3]: 1,2,3)
- Tabular arrays with field lists ([2]{id,name}:)
- List arrays with - markers
- Nested objects and arrays
- Multiple delimiters (comma, tab, pipe)
- Strict mode validation
- Zero dependencies (pure Zig standard library)
- Streaming API - Memory-efficient encoding/decoding for large documents
- Rich Error Context - Detailed error messages with line numbers and source context
- Key Folding (Encoder) - Collapse nested single-key objects into dotted paths
- Path Expansion (Decoder) - Expand dotted keys back into nested structures
- Flatten Depth Control - Limit the depth of key folding
- Blank Line Validation - Strict mode errors on blank lines inside arrays
- Max Depth Limit - Configurable nesting depth limit (default: 256) to prevent stack overflow attacks
- Type Coercion Control - Option to disable automatic number parsing
- GitHub Actions - Automated testing on Linux, Windows, and macOS
- Multi-platform builds - Release artifacts for all major platforms
The easiest way to add toon-zig as a dependency is using the zig fetch command. Zig supports two URL formats:
Using Git URL (recommended):
zig fetch --save git+https://github.com/bkataru/toon-zig.git#HEAD
Using tarball URL:
zig fetch --save https://github.com/bkataru/toon-zig/archive/refs/heads/main.tar.gz
This will automatically download the package and add it to your build.zig.zon with the correct hash.
To fetch a specific version or tag:
# Using git URL with tag reference
zig fetch --save git+https://github.com/bkataru/toon-zig.git#v0.1.1
# Or using tarball URL for a specific tag
zig fetch --save https://github.com/bkataru/toon-zig/archive/refs/tags/v0.1.1.tar.gz
To save with a custom dependency name:
zig fetch --save=toon git+https://github.com/bkataru/toon-zig.git#HEAD
Note: The git+https:// protocol clones the repository directly, while tarball URLs download a snapshot archive. Git URLs are generally more reliable for version pinning.
Alternatively, you can manually add toon-zig as a dependency in your build.zig.zon:
.dependencies = .{
.toon = .{
// Using git URL (recommended)
.url = "git+https://github.com/bkataru/toon-zig.git#v0.1.1",
// Or using tarball URL:
// .url = "https://github.com/bkataru/toon-zig/archive/refs/tags/v0.1.1.tar.gz",
.hash = "...", // Run `zig build` to get the correct hash
},
},
Note: On the first build attempt, Zig will display the correct hash value. Copy that hash and update your build.zig.zon file accordingly.
After adding the dependency (via either method), add the following to your build.zig:
const toon = b.dependency("toon", .{
.target = target,
.optimize = optimize,
});
exe.root_module.addImport("toon", toon.module("toon"));
const std = @import("std");
const toon = @import("toon");
pub fn main() !void {
var gpa = std.heap.GeneralPurposeAllocator(.{}){};
defer _ = gpa.deinit();
const allocator = gpa.allocator();
// Create an object
var fields = try allocator.alloc(toon.Value.Object.Field, 2);
fields[0] = .{ .key = "name", .value = .{ .string = "Alice" } };
fields[1] = .{ .key = "age", .value = .{ .number = 30 } };
const value = toon.Value{ .object = .{ .fields = fields } };
const options = toon.Options{};
const encoded = try toon.encode(allocator, value, options);
defer allocator.free(encoded);
std.debug.print("{s}\n", .{encoded});
// Output:
// name: Alice
// age: 30
}
const std = @import("std");
const toon = @import("toon");
pub fn main() !void {
var gpa = std.heap.GeneralPurposeAllocator(.{}){};
defer _ = gpa.deinit();
const allocator = gpa.allocator();
const data =
\\name: Alice
\\age: 30
;
const result = try toon.decode(allocator, data, .{});
defer result.deinit(allocator);
// Access decoded values
const obj = result.value.object;
for (obj.fields) |field| {
std.debug.print("{s}: ", .{field.key});
switch (field.value) {
.string => |s| std.debug.print("{s}\n", .{s}),
.number => |n| std.debug.print("{d}\n", .{n}),
else => {},
}
}
}
For more granular control, use the specialized option types:
const std = @import("std");
const toon = @import("toon");
pub fn main() !void {
var gpa = std.heap.GeneralPurposeAllocator(.{}){};
defer _ = gpa.deinit();
const allocator = gpa.allocator();
// `value` and `data` are assumed to be defined as in the earlier examples
// Encode with specific options
var encode_opts = toon.EncodeOptions{};
encode_opts.indent_size = 4;
encode_opts.key_folding = .safe;
const encoded = try toon.encodeWithOptions(allocator, value, encode_opts);
defer allocator.free(encoded);
// Decode with specific options
var decode_opts = toon.DecodeOptions{};
decode_opts.coerce_types = false; // Keep numbers as strings
decode_opts.max_depth = 100; // Custom depth limit
const result = try toon.decodeWithOptions(allocator, data, decode_opts);
defer result.deinit(allocator);
}
To validate TOON data without fully parsing it:
const std = @import("std");
const toon = @import("toon");
pub fn main() !void {
var gpa = std.heap.GeneralPurposeAllocator(.{}){};
defer _ = gpa.deinit();
const allocator = gpa.allocator();
const data = "items[3]: a,b,c";
// Validate: returns an error if the data is invalid
try toon.validate(allocator, data, .{});
// Or with specific options
const opts = toon.DecodeOptions{ .strict = true };
try toon.validateWithOptions(allocator, data, opts);
}
Control whether numeric-looking strings are automatically converted to numbers:
const std = @import("std");
const toon = @import("toon");
pub fn main() !void {
var gpa = std.heap.GeneralPurposeAllocator(.{}){};
defer _ = gpa.deinit();
const allocator = gpa.allocator();
const data = "value: 123";
// Default: coerce_types = true, "123" becomes number 123
{
const result = try toon.decode(allocator, data, .{});
defer result.deinit(allocator);
// result.value.object.fields[0].value is .number = 123
}
// With coerce_types = false, "123" stays as string "123"
{
var opts = toon.Options{};
opts.coerce_types = false;
const result = try toon.decode(allocator, data, opts);
defer result.deinit(allocator);
// result.value.object.fields[0].value is .string = "123"
}
}
Key folding collapses nested single-key objects into dotted paths for more compact output:
const std = @import("std");
const toon = @import("toon");
pub fn main() !void {
var gpa = std.heap.GeneralPurposeAllocator(.{}){};
defer _ = gpa.deinit();
const allocator = gpa.allocator();
// Create nested structure: { data: { meta: { id: 1 } } }
// ... (setup code)
// Enable key folding
var options = toon.Options{};
options.key_folding = .safe;
const encoded = try toon.encode(allocator, value, options);
defer allocator.free(encoded);
std.debug.print("{s}\n", .{encoded});
// Output: data.meta.id: 1
}
Path expansion is the decoder counterpart to key folding, expanding dotted keys into nested objects:
const std = @import("std");
const toon = @import("toon");
pub fn main() !void {
var gpa = std.heap.GeneralPurposeAllocator(.{}){};
defer _ = gpa.deinit();
const allocator = gpa.allocator();
const data = "data.meta.id: 1";
// Enable path expansion
var options = toon.Options{};
options.expand_paths = .safe;
const result = try toon.decode(allocator, data, options);
defer result.deinit(allocator);
// Result is: { data: { meta: { id: 1 } } }
}
For memory-efficient processing of large documents, use the streaming API:
const std = @import("std");
const toon = @import("toon");
pub fn main() !void {
var gpa = std.heap.GeneralPurposeAllocator(.{}){};
defer _ = gpa.deinit();
const allocator = gpa.allocator();
// Write directly to stdout or any writer
var stdout_buffer: [4096]u8 = undefined;
var stdout_writer = std.fs.File.stdout().writer(&stdout_buffer);
// ... create value ...
// Stream encode - writes directly without buffering entire output
try toon.streamingEncode(allocator, stdout_writer.any(), value, .{});
}
const std = @import("std");
const toon = @import("toon");
fn valueCallback(path: []const []const u8, value: toon.Value, user_data: ?*anyopaque) !void {
_ = user_data; // Not used in this example; Zig rejects unused parameters
// Process each value as it's parsed
std.debug.print("Path: ", .{});
for (path) |segment| {
std.debug.print("{s}.", .{segment});
}
std.debug.print(" = {any}\n", .{value});
}
pub fn main() !void {
var gpa = std.heap.GeneralPurposeAllocator(.{}){};
defer _ = gpa.deinit();
const allocator = gpa.allocator();
var stdin_buffer: [4096]u8 = undefined;
var stdin_reader = std.fs.File.stdin().reader(&stdin_buffer);
// Stream parse with callback for each value
try toon.streamParse(allocator, stdin_reader.any(), .{}, valueCallback, null);
}
Get detailed error information including line numbers and source context:
const std = @import("std");
const toon = @import("toon");
pub fn main() !void {
var gpa = std.heap.GeneralPurposeAllocator(.{}){};
defer _ = gpa.deinit();
const allocator = gpa.allocator();
const invalid_data = " bad indentation";
// Use decodeWithContext for rich error information
const result = toon.decodeWithContext(allocator, invalid_data, .{ .strict = true });
if (result.isError()) {
// Get the parse error with full context
if (result.getParseError()) |parse_err| {
const msg = try parse_err.toString(allocator);
defer allocator.free(msg);
std.debug.print("Error: {s}\n", .{msg});
// Output: "error at line 1: indentation must be a multiple of indent size"
}
// Or access individual fields
if (result.getErrorContext()) |ctx| {
std.debug.print("Line: {d}, Column: {d}\n", .{ ctx.line, ctx.column });
if (ctx.source_line) |src| {
std.debug.print("Source: {s}\n", .{src});
}
}
} else {
defer result.deinit(allocator);
// Use result.value...
}
}
Key folding and path expansion are designed to work together for lossless round-trips:
// Original: { data: { meta: { id: 1 } } }
// Encode with key_folding = .safe
// → "data.meta.id: 1"
// Decode with expand_paths = .safe
// → { data: { meta: { id: 1 } } }
TOON supports several array formats:
[3]: 1,2,3
[2]{id,name}:
  1,Alice
  2,Bob
[3]:
  - hello
  - world
  - [2]: a,b
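The three array shapes above can be told apart from the header alone. The following is a minimal sketch of that idea using only the Zig standard library; `ArrayKind` and `classifyHeader` are hypothetical names for illustration and are not part of toon-zig's documented API. It assumes a root-level header (no key prefix) and treats a header with nothing after the colon as a list array, so it does not distinguish an empty array.

```zig
const std = @import("std");

// Hypothetical classification of the three array shapes shown above.
const ArrayKind = enum { primitive_inline, tabular, list };

/// Classifies a root-level array header line, or returns null when the
/// line does not look like an array header at all.
fn classifyHeader(header: []const u8) ?ArrayKind {
    if (header.len == 0 or header[0] != '[') return null;
    const close = std.mem.indexOfScalar(u8, header, ']') orelse return null;
    const rest = header[close + 1 ..];
    if (rest.len == 0) return null;
    // A field list such as {id,name} right after the length marks a tabular array.
    if (rest[0] == '{') return ArrayKind.tabular;
    if (rest[0] != ':') return null;
    // Values on the same line mean an inline primitive array; otherwise items follow as a list.
    const after = std.mem.trim(u8, rest[1..], " ");
    return if (after.len == 0) ArrayKind.list else ArrayKind.primitive_inline;
}

test "classify array headers" {
    try std.testing.expectEqual(ArrayKind.primitive_inline, classifyHeader("[3]: 1,2,3").?);
    try std.testing.expectEqual(ArrayKind.tabular, classifyHeader("[2]{id,name}:").?);
    try std.testing.expectEqual(ArrayKind.list, classifyHeader("[3]:").?);
    try std.testing.expect(classifyHeader("name: Alice") == null);
}
```

The sketch can be run with `zig test` on a file containing it. A real decoder would also need to handle keyed headers and delimiter markers, which are left out here.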
const options = toon.Options{
// Validation
.strict = true, // Enable strict validation (default: true)
// Formatting
.indent_size = 2, // Spaces per indent level (default: 2)
.document_delimiter = .comma, // Delimiter for document values
.array_delimiter = .comma, // Delimiter for array values (.comma, .tab, .pipe)
// v1.5+ Key Folding (Encoder)
.key_folding = .off, // .off or .safe
.flatten_depth = null, // Max depth to fold (null = unlimited)
// v1.5+ Path Expansion (Decoder)
.expand_paths = .off, // .off or .safe
// Type handling
.coerce_types = true, // Auto-convert numeric strings to numbers
// Security
.max_depth = null, // Max nesting depth (null = 256)
};
const encode_opts = toon.EncodeOptions{
.indent_size = 2,
.document_delimiter = .comma,
.array_delimiter = .comma,
.key_folding = .off,
.flatten_depth = null,
};
const decode_opts = toon.DecodeOptions{
.strict = true,
.indent_size = 2,
.expand_paths = .off,
.coerce_types = true,
.max_depth = null,
};
A tagged union representing TOON values:
pub const Value = union(enum) {
null,
bool: bool,
number: f64,
string: []const u8,
array: []Value,
object: Object,
};
Result from decoding, includes the value and backing buffer:
pub const DecodedValue = struct {
value: Value,
buffer: []const u8,
pub fn deinit(self: DecodedValue, allocator: Allocator) void;
};

| Function | Description |
|---|---|
| encode(allocator, value, options) | Encode a Value to TOON string |
| encodeWithOptions(allocator, value, encode_opts) | Encode with EncodeOptions |
| encodeWithContext(allocator, value, options) | Encode with rich error context (returns EncodeResultWithContext) |
| encodeWithContextAndOptions(allocator, value, encode_opts) | Encode with rich error context using EncodeOptions |
| decode(allocator, data, options) | Decode TOON string to Value |
| decodeWithOptions(allocator, data, decode_opts) | Decode with DecodeOptions |
| decodeWithContext(allocator, data, options) | Decode with rich error context (returns DecodeResultWithContext) |
| decodeWithContextAndOptions(allocator, data, decode_opts) | Decode with rich error context using DecodeOptions |
| validate(allocator, data, options) | Validate without returning value |
| validateWithOptions(allocator, data, decode_opts) | Validate with DecodeOptions |

| Function | Description |
|---|---|
| streamingEncode(allocator, writer, value, options) | Stream encode to writer |
| streamingEncodeWithOptions(allocator, writer, value, encode_opts) | Stream encode with EncodeOptions |
| streamParse(allocator, reader, options, callback, user_data) | Stream parse with callback |
| createStreamingEncoder(allocator, writer, options) | Create streaming encoder instance |
| createStreamingEncoderWithOptions(allocator, writer, encode_opts) | Create streaming encoder with EncodeOptions |
| createStreamingDecoder(allocator, reader, options) | Create streaming decoder instance |
| createStreamingDecoderWithOptions(allocator, reader, decode_opts) | Create streaming decoder with DecodeOptions |

| Function | Description |
|---|---|
| writeValue(writer, value, options) | Write encoded value directly to an std.Io.Writer |
| readValue(reader, allocator, options) | Read and decode value from an std.Io.Reader |

| Function | Description |
|---|---|
| encodeAlloc(allocator, value, options) | Encode any Zig struct/type to TOON string |
| parseFromSlice(T, allocator, data, options) | Parse TOON string into a Zig type T |

| Function | Description |
|---|---|
| errorMessage(err) | Get human-readable message for a ToonError |
| errorAt(line, err) | Create a ParseError at a specific line |
| errorAtPos(line, column, err) | Create a ParseError at a specific line and column |
| errorWithSource(line, source_line, err) | Create a ParseError with source line context |
| errorWithSuggestion(line, err, suggestion) | Create a ParseError with a fix suggestion |

| Function | Description |
|---|---|
| looksNumeric(s) | Check if a string looks like a number (needs quoting) |
| hasLeadingZeroDecimal(s) | Check if string has leading zeros (e.g., "007") |
| isValidUnquotedKey(key) | Check if a key can be written without quotes |
| isIdentifierSegment(s) | Check if string is valid for key folding/expansion |
| areAllSegmentsIdentifiers(path) | Check if all segments of a dotted path are valid identifiers |
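To make the leading-zero check concrete, here is a rough sketch of what such a test might look like, written against the standard library only. `hasLeadingZeroSketch` is a hypothetical stand-in and not the library's `hasLeadingZeroDecimal`; in particular, skipping an optional minus sign is an assumption of this sketch, and the real function may behave differently.

```zig
const std = @import("std");

/// Hypothetical illustration: true when the string starts with a zero that is
/// followed by another digit, as in "007". A lone "0" or "0.5" does not count.
fn hasLeadingZeroSketch(s: []const u8) bool {
    // Assumption: an optional leading '-' is skipped so "-007" is treated like "007".
    const digits = if (s.len > 0 and s[0] == '-') s[1..] else s;
    return digits.len >= 2 and digits[0] == '0' and std.ascii.isDigit(digits[1]);
}

test "leading zero detection" {
    try std.testing.expect(hasLeadingZeroSketch("007"));
    try std.testing.expect(hasLeadingZeroSketch("-007"));
    try std.testing.expect(!hasLeadingZeroSketch("0"));
    try std.testing.expect(!hasLeadingZeroSketch("0.5"));
    try std.testing.expect(!hasLeadingZeroSketch("10"));
}
```

A check of this kind matters because a value such as "007" would lose its leading zeros if it were parsed as a number and written back.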
pub const ToonError = error{
// Syntax errors
MissingColon,
InvalidHeader,
InvalidEscape,
UnterminatedString,
InvalidIndentation,
TabsInIndentation,
UnsupportedControlChar,
InvalidKey,
InvalidNumber,
// Structural errors
LengthMismatch,
WidthMismatch,
BlankLineInArray,
UnexpectedIndentation,
InvalidListItem,
DelimiterMismatch,
// Path expansion errors (v1.5+)
PathExpansionConflict,
// Security errors
MaxDepthExceeded,
// General errors
UnexpectedEof,
InvalidInput,
};

| Constant | Value | Description |
|---|---|---|
| MAX_DEPTH | 256 | Default maximum nesting depth |
/// Progress callback for streaming encoder
/// Called after each line is written with line number (1-based) and cumulative bytes written
pub const ProgressCallback = *const fn (line_num: usize, bytes_written: usize) void;
/// Value callback for streaming decoder
/// Called for each parsed value with its path and the value
pub const ValueCallback = *const fn (path: []const []const u8, value: Value, user_data: ?*anyopaque) anyerror!void;

| Method | Description |
|---|---|
| init(allocator, writer, options) | Initialize encoder |
| initWithCallback(allocator, writer, options, callback) | Initialize with progress callback |
| encode(value) | Encode a value to the writer |
| encodeWithCallback(value, callback) | Encode with progress callback |
| getLineCount() | Get total lines written |
| getBytesWritten() | Get total bytes written |

| Method | Description |
|---|---|
| init(allocator, reader, options) | Initialize decoder |
| deinit() | Clean up resources |
| readAll() | Read and parse entire stream, return complete Value |
| readLine() | Read next line from stream |
| parseLine(line, callback, user_data) | Parse a single line, invoke callback for values |
| streamParse(callback, user_data) | Parse entire stream with callbacks |
| getLastError() | Get the last error that occurred |
| getCurrentLine() | Get current line number |
| getState() | Get current parsing state |
| reset(new_reader) | Reset decoder for a new stream |
/// Generic result type for error context propagation
pub fn Result(comptime T: type) type {
return union(enum) {
ok: T,
err: ResultError,
pub fn success(value: T) Self;
pub fn failure(code: ToonError, ctx: ErrorContext) Self;
pub fn failureAtLine(code: ToonError, line: usize) Self;
pub fn isOk(self) bool;
pub fn isError(self) bool;
pub fn unwrap(self) ToonError!T;
pub fn getErrorContext(self) ?ErrorContext;
pub fn getParseError(self) ?ParseError;
pub fn formatError(self, allocator) !?[]const u8;
};
}
/// Result with error context for decoding
pub const DecodeResultWithContext = struct {
value: ?Value,
buffer: ?[]const u8,
error_info: ?struct {
code: ToonError,
context: ErrorContext,
},
pub fn isOk(self) bool;
pub fn isError(self) bool;
pub fn getErrorCode(self) ?ToonError;
pub fn getErrorContext(self) ?ErrorContext;
pub fn getParseError(self) ?ParseError;
pub fn formatError(self, allocator) !?[]const u8;
pub fn unwrap(self) ToonError!Value;
pub fn deinit(self, allocator) void;
};
/// Result with error context for encoding
pub const EncodeResultWithContext = struct {
output: ?[]const u8,
error_info: ?struct {
code: ToonError,
context: ErrorContext,
},
pub fn isOk(self) bool;
pub fn isError(self) bool;
pub fn unwrap(self) ToonError![]const u8;
pub fn success(output: []const u8) EncodeResultWithContext;
pub fn failure(code: ToonError, ctx: ErrorContext) EncodeResultWithContext;
};
/// Error context with location information
pub const ErrorContext = struct {
line: usize, // 1-based line number
column: usize, // 1-based column number
source_line: ?[]const u8, // The actual source line
suggestion: ?[]const u8, // Hint for fixing the error
};
/// Parse error combining error code with context
pub const ParseError = struct {
err: ToonError,
context: ErrorContext,
pub fn toString(self, allocator) ![]const u8;
};
TOON uses f64 (64-bit floating point) for all numeric values, matching the JSON data model. This has some implications:
- Safe integer range: Integers from -9007199254740991 to 9007199254740991, that is ±(2^53 - 1), are exactly representable
- Beyond safe range: Larger integers may lose precision when round-tripped
- Decimal precision: Approximately 15-17 significant decimal digits
// These are exactly representable
const safe_int: f64 = 9007199254740991; // 2^53 - 1
// This loses precision
const too_large: f64 = 9007199254740993.0; // 2^53 + 1
// Stored as 9007199254740992, the nearest representable f64
- For large integers that must be exact, store them as quoted strings
- For financial data, consider using fixed-point representations as strings
- For UUIDs or large IDs, use strings rather than numbers
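One way to decide whether an integer ID is safe to emit as a TOON number is to check that it survives a conversion to f64 and back. The sketch below uses only Zig builtins; `survivesF64RoundTrip` is a hypothetical helper name, and it assumes the input is far enough from the i64 limits that converting back to an integer is well defined.

```zig
const std = @import("std");

/// Returns true when `n` converts to f64 and back without changing.
/// Assumes |n| is well below the i64 limit so the conversion back is defined.
fn survivesF64RoundTrip(n: i64) bool {
    const as_float: f64 = @floatFromInt(n);
    const back: i64 = @intFromFloat(as_float);
    return back == n;
}

test "safe integer range" {
    // 2^53 - 1 is the largest integer in the safe range.
    try std.testing.expect(survivesF64RoundTrip(9007199254740991));
    try std.testing.expect(survivesF64RoundTrip(-9007199254740991));
    // 2^53 + 1 rounds to 2^53, so it should be stored as a quoted string instead.
    try std.testing.expect(!survivesF64RoundTrip(9007199254740993));
}
```

Values that fail this check are candidates for the string representation recommended above.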
| Input | Output |
|---|---|
| NaN | Encoded as null |
| +Infinity | Encoded as null |
| -Infinity | Encoded as null |
| -0 | Normalized to 0 |
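The special-value rules listed above can be expressed as a small normalization step. This is an illustrative sketch only: `normalizeForEncoding` is a hypothetical name, not a toon-zig function, and a Zig `null` return stands in for the TOON null that the encoder would write.

```zig
const std = @import("std");

/// Hypothetical helper: maps a number to the value that would be encoded.
/// Returns null for NaN and both infinities, and +0 for either zero.
fn normalizeForEncoding(x: f64) ?f64 {
    if (std.math.isNan(x) or std.math.isInf(x)) return null;
    // -0.0 compares equal to 0.0, so both zeros collapse to positive zero here.
    if (x == 0) return 0.0;
    return x;
}

test "special values" {
    try std.testing.expect(normalizeForEncoding(std.math.nan(f64)) == null);
    try std.testing.expect(normalizeForEncoding(std.math.inf(f64)) == null);
    try std.testing.expect(normalizeForEncoding(-std.math.inf(f64)) == null);
    try std.testing.expect(!std.math.signbit(normalizeForEncoding(-0.0).?));
    try std.testing.expect(normalizeForEncoding(1.5).? == 1.5);
}
```

Because non-finite values become null, they cannot be recovered on decode; callers that need them would have to encode them as strings.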
To prevent stack overflow attacks from deeply nested malicious input, a maximum depth limit is enforced:
// Default limit is 256 levels
const result = try toon.decode(allocator, malicious_input, .{});
// Custom limit for more restrictive environments
var opts = toon.Options{};
opts.max_depth = 32; // Only allow 32 levels of nesting
const restricted = try toon.decode(allocator, data, opts);
When the depth limit is exceeded, ToonError.MaxDepthExceeded is returned.
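To illustrate the idea behind a depth limit, the sketch below derives a line's nesting depth from its leading spaces and rejects it once it passes a maximum. This is an assumption about one possible approach, not a description of how toon-zig tracks depth internally; `checkLineDepth` and `DepthError` are hypothetical names, and `indent_size` is assumed to be greater than zero.

```zig
const std = @import("std");

const DepthError = error{MaxDepthExceeded};

/// Hypothetical sketch: computes nesting depth as leading spaces divided by
/// `indent_size` and fails when the depth is greater than `max_depth`.
fn checkLineDepth(line: []const u8, indent_size: usize, max_depth: usize) DepthError!usize {
    var spaces: usize = 0;
    while (spaces < line.len and line[spaces] == ' ') : (spaces += 1) {}
    const depth = spaces / indent_size;
    if (depth > max_depth) return error.MaxDepthExceeded;
    return depth;
}

test "depth limit" {
    try std.testing.expectEqual(@as(usize, 0), try checkLineDepth("a:", 2, 32));
    try std.testing.expectEqual(@as(usize, 2), try checkLineDepth("    id: 1", 2, 32));
    // Six spaces at indent size 2 is depth 3, which exceeds a limit of 2.
    try std.testing.expectError(error.MaxDepthExceeded, checkLineDepth("      x: 1", 2, 2));
}
```

Rejecting a line before descending into it keeps the cost of hostile input bounded by the configured limit.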
- Always use strict mode (strict = true, the default) when processing untrusted input
- Strict mode validates:
- Array length declarations match actual content
- Indentation is consistent
- No blank lines inside arrays
- Delimiter consistency
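As an example of the first check in the list, the sketch below compares the declared length of an inline array with the number of comma-separated values that follow. It is a simplified illustration using only the standard library: `inlineArrayLengthMatches` and `HeaderError` are hypothetical names, only the comma delimiter is handled, and quoted values that contain commas are not accounted for.

```zig
const std = @import("std");

const HeaderError = error{ InvalidHeader, MissingColon };

/// Hypothetical sketch: checks that a line such as "items[3]: a,b,c" declares
/// the same number of elements as it actually lists.
fn inlineArrayLengthMatches(line: []const u8) HeaderError!bool {
    const open = std.mem.indexOfScalar(u8, line, '[') orelse return error.InvalidHeader;
    const close = std.mem.indexOfScalarPos(u8, line, open, ']') orelse return error.InvalidHeader;
    const declared = std.fmt.parseInt(usize, line[open + 1 .. close], 10) catch return error.InvalidHeader;
    const colon = std.mem.indexOfScalarPos(u8, line, close, ':') orelse return error.MissingColon;
    const body = std.mem.trim(u8, line[colon + 1 ..], " ");
    if (body.len == 0) return declared == 0;
    // Count the comma-separated values after the colon.
    var values = std.mem.splitScalar(u8, body, ',');
    var count: usize = 0;
    while (values.next()) |_| count += 1;
    return count == declared;
}

test "declared length matches content" {
    try std.testing.expect(try inlineArrayLengthMatches("items[3]: a,b,c"));
    try std.testing.expect(!(try inlineArrayLengthMatches("items[3]: a,b")));
    try std.testing.expectError(error.MissingColon, inlineArrayLengthMatches("items[3] a,b,c"));
}
```

In the library itself a mismatch of this kind presumably surfaces as ToonError.LengthMismatch; the boolean return here just keeps the sketch short.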
This implementation targets TOON v2.0 and includes:
| Feature | Status |
|---|---|
| Core encoding/decoding | ✅ |
| Primitive arrays | ✅ |
| Tabular arrays | ✅ |
| List arrays | ✅ |
| Arrays of arrays | ✅ |
| Delimiter support (comma, tab, pipe) | ✅ |
| Strict mode validation | ✅ |
| Key folding (v1.5+) | ✅ |
| Path expansion (v1.5+) | ✅ |
| Flatten depth control | ✅ |
| Blank line validation | ✅ |
| Type coercion control | ✅ |
| Max depth security limit | ✅ |
| Separate encode/decode options | ✅ |
| Validation-only mode | ✅ |
| Streaming encoder | ✅ |
| Streaming decoder | ✅ |
| Rich error context | ✅ |
| GitHub Actions CI | ✅ |
# Run unit tests
zig build test
# Run fixture tests (official spec fixtures)
zig build fixture-test
# Run streaming module tests
zig build test-streaming
# Run error module tests
zig build test-errors
# Run utils module tests
zig build test-utils
# Run all tests
zig build test-all
# Build library and CLI
zig build
# Build release
zig build -Doptimize=ReleaseSafe
This project uses GitHub Actions for CI/CD:
- Test: Runs on every push/PR on Ubuntu, Windows, and macOS
- Build: Builds the CLI binary for all platforms
- Release Build: Creates optimized release artifacts
- Lint: Checks code formatting with zig fmt
See .github/workflows/ci.yml for the full configuration.
- Unit tests: 100+ tests covering all features
- Fixture tests: 340 official spec fixtures (196 decode + 144 encode)
- Streaming tests: Tests for streaming encoder and decoder
- Error tests: Tests for rich error context and helpers
- Utils tests: Tests for shared utility functions
- Zig 0.15.0 or later
MIT License - see LICENSE for details.
- toon-format/spec - TOON format specification
- toon-format/toon - TypeScript reference implementation
- toon-format/toon-rust - Rust implementation
- toon-format/toon-go - Go implementation
- toon-format/toon-python - Python implementation