toon-zig

A Zig implementation of the TOON (Token-Oriented Object Notation) format, version 2.0 compliant.

TOON is a human-readable data serialization format designed for token efficiency, with explicit structure and minimal quoting. It's particularly well-suited for arrays of uniform objects and is commonly used as a compact representation for LLM prompts.

Features

Core Features

  • Full TOON encoding and decoding
  • Primitive arrays with inline values ([3]: 1,2,3)
  • Tabular arrays with field lists ([2]{id,name}:)
  • List arrays with - markers
  • Nested objects and arrays
  • Multiple delimiters (comma, tab, pipe)
  • Strict mode validation
  • Zero dependencies (pure Zig standard library)
  • Streaming API - Memory-efficient encoding/decoding for large documents
  • Rich Error Context - Detailed error messages with line numbers and source context

v1.5+ Features

  • Key Folding (Encoder) - Collapse nested single-key objects into dotted paths
  • Path Expansion (Decoder) - Expand dotted keys back into nested structures
  • Flatten Depth Control - Limit the depth of key folding
  • Blank Line Validation - Strict mode errors on blank lines inside arrays

Security Features

  • Max Depth Limit - Configurable nesting depth limit (default: 256) to prevent stack overflow attacks
  • Type Coercion Control - Option to disable automatic number parsing

CI/CD

  • GitHub Actions - Automated testing on Linux, Windows, and macOS
  • Multi-platform builds - Release artifacts for all major platforms

Installation

Option 1: Using zig fetch (Recommended)

The easiest way to add toon-zig as a dependency is using the zig fetch command. Zig supports two URL formats:

Using Git URL (recommended):

zig fetch --save git+https://github.com/bkataru/toon-zig.git#HEAD

Using tarball URL:

zig fetch --save https://github.com/bkataru/toon-zig/archive/refs/heads/main.tar.gz

This will automatically download the package and add it to your build.zig.zon with the correct hash.

To fetch a specific version or tag:

# Using git URL with tag reference
zig fetch --save git+https://github.com/bkataru/toon-zig.git#v0.1.1

# Or using tarball URL for a specific tag
zig fetch --save https://github.com/bkataru/toon-zig/archive/refs/tags/v0.1.1.tar.gz

To save with a custom dependency name:

zig fetch --save=toon git+https://github.com/bkataru/toon-zig.git#HEAD

Note: The git+https:// protocol clones the repository directly, while tarball URLs download a snapshot archive. Git URLs are generally more reliable for version pinning.

Option 2: Manual Configuration

Alternatively, you can manually add toon-zig as a dependency in your build.zig.zon:

.dependencies = .{
    .toon = .{
        // Using git URL (recommended)
        .url = "git+https://github.com/bkataru/toon-zig.git#v0.1.1",
        // Or using tarball URL:
        // .url = "https://github.com/bkataru/toon-zig/archive/refs/tags/v0.1.1.tar.gz",
        .hash = "...", // Run `zig build` to get the correct hash
    },
},

Note: On the first build attempt, Zig will display the correct hash value. Copy that hash and update your build.zig.zon file accordingly.

Configuring build.zig

After adding the dependency (via either method), add the following to your build.zig:

const toon = b.dependency("toon", .{
    .target = target,
    .optimize = optimize,
});

exe.root_module.addImport("toon", toon.module("toon"));

Usage

Basic Encoding

const std = @import("std");
const toon = @import("toon");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    // Create an object
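    // `const` is enough: the elements are written through the slice, the slice itself never changes
    const fields = try allocator.alloc(toon.Value.Object.Field, 2);
    defer allocator.free(fields); // otherwise the GPA reports a leak at deinit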
    fields[0] = .{ .key = "name", .value = .{ .string = "Alice" } };
    fields[1] = .{ .key = "age", .value = .{ .number = 30 } };

    const value = toon.Value{ .object = .{ .fields = fields } };
    const options = toon.Options{};

    const encoded = try toon.encode(allocator, value, options);
    defer allocator.free(encoded);

    std.debug.print("{s}\n", .{encoded});
    // Output:
    // name: Alice
    // age: 30
}

Basic Decoding

const std = @import("std");
const toon = @import("toon");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    const data =
        \\name: Alice
        \\age: 30
    ;

    const result = try toon.decode(allocator, data, .{});
    defer result.deinit(allocator);

    // Access decoded values
    const obj = result.value.object;
    for (obj.fields) |field| {
        std.debug.print("{s}: ", .{field.key});
        switch (field.value) {
            .string => |s| std.debug.print("{s}\n", .{s}),
            .number => |n| std.debug.print("{d}\n", .{n}),
            else => {},
        }
    }
}

Using Separate Encode/Decode Options

For more granular control, use the specialized option types:

const std = @import("std");
const toon = @import("toon");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

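    // `value` (a toon.Value) and `data` (TOON text) are assumed to be
    // defined as in the earlier examples.

    // Encode with specific options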
    var encode_opts = toon.EncodeOptions{};
    encode_opts.indent_size = 4;
    encode_opts.key_folding = .safe;
    
    const encoded = try toon.encodeWithOptions(allocator, value, encode_opts);
    defer allocator.free(encoded);

    // Decode with specific options
    var decode_opts = toon.DecodeOptions{};
    decode_opts.coerce_types = false;  // Keep numbers as strings
    decode_opts.max_depth = 100;       // Custom depth limit
    
    const result = try toon.decodeWithOptions(allocator, data, decode_opts);
    defer result.deinit(allocator);
}

Validation Only

To validate TOON data without fully parsing it:

const std = @import("std");
const toon = @import("toon");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    const data = "items[3]: a,b,c";
    
    // Validate - returns an error if invalid
    try toon.validate(allocator, data, .{});
    
    // Or with specific options
    const opts = toon.DecodeOptions{ .strict = true };
    try toon.validateWithOptions(allocator, data, opts);
}

Type Coercion Control

Control whether numeric-looking strings are automatically converted to numbers:

const std = @import("std");
const toon = @import("toon");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    const data = "value: 123";

    // Default: coerce_types = true, "123" becomes number 123
    {
        const result = try toon.decode(allocator, data, .{});
        defer result.deinit(allocator);
        // result.value.object.fields[0].value is .number = 123
    }

    // With coerce_types = false, "123" stays as string "123"
    {
        var opts = toon.Options{};
        opts.coerce_types = false;
        
        const result = try toon.decode(allocator, data, opts);
        defer result.deinit(allocator);
        // result.value.object.fields[0].value is .string = "123"
    }
}

Key Folding (v1.5+)

Key folding collapses nested single-key objects into dotted paths for more compact output:

const std = @import("std");
const toon = @import("toon");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    // Create nested structure: { data: { meta: { id: 1 } } }
    // ... (setup code)

    // Enable key folding
    var options = toon.Options{};
    options.key_folding = .safe;

    const encoded = try toon.encode(allocator, value, options);
    defer allocator.free(encoded);

    std.debug.print("{s}\n", .{encoded});
    // Output: data.meta.id: 1
}

Path Expansion (v1.5+)

Path expansion is the decoder counterpart to key folding, expanding dotted keys into nested objects:

const std = @import("std");
const toon = @import("toon");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    const data = "data.meta.id: 1";

    // Enable path expansion
    var options = toon.Options{};
    options.expand_paths = .safe;

    const result = try toon.decode(allocator, data, options);
    defer result.deinit(allocator);

    // Result is: { data: { meta: { id: 1 } } }
}
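
Safe expansion only makes sense when every dot-separated segment of a key is a plain identifier. The sketch below illustrates that idea with the standard library only. It does not call toon-zig, and both helper names and the identifier rule (a letter or underscore, then letters, digits, or underscores) are assumptions made for illustration rather than the library's actual checks.

```zig
const std = @import("std");

/// Hypothetical rule (an assumption, not taken from toon-zig):
/// first byte is a letter or '_', the rest are letters, digits, or '_'.
fn isSimpleIdentifier(s: []const u8) bool {
    if (s.len == 0) return false;
    if (!(std.ascii.isAlphabetic(s[0]) or s[0] == '_')) return false;
    for (s[1..]) |c| {
        if (!(std.ascii.isAlphanumeric(c) or c == '_')) return false;
    }
    return true;
}

/// Returns true when every dot-separated segment of `key` is a simple identifier.
fn canExpand(key: []const u8) bool {
    var it = std.mem.splitScalar(u8, key, '.');
    while (it.next()) |segment| {
        if (!isSimpleIdentifier(segment)) return false;
    }
    return true;
}

pub fn main() void {
    std.debug.print("{}\n", .{canExpand("data.meta.id")});
    std.debug.print("{}\n", .{canExpand("data..id")}); // empty middle segment
}
```

A key that fails such a check would be kept as a literal key instead of being expanded into nested objects.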

Streaming API

For memory-efficient processing of large documents, use the streaming API:

Streaming Encoder

const std = @import("std");
const toon = @import("toon");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    // Write directly to stdout or any writer
    var stdout_buffer: [4096]u8 = undefined;
    var stdout_writer = std.fs.File.stdout().writer(&stdout_buffer);
    
    // ... create value ...
    
    // Stream encode - writes directly without buffering entire output
    try toon.streamingEncode(allocator, stdout_writer.any(), value, .{});
}

Streaming Decoder

const std = @import("std");
const toon = @import("toon");

fn valueCallback(path: []const []const u8, value: toon.Value, user_data: ?*anyopaque) anyerror!void {
    _ = user_data; // unused in this example; Zig rejects unused parameters
    // Process each value as it's parsed
    std.debug.print("Path: ", .{});
    for (path) |segment| {
        std.debug.print("{s}.", .{segment});
    }
    std.debug.print(" = {any}\n", .{value});
}

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    var stdin_buffer: [4096]u8 = undefined;
    var stdin_reader = std.fs.File.stdin().reader(&stdin_buffer);
    
    // Stream parse with callback for each value
    try toon.streamParse(allocator, stdin_reader.any(), .{}, valueCallback, null);
}

Rich Error Context

Get detailed error information including line numbers and source context:

const std = @import("std");
const toon = @import("toon");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    const invalid_data = "  bad indentation";

    // Use decodeWithContext for rich error information
    const result = toon.decodeWithContext(allocator, invalid_data, .{ .strict = true });
    
    if (result.isError()) {
        // Get the parse error with full context
        if (result.getParseError()) |parse_err| {
            const msg = try parse_err.toString(allocator);
            defer allocator.free(msg);
            std.debug.print("Error: {s}\n", .{msg});
            // Output: "error at line 1: indentation must be a multiple of indent size"
        }
        
        // Or access individual fields
        if (result.getErrorContext()) |ctx| {
            std.debug.print("Line: {d}, Column: {d}\n", .{ ctx.line, ctx.column });
            if (ctx.source_line) |src| {
                std.debug.print("Source: {s}\n", .{src});
            }
        }
    } else {
        defer result.deinit(allocator);
        // Use result.value...
    }
}

Round-Trip with Folding and Expansion

Key folding and path expansion are designed to work together for lossless round-trips:

// Original: { data: { meta: { id: 1 } } }

// Encode with key_folding = .safe
// → "data.meta.id: 1"

// Decode with expand_paths = .safe
// → { data: { meta: { id: 1 } } }

Array Formats

TOON supports several array formats:

Inline Primitive Array

[3]: 1,2,3

Tabular Array (CSV-like)

[2]{id,name}:
  1,Alice
  2,Bob
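
To make the tabular layout concrete, here is a minimal hand-written sketch that produces the same text from a slice of structs. It uses only the standard library, not toon-zig's encoder. The `User` type and `writeUsersTable` function are hypothetical, and the sketch assumes no field value needs quoting.

```zig
const std = @import("std");

// Hypothetical record type used only for this illustration.
const User = struct { id: u32, name: []const u8 };

/// Writes `users` as a tabular array into `buf` and returns the written slice.
/// Assumes names contain no delimiters or other characters that need quoting.
fn writeUsersTable(buf: []u8, users: []const User) ![]u8 {
    var pos: usize = 0;
    // Header: declared length, then the field list in braces.
    const header = try std.fmt.bufPrint(buf[pos..], "[{d}]{{id,name}}:\n", .{users.len});
    pos += header.len;
    // One indented, comma-delimited row per element.
    for (users) |u| {
        const row = try std.fmt.bufPrint(buf[pos..], "  {d},{s}\n", .{ u.id, u.name });
        pos += row.len;
    }
    return buf[0..pos];
}

pub fn main() !void {
    const users = [_]User{
        .{ .id = 1, .name = "Alice" },
        .{ .id = 2, .name = "Bob" },
    };
    var buf: [256]u8 = undefined;
    const out = try writeUsersTable(&buf, &users);
    std.debug.print("{s}", .{out});
}
```

The field names appear once in the header instead of once per row, which is where the format saves tokens on uniform arrays.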

List Array

[3]:
  - hello
  - world
  - [2]: a,b

Options Reference

Unified Options

const options = toon.Options{
    // Validation
    .strict = true,                    // Enable strict validation (default: true)

    // Formatting
    .indent_size = 2,                  // Spaces per indent level (default: 2)
    .document_delimiter = .comma,      // Delimiter for document values
    .array_delimiter = .comma,         // Delimiter for array values (.comma, .tab, .pipe)

    // v1.5+ Key Folding (Encoder)
    .key_folding = .off,               // .off or .safe
    .flatten_depth = null,             // Max depth to fold (null = unlimited)

    // v1.5+ Path Expansion (Decoder)
    .expand_paths = .off,              // .off or .safe

    // Type handling
    .coerce_types = true,              // Auto-convert numeric strings to numbers

    // Security
    .max_depth = null,                 // Max nesting depth (null = 256)
};

Encode-Specific Options

const encode_opts = toon.EncodeOptions{
    .indent_size = 2,
    .document_delimiter = .comma,
    .array_delimiter = .comma,
    .key_folding = .off,
    .flatten_depth = null,
};

Decode-Specific Options

const decode_opts = toon.DecodeOptions{
    .strict = true,
    .indent_size = 2,
    .expand_paths = .off,
    .coerce_types = true,
    .max_depth = null,
};

API Reference

Types

Value

A tagged union representing TOON values:

pub const Value = union(enum) {
    null,
    bool: bool,
    number: f64,
    string: []const u8,
    array: []Value,
    object: Object,
};

DecodedValue

Result from decoding, includes the value and backing buffer:

pub const DecodedValue = struct {
    value: Value,
    buffer: []const u8,

    pub fn deinit(self: DecodedValue, allocator: Allocator) void;
};

Functions

Core Encoding/Decoding

| Function | Description |
|---|---|
| encode(allocator, value, options) | Encode a Value to TOON string |
| encodeWithOptions(allocator, value, encode_opts) | Encode with EncodeOptions |
| encodeWithContext(allocator, value, options) | Encode with rich error context (returns EncodeResultWithContext) |
| encodeWithContextAndOptions(allocator, value, encode_opts) | Encode with rich error context using EncodeOptions |
| decode(allocator, data, options) | Decode TOON string to Value |
| decodeWithOptions(allocator, data, decode_opts) | Decode with DecodeOptions |
| decodeWithContext(allocator, data, options) | Decode with rich error context (returns DecodeResultWithContext) |
| decodeWithContextAndOptions(allocator, data, decode_opts) | Decode with rich error context using DecodeOptions |
| validate(allocator, data, options) | Validate without returning value |
| validateWithOptions(allocator, data, decode_opts) | Validate with DecodeOptions |

Streaming API

| Function | Description |
|---|---|
| streamingEncode(allocator, writer, value, options) | Stream encode to writer |
| streamingEncodeWithOptions(allocator, writer, value, encode_opts) | Stream encode with EncodeOptions |
| streamParse(allocator, reader, options, callback, user_data) | Stream parse with callback |
| createStreamingEncoder(allocator, writer, options) | Create streaming encoder instance |
| createStreamingEncoderWithOptions(allocator, writer, encode_opts) | Create streaming encoder with EncodeOptions |
| createStreamingDecoder(allocator, reader, options) | Create streaming decoder instance |
| createStreamingDecoderWithOptions(allocator, reader, decode_opts) | Create streaming decoder with DecodeOptions |

I/O Helpers

| Function | Description |
|---|---|
| writeValue(writer, value, options) | Write encoded value directly to an std.Io.Writer |
| readValue(reader, allocator, options) | Read and decode value from an std.Io.Reader |

Type Conversion

| Function | Description |
|---|---|
| encodeAlloc(allocator, value, options) | Encode any Zig struct/type to TOON string |
| parseFromSlice(T, allocator, data, options) | Parse TOON string into a Zig type T |

Error Helpers

| Function | Description |
|---|---|
| errorMessage(err) | Get human-readable message for a ToonError |
| errorAt(line, err) | Create a ParseError at a specific line |
| errorAtPos(line, column, err) | Create a ParseError at a specific line and column |
| errorWithSource(line, source_line, err) | Create a ParseError with source line context |
| errorWithSuggestion(line, err, suggestion) | Create a ParseError with a fix suggestion |

Utility Functions

| Function | Description |
|---|---|
| looksNumeric(s) | Check if a string looks like a number (needs quoting) |
| hasLeadingZeroDecimal(s) | Check if string has leading zeros (e.g., "007") |
| isValidUnquotedKey(key) | Check if a key can be written without quotes |
| isIdentifierSegment(s) | Check if string is valid for key folding/expansion |
| areAllSegmentsIdentifiers(path) | Check if all segments of a dotted path are valid identifiers |

Error Types

pub const ToonError = error{
    // Syntax errors
    MissingColon,
    InvalidHeader,
    InvalidEscape,
    UnterminatedString,
    InvalidIndentation,
    TabsInIndentation,
    UnsupportedControlChar,
    InvalidKey,
    InvalidNumber,

    // Structural errors
    LengthMismatch,
    WidthMismatch,
    BlankLineInArray,
    UnexpectedIndentation,
    InvalidListItem,
    DelimiterMismatch,

    // Path expansion errors (v1.5+)
    PathExpansionConflict,

    // Security errors
    MaxDepthExceeded,

    // General errors
    UnexpectedEof,
    InvalidInput,
};

Constants

| Constant | Value | Description |
|---|---|---|
| MAX_DEPTH | 256 | Default maximum nesting depth |

Callback Types

/// Progress callback for streaming encoder
/// Called after each line is written with line number (1-based) and cumulative bytes written
pub const ProgressCallback = *const fn (line_num: usize, bytes_written: usize) void;

/// Value callback for streaming decoder
/// Called for each parsed value with its path and the value
pub const ValueCallback = *const fn (path: []const []const u8, value: Value, user_data: ?*anyopaque) anyerror!void;

StreamingEncoder Methods

| Method | Description |
|---|---|
| init(allocator, writer, options) | Initialize encoder |
| initWithCallback(allocator, writer, options, callback) | Initialize with progress callback |
| encode(value) | Encode a value to the writer |
| encodeWithCallback(value, callback) | Encode with progress callback |
| getLineCount() | Get total lines written |
| getBytesWritten() | Get total bytes written |

StreamingDecoder Methods

| Method | Description |
|---|---|
| init(allocator, reader, options) | Initialize decoder |
| deinit() | Clean up resources |
| readAll() | Read and parse entire stream, return complete Value |
| readLine() | Read next line from stream |
| parseLine(line, callback, user_data) | Parse a single line, invoke callback for values |
| streamParse(callback, user_data) | Parse entire stream with callbacks |
| getLastError() | Get the last error that occurred |
| getCurrentLine() | Get current line number |
| getState() | Get current parsing state |
| reset(new_reader) | Reset decoder for a new stream |

Rich Error Types

/// Generic result type for error context propagation
pub fn Result(comptime T: type) type {
    return union(enum) {
        ok: T,
        err: ResultError,

        pub fn success(value: T) Self;
        pub fn failure(code: ToonError, ctx: ErrorContext) Self;
        pub fn failureAtLine(code: ToonError, line: usize) Self;
        pub fn isOk(self) bool;
        pub fn isError(self) bool;
        pub fn unwrap(self) ToonError!T;
        pub fn getErrorContext(self) ?ErrorContext;
        pub fn getParseError(self) ?ParseError;
        pub fn formatError(self, allocator) !?[]const u8;
    };
}

/// Result with error context for decoding
pub const DecodeResultWithContext = struct {
    value: ?Value,
    buffer: ?[]const u8,
    error_info: ?struct {
        code: ToonError,
        context: ErrorContext,
    },

    pub fn isOk(self) bool;
    pub fn isError(self) bool;
    pub fn getErrorCode(self) ?ToonError;
    pub fn getErrorContext(self) ?ErrorContext;
    pub fn getParseError(self) ?ParseError;
    pub fn formatError(self, allocator) !?[]const u8;
    pub fn unwrap(self) ToonError!Value;
    pub fn deinit(self, allocator) void;
};

/// Result with error context for encoding
pub const EncodeResultWithContext = struct {
    output: ?[]const u8,
    error_info: ?struct {
        code: ToonError,
        context: ErrorContext,
    },

    pub fn isOk(self) bool;
    pub fn isError(self) bool;
    pub fn unwrap(self) ToonError![]const u8;
    pub fn success(output: []const u8) EncodeResultWithContext;
    pub fn failure(code: ToonError, ctx: ErrorContext) EncodeResultWithContext;
};

/// Error context with location information
pub const ErrorContext = struct {
    line: usize,           // 1-based line number
    column: usize,         // 1-based column number
    source_line: ?[]const u8,  // The actual source line
    suggestion: ?[]const u8,   // Hint for fixing the error
};

/// Parse error combining error code with context
pub const ParseError = struct {
    err: ToonError,
    context: ErrorContext,

    pub fn toString(self, allocator) ![]const u8;
};

Numeric Precision

toon-zig represents every numeric value as an f64 (64-bit floating point), matching the JSON data model. This has some implications:

Precision Limits

  • Safe integer range: Integers from -9007199254740991 to 9007199254740991, that is ±(2^53 - 1), are exactly representable
  • Beyond safe range: Larger integers may lose precision when round-tripped
  • Decimal precision: Approximately 15-17 significant decimal digits

Examples

// This is exactly representable
const safe_int: f64 = 9007199254740991;  // 2^53 - 1

// This loses precision: 2^53 + 1 has no exact f64 representation
const too_large: f64 = 9007199254740993.0;
// The value actually stored is 9007199254740992 (2^53)

Recommendations

  1. For large integers that must be exact, store them as quoted strings
  2. For financial data, consider using fixed-point representations as strings
  3. For UUIDs or large IDs, use strings rather than numbers
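
One way to decide whether an integer is safe to emit as a number is to check that it survives a round trip through f64. The following is a minimal standard-library sketch; `survivesF64` is a hypothetical helper, not part of toon-zig.

```zig
const std = @import("std");

/// Returns true when `n` is unchanged by a u64 -> f64 -> u64 round trip.
/// Only meaningful for n well below 2^64, where converting back cannot overflow.
fn survivesF64(n: u64) bool {
    const as_float: f64 = @floatFromInt(n);
    const back: u64 = @intFromFloat(as_float);
    return back == n;
}

pub fn main() void {
    std.debug.print("{}\n", .{survivesF64(9007199254740991)}); // 2^53 - 1: true
    std.debug.print("{}\n", .{survivesF64(9007199254740993)}); // 2^53 + 1: false
}
```

Values that fail the check are candidates for the quoted-string representation recommended above.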

Special Values

| Input | Output |
|---|---|
| NaN | Encoded as null |
| +Infinity | Encoded as null |
| -Infinity | Encoded as null |
| -0 | Normalized to 0 |
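
The mapping above can be sketched as a small normalization step applied before a number is written. This is an illustration built on the standard library only. `normalizeNumber` is a hypothetical name, and returning `null` here stands in for "encode as null"; it is not how toon-zig is necessarily structured internally.

```zig
const std = @import("std");

/// Hypothetical helper: null means "encode as null"; otherwise the value to write.
fn normalizeNumber(x: f64) ?f64 {
    // NaN and both infinities have no numeric form in the output.
    if (std.math.isNan(x) or std.math.isInf(x)) return null;
    // -0.0 compares equal to 0.0, so this branch also replaces negative zero.
    if (x == 0.0) return 0.0;
    return x;
}

pub fn main() void {
    if (normalizeNumber(std.math.inf(f64))) |v| {
        std.debug.print("number: {d}\n", .{v});
    } else {
        std.debug.print("null\n", .{});
    }
}
```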

Security Considerations

Max Depth Limit

To prevent stack overflow attacks from deeply nested malicious input, a maximum depth limit is enforced:

// Default limit is 256 levels
const result = try toon.decode(allocator, malicious_input, .{});

// Custom limit for more restrictive environments
var opts = toon.Options{};
opts.max_depth = 32;  // Only allow 32 levels of nesting
const limited_result = try toon.decode(allocator, data, opts);

When the depth limit is exceeded, ToonError.MaxDepthExceeded is returned.
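
The general technique behind such a limit is to carry the current depth through the recursion and fail as soon as it passes the bound. The sketch below shows this on a stand-in tree type. `Node`, `DepthError`, and `checkDepth` are hypothetical and do not reflect toon-zig's internals; only the error name mirrors the one listed above.

```zig
const std = @import("std");

/// Stand-in tree type for illustration only; this is not toon.Value.
const Node = union(enum) {
    leaf: f64,
    list: []const Node,
};

const DepthError = error{MaxDepthExceeded};

/// Walks `node` and fails once nesting goes deeper than `max_depth`.
/// The explicit error set is required because the function is recursive.
fn checkDepth(node: Node, depth: usize, max_depth: usize) DepthError!void {
    if (depth > max_depth) return error.MaxDepthExceeded;
    switch (node) {
        .leaf => {},
        .list => |items| {
            for (items) |child| {
                try checkDepth(child, depth + 1, max_depth);
            }
        },
    }
}

pub fn main() void {
    const inner = [_]Node{.{ .leaf = 1 }};
    const middle = [_]Node{.{ .list = &inner }};
    const root = Node{ .list = &middle };

    // root is at depth 0, the nested list at depth 1, the leaf at depth 2.
    if (checkDepth(root, 0, 1)) |_| {
        std.debug.print("accepted\n", .{});
    } else |err| {
        std.debug.print("rejected: {s}\n", .{@errorName(err)});
    }
}
```

Checking before descending keeps stack usage bounded by the limit, no matter how deeply the input is nested.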

Input Validation

  • Always use strict mode (strict = true, the default) when processing untrusted input
  • Strict mode validates:
    • Array length declarations match actual content
    • Indentation is consistent
    • No blank lines inside arrays
    • Delimiter consistency

Specification Compliance

This implementation targets TOON v2.0 and includes:

  • Core encoding/decoding
  • Primitive arrays
  • Tabular arrays
  • List arrays
  • Arrays of arrays
  • Delimiter support (comma, tab, pipe)
  • Strict mode validation
  • Key folding (v1.5+)
  • Path expansion (v1.5+)
  • Flatten depth control
  • Blank line validation
  • Type coercion control
  • Max depth security limit
  • Separate encode/decode options
  • Validation-only mode
  • Streaming encoder
  • Streaming decoder
  • Rich error context
  • GitHub Actions CI

Building and Testing

# Run unit tests
zig build test

# Run fixture tests (official spec fixtures)
zig build fixture-test

# Run streaming module tests
zig build test-streaming

# Run error module tests
zig build test-errors

# Run utils module tests
zig build test-utils

# Run all tests
zig build test-all

# Build library and CLI
zig build

# Build release
zig build -Doptimize=ReleaseSafe

Continuous Integration

This project uses GitHub Actions for CI/CD:

  • Test: Runs on every push/PR on Ubuntu, Windows, and macOS
  • Build: Builds the CLI binary for all platforms
  • Release Build: Creates optimized release artifacts
  • Lint: Checks code formatting with zig fmt

See .github/workflows/ci.yml for the full configuration.

Test Coverage

  • Unit tests: 100+ tests covering all features
  • Fixture tests: 340 official spec fixtures (196 decode + 144 encode)
  • Streaming tests: Tests for streaming encoder and decoder
  • Error tests: Tests for rich error context and helpers
  • Utils tests: Tests for shared utility functions

Requirements

  • Zig 0.15.0 or later

License

MIT License - see LICENSE for details.
