
SmallThingz/html_parser


🚀 htmlparser

High-throughput, destructive (input-mutating) HTML parser + CSS selector engine for Zig.


⚠️ Conformance Warning

Performance numbers are not conformance claims. The parser is intentionally permissive and currently does not fully match browser-grade tree-construction behavior.

🏁 Performance

See the latest benchmark snapshot for more details.

Source: bench/results/latest.json (stable profile).

Parse Throughput (Average Across Fixtures)

ours     │████████████████████│ 1211.13 MB/s (100.00%)
lol-html │███████████████░░░░░│ 920.47 MB/s (76.00%)
lexbor   │████░░░░░░░░░░░░░░░░│ 215.87 MB/s (17.82%)
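The percentage next to each bar appears to be that parser's average throughput relative to htmlparser ("ours"). Recomputing it from the listed MB/s figures reproduces the chart:

```latex
\text{relative}_x = \frac{T_x}{T_{\text{ours}}} \times 100\%,
\qquad
\frac{920.47}{1211.13} \times 100\% \approx 76.00\%,
\qquad
\frac{215.87}{1211.13} \times 100\% \approx 17.82\%
```

Read the other way, the average throughput is about 1.32× lol-html's and about 5.61× lexbor's.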

Conformance Snapshot

| Profile | nwmatcher | qwery_contextual | html5lib subset | WHATWG HTML parsing | WPT HTML parsing |
|---|---|---|---|---|---|
| strictest/fastest | 20/20 (0 failed) | 54/54 (0 failed) | 521/600 (79 failed) | 417/500 (83 failed) | 417/500 (83 failed) |
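Expressed as pass rates, the selector suites pass completely, while the parsing suites land in the mid-80s:

```latex
\frac{20}{20} = \frac{54}{54} = 100\%,
\qquad
\frac{521}{600} \approx 86.8\%\ (\text{html5lib subset}),
\qquad
\frac{417}{500} = 83.4\%\ (\text{WHATWG and WPT HTML parsing})
```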

Source: bench/results/external_suite_report.json

⚡ Features

  • 🔎 CSS selector queries: comptime, runtime, and cached runtime selectors.
  • 🧭 DOM navigation: parent, siblings, first/last child, and children iteration.
  • 💤 Lazy decode/normalize path: attribute/entity decoding and text normalization are deferred to query-time APIs instead of running during the parse.
  • 🧪 Debug tooling: selector mismatch diagnostics and instrumentation wrappers.
  • 🧰 Parse profiles: strictest and fastest option bundles for benchmarks/workloads.
  • 🧵 Mutable-input parser model optimized for throughput: the caller passes a mutable buffer, which the parser may modify in place.
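To illustrate why a destructive, mutable-input design pairs well with lazy decoding, here is a minimal sketch of in-place entity decoding. The helper `decodeEntitiesInPlace` is hypothetical: it is not part of htmlparser's API, it handles only four named entities, and it is not necessarily how the library decodes text. Because every decoded character is shorter than the entity it replaces, the output can be written over the input in the same buffer, with no allocation.

```zig
const std = @import("std");

/// Hypothetical helper (not htmlparser's API): decodes &amp; &lt; &gt; &quot;
/// in place and returns the shortened slice. Each step advances the write
/// cursor by one byte and the read cursor by at least one byte, so writes
/// never overtake unread input and one buffer can hold both.
fn decodeEntitiesInPlace(buf: []u8) []u8 {
    const Entity = struct { name: []const u8, char: u8 };
    const entities = [_]Entity{
        .{ .name = "&amp;", .char = '&' },
        .{ .name = "&lt;", .char = '<' },
        .{ .name = "&gt;", .char = '>' },
        .{ .name = "&quot;", .char = '"' },
    };

    var read: usize = 0;
    var write: usize = 0;
    outer: while (read < buf.len) {
        if (buf[read] == '&') {
            for (entities) |e| {
                if (std.mem.startsWith(u8, buf[read..], e.name)) {
                    // Replace the whole entity with its single decoded byte.
                    buf[write] = e.char;
                    write += 1;
                    read += e.name.len;
                    continue :outer;
                }
            }
        }
        // Ordinary byte or unrecognized entity: copy it unchanged.
        buf[write] = buf[read];
        write += 1;
        read += 1;
    }
    return buf[0..write];
}

test "decode entities in place" {
    // `.*` copies the literal into a mutable array, as in the Quick Start.
    var input = "a &lt;b&gt; &amp; c".*;
    const out = decodeEntitiesInPlace(&input);
    try std.testing.expectEqualStrings("a <b> & c", out);
}
```

Deferring a pass like this until a query actually asks for the value means documents whose attributes or text are never read pay nothing for decoding.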

🚀 Quick Start

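const std = @import("std");
const html = @import("htmlparser");

// Default parse options; the Document type is derived from them.
const options: html.ParseOptions = .{};
const Document = options.GetDocument();

test "basic parse + query" {
    var doc = Document.init(std.testing.allocator);
    defer doc.deinit();

    // `.*` copies the literal into a mutable array: the parser works on
    // (and may modify) its input buffer rather than a read-only slice.
    var input = "<div id='app'><a class='nav' href='/docs'>Docs</a></div>".*;
    try doc.parse(&input, .{});

    // queryOne returns an optional, which is null when nothing matches.
    const a = doc.queryOne("div#app > a.nav") orelse return error.TestUnexpectedResult;
    try std.testing.expectEqualStrings("/docs", a.getAttributeValue("href").?);
}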

📚 Documentation

🧪 Build and Validation

zig build test
zig build docs-check
zig build examples-check
zig build ship-check

📎 Examples

  • examples/basic_parse_query.zig
  • examples/runtime_selector.zig
  • examples/cached_selector.zig
  • examples/query_time_decode.zig
  • examples/inner_text_options.zig

📜 License

MIT. See LICENSE.

About

A really fast, but not fully compliant, HTML parser written in Zig with GiB/s+ throughput.
