Shakeproof Benchmark CLI is an open-source benchmarking tool designed to compare traditional web scraping against the LangShake Protocol—a new standard for AI-optimized, verifiable, structured content delivery.
This project empowers developers, webmasters, and AI platform integrators to validate, trust, and optimize the way web data is shared with LLMs and agents.
LangShake introduces .well-known/llm.json and per-page JSON modules to allow any website to expose clean, verifiable, schema.org-compliant data—without bloating the HTML or risking misinterpretation.
This CLI measures:
- Extraction accuracy
- Speed
- Trustworthiness (via checksum & Merkle tree validation)
- Real-world crawl performance (across static and dynamic content)
- Traditional Scraping: Extracts Schema.org data from raw HTML (including dynamic React/Next.js content).
- LangShake Protocol: Uses `.well-known/llm.json` and verified JSON modules for direct data access.
- Verifies each JSON module's SHA-256 checksum
- Recalculates and confirms the Merkle root from all modules
- Extraction time (per page and per method)
- Schema match validation
- Trust pass/fail reports
- Resource usage metrics (CPU, memory, bandwidth)
- Fixture-driven Vitest test suite (real extraction, error cases, checksum logic)
- CLI and SDK are covered with integration and unit tests
- URL Validation: Prevents `data:` URLs to protect against CVE-2025-58754 DoS attacks
- Protocol Restrictions: Only allows HTTP, HTTPS, and `file:` URLs for safe crawling
- Input Sanitization: Validates all user-provided URLs before processing
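As a rough illustration of these guards, the sketch below rejects unsupported protocols before a URL is crawled. It is not the project's actual validator, and the function name is hypothetical:

```js
// Illustrative URL guard (not the project's actual validator): reject
// anything outside an explicit protocol allowlist, including data: URLs.
const ALLOWED_PROTOCOLS = new Set(['http:', 'https:', 'file:']);

export function assertSafeUrl(input) {
  let url;
  try {
    url = new URL(input);
  } catch {
    throw new Error(`Invalid URL: ${input}`);
  }
  if (!ALLOWED_PROTOCOLS.has(url.protocol)) {
    throw new Error(`Blocked protocol "${url.protocol}" in ${input}`);
  }
  return url;
}
```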
```bash
git clone https://github.com/langshake/shake-proof
cd shake-proof
npm install
npm link   # For global CLI access (development)
```

Run a benchmark:

```bash
shakeproof --url https://example.com --json
```

Options:

- `--url <target>`: Required. Domain or full URL to benchmark.
- `--method <type>`: `traditional`, `langshake`, or `both` (default: `both`)
- `--json`: Outputs structured results as JSON
- `--output <file>`: Save output to file (default: `output/<domain>.json`)
- `--concurrency <num>`: Max parallel page fetches (default: 5)
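For example, a hypothetical invocation combining several of the options above:

```bash
# Benchmark both methods with 10 parallel fetches and save the JSON results
shakeproof --url https://example.com --method both --concurrency 10 --output output/example.com.json
```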
```js
import { runBenchmark } from 'shakeproof-benchmark';

const result = await runBenchmark({
  url: 'https://example.com',
  method: 'both',
  debug: true
});

console.log(result.json);  // Machine-readable
console.log(result.human); // Human-readable summary
```

Example JSON output:

```json
{
  "domainRoot": "https://example.com",
  "pages": [
    {
      "url": "https://example.com/page1",
      "langshake": [ { /* ...schema.org data... */ } ],
      "traditional": [ { /* ...schema.org data... */ } ],
      "comparison": {
        "schemasMatch": true,
        "langshakeChecksum": "...",
        "langshakeChecksumOriginal": "...",
        "langshakeChecksumValid": true,
        "traditionalChecksum": "...",
        "traditionalChecksumMatchesLangshake": true
      }
    }
  ],
  "summary": {
    "totalPages": 8,
    "allMatch": true,
    "details": "All schemas match.",
    "merkleRootLangshake": "...",
    "merkleRootTraditional": "...",
    "merkleRootLlmJson": "...",
    "merkleRootLangshakeValid": true,
    "merkleRootTraditionalValid": true,
    "merkleRootsMatch": true
  },
  "metrics": {
    "langshake": { /* ... metrics data ... */ },
    "traditional": { /* ... metrics data ... */ }
  }
}
```

| Category | Metric |
|---|---|
| ⚡ Speed | Avg page extraction time, total duration, requests per second (RPS) |
| 🧠 Accuracy | Schema match (true/false), extraction correctness |
| 🔐 Trust | Checksum/Merkle root verification, validation status |
| 📊 Resources | CPU usage (user/system), memory usage (start/end/peak), network (bytes in/out), disk I/O |
| 🌐 Network | HTTP status codes, total requests, average request time |
| ❗ Errors | Error count, error details (per URL and message) |
| 🧵 Concurrency | Max parallel requests observed |
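As an illustration of how this kind of resource data can be sampled in Node.js, here is a minimal sketch; it is an assumption-based example, not the project's actual `metrics.js`:

```js
// Illustrative metrics sampling around an extraction run: wall-clock time,
// user/system CPU time, and heap usage delta.
export async function withMetrics(label, fn) {
  const startCpu = process.cpuUsage();
  const startMem = process.memoryUsage().heapUsed;
  const startTime = process.hrtime.bigint();

  const result = await fn();

  const elapsedMs = Number(process.hrtime.bigint() - startTime) / 1e6;
  const cpu = process.cpuUsage(startCpu); // microseconds since startCpu
  const endMem = process.memoryUsage().heapUsed;

  return {
    label,
    result,
    metrics: {
      elapsedMs,
      cpuUserMs: cpu.user / 1000,
      cpuSystemMs: cpu.system / 1000,
      heapDeltaBytes: endMem - startMem,
    },
  };
}
```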
After each benchmark run, Shakeproof automatically generates a detailed markdown report summarizing the comparison between LangShake and traditional crawling. This report includes side-by-side metrics, performance savings, per-page checksums, and Merkle root validation, providing a clear, human-readable overview of extraction speed, resource usage, and data integrity.
See an example: Shakeproof Benchmark Report Sample
```
shake-proof/
├── src/
│ ├── crawlers/
│ │ ├── traditional.js # HTML-based extraction (Cheerio + Selenium)
│ │ └── langshake.js # JSON-based validation with Merkle tree
│ ├── benchmark/
│ │ └── compare.js # Domain-wide and per-page benchmarking logic
│ ├── utils/
│ │ ├── generateReport.js # Markdown/HTML report generation
│ │ ├── merkle.js # Checksum and Merkle root utilities
│ │ └── metrics.js # Resource usage and metrics collection
│ ├── cli/
│ │ └── menu.js # CLI entry and argument parsing
│ └── index.js # SDK entry point (runBenchmark)
├── tests/
│ ├── crawlers/
│ │ ├── traditional.test.js
│ │ └── langshake.test.js
│ ├── benchmark/
│ │ └── compare.test.js
│ ├── utils/
│ │ ├── generateReport.test.js
│ │ └── metrics.test.js
│ ├── cli/
│ │ └── menu.test.js
│ └── fixtures/
│ ├── traditional/ # HTML fixture files for traditional crawler
│       └── langshake/            # JSON fixture files for langshake protocol
```
Run all tests:
```bash
npm test
```

Test coverage includes:
- Traditional extraction (static + dynamic HTML)
- LangShake crawler (checksum, malformed JSON, Merkle validation)
- Benchmark engine (pass/fail cases, mixed outcomes)
- CLI user flows (mocked)
- Fixture checksum recalculation
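For a feel of what a fixture-driven check looks like, here is a minimal Vitest sketch; the fixture path and field names (`data`, `checksum`) are assumptions for illustration, not the project's actual conventions:

```js
// Illustrative fixture-driven Vitest test (paths and fields are hypothetical).
import { readFile } from 'node:fs/promises';
import { createHash } from 'node:crypto';
import { describe, it, expect } from 'vitest';

describe('fixture checksum recalculation', () => {
  it('matches the checksum recorded in the fixture', async () => {
    const raw = await readFile('tests/fixtures/langshake/example.json', 'utf8');
    const fixture = JSON.parse(raw);

    // Assumption: the fixture stores its JSON-LD payload and a SHA-256 checksum.
    const recalculated = createHash('sha256')
      .update(JSON.stringify(fixture.data))
      .digest('hex');

    expect(recalculated).toBe(fixture.checksum);
  });
});
```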
LangShake is a dual-layer micro-standard for machine-readable web content:
- .well-known/llm.json: Declares site-wide structured data modules & metadata
- Modular JSON files: Contain pure, schema.org-compliant JSON-LD arrays with checksums
- Merkle root validation: Ensures integrity across modules
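As a rough sketch of this integrity layer, the snippet below recomputes a module checksum and a Merkle root over a list of checksums. The hashing and pairing rules shown are illustrative assumptions; the protocol's exact canonicalization is defined in the whitepaper, not by this snippet:

```js
// Illustrative checksum + Merkle root computation (pairing rules assumed:
// hash pairs level by level, duplicating the last leaf on odd counts).
import { createHash } from 'node:crypto';

const sha256 = (text) => createHash('sha256').update(text).digest('hex');

// Checksum of one JSON module's payload.
export function moduleChecksum(jsonLdArray) {
  return sha256(JSON.stringify(jsonLdArray));
}

// Merkle root over the module checksums.
export function merkleRoot(checksums) {
  if (checksums.length === 0) return null;
  let level = [...checksums];
  while (level.length > 1) {
    const next = [];
    for (let i = 0; i < level.length; i += 2) {
      const left = level[i];
      const right = level[i + 1] ?? left;
      next.push(sha256(left + right));
    }
    level = next;
  }
  return level[0];
}
```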
Learn more: whitepaper
To generate .well-known/llm.json and the per-page JSON-LD modules used by this benchmark tool, use our sister project: LangshakeIt CLI.
LangshakeIt is the easiest way to make your website AI- and LLM-friendly by extracting and publishing structured, verifiable data for every page.
- Extracts Schema.org-compliant JSON-LD from your built static site (no framework lock-in)
- Outputs per-page JSON files (with checksums) and a global `.well-known/llm.json` index
- Automatically calculates and embeds a Merkle root to ensure integrity
- Supports optional LLM context via `llm_context.json` (e.g., ethical principles, usage notes)
- Includes smart caching and auto-detection of your site's public base URL
LangShake is fully open source (MIT) and community-driven.
We welcome:
- Web developers who want to expose AI-friendly content
- Toolmakers who want to integrate LangShake support
- Contributors to help expand crawler compatibility or reporting
GitHub: github.com/langshake
- Add resource usage and impact profiling (CPU, memory)
- Support fallback sitemaps when `.well-known/llm.json` is missing
- Integrate with third-party SEO tools
- Submit LangShake Sitemap extension to W3C
MIT — Free to use, fork, improve, and adapt.
This project was inspired by the growing need for verifiable, trustworthy, and machine-optimized content delivery. We believe LangShake can be the robots.txt of the AI era.