The easiest way to make your website AI- and LLM-friendly.
LangshakeIt generates verifiable, Schema.org-compliant JSON-LD for every page, plus a global .well-known/llm.json index for AI agents.
- π Automatic structured data extraction from built HTML (no framework lock-in)
- ποΈ Per-page JSON-LD and a global
.well-known/llm.jsonindex - π Checksum & Merkle root validation for data integrity
- β‘ Smart caching: only updates changed files
- π οΈ Config auto-update: always reflects your real public base URL
- π§ͺ Fully tested: robust integration and unit tests
git clone https://github.com/langshake/langshake-it
cd langshake-it
npm install
npm link # For global CLI access (development)# 1. Initialize (creates config/context files)
langshakeit init
# 2. Build your static site (e.g., Next.js, Astro, etc.)
npm run build
# 3. Run LangshakeIt
langshakeit --input out --out public/langshake --llm public/.well-known/llm.json- Your per-page JSON-LD will be in
langshake/ - Your global index will be in
.well-known/llm.json - LLM/AI agents will discover your site via the standard
.well-known/llm.jsonfile
No arguments are needed if your config file langshake.config.json is set up - which happens on first run.
langshakeit [options]| Option | Description | Default |
|---|---|---|
--input |
Directory to scan for built HTML files | out |
--out |
Output directory for JSON-LD files | public/langshake |
--llm |
Path to .well-known/llm.json |
public/.well-known/llm.json |
--build |
Build command to run before extraction | e.g., "npm run build" |
--base-url |
Fallback base URL if not auto-detected | http://localhost |
--force |
Force rebuild all files | false |
--dry-run |
Show what would be done without writing files | false |
--verbose |
Enable verbose output | false |
All options are saved to langshake.config.json and auto-updated after each run.
Tip: If your build command contains spaces (e.g., npm run build), wrap it in quotes: --build "npm run build".
LangshakeIt outputs each page's extracted JSON-LD in a universal, verifiable format:
- The output is always an array, even if there is only one JSON-LD object.
- The last element of the array is always an object
{ "checksum": "..." }. - The checksum is calculated from the array of JSON-LD objects (excluding the checksum object itself).
- The original JSON-LD objects are not mutated or wrapped.
Extracted:
[
{
"@context": "https://schema.org",
"@type": "WebPage",
"name": "Xavier MaciΓ 's Portfolio"
// ...other fields...
}
]Output:
[
{
"@context": "https://schema.org",
"@type": "WebPage",
"name": "Xavier MaciΓ 's Portfolio"
// ...other fields...
},
{
"checksum": "d331a28b4568528974860d703cde8b1dac5275e82449ece217c51e4b6882eee4"
}
]Extracted:
[
{ "@type": "WebPage", "name": "A" },
{ "@type": "WebPage", "name": "B" }
]Output:
[
{ "@type": "WebPage", "name": "A" },
{ "@type": "WebPage", "name": "B" },
{ "checksum": "..." }
]- Read the file as an array.
- Remove the last element (the checksum object).
- Calculate the checksum on the remaining array.
- Compare to the value in the removed checksum object.
This format is universal, easy to verify, and works for both single and multiple JSON-LD objects.
To provide additional site-level context, principles, or usage notes for LLMs, edit the llm_context.json file in your project root. This file is based on the provided llm_context.example.json:
{
"summary": "Langshake exposes structured, verifiable content for AI and LLM agents. This site provides Schema.org-compliant JSON-LD for every page, plus a global index for discovery and verification.",
"principles": [
"Transparency: All structured data is open and verifiable.",
"Accuracy: Content is kept in sync with the site and validated against Schema.org.",
"Privacy: No personal or sensitive data is exposed in the index."
],
"usage_notes": [
"LLM agents should use the .well-known/llm.json index to discover available modules and verify their integrity.",
"Each module's JSON-LD file includes a checksum for tamper detection.",
"The Merkle root in llm.json allows for efficient verification of all modules."
]
}- How to use:
- Copy
llm_context.example.jsontollm_context.jsonin your project root. - Edit
llm_context.jsonto add your own summary, principles, usage notes, or whatever is useful to provide more context to LLMs - This context will be included in your
.well-known/llm.jsonand is visible to LLMs and AI agents. - Caution with Context Fields: Fields like
llm_context.jsonare unverified and should not be used for factual reasoning or truth-grounding by default.
- Copy
To automatically run LangshakeIt after every site build, add the following to your package.json, example Next.js:
"scripts": {
"dev": "next dev",
"build": "next build",
"langshake": "langshakeit",
"postbuild": "POSTBUILD=1 npm run langshake",
"start": "next start",
"lint": "next lint"
}- On Windows, use:
"postbuild": "set POSTBUILD=1 && npm run langshake"
Now, whenever you run npm run build, LangshakeIt will run automatically after your site is built, keeping your structured data up to date with no extra steps.
Warning: If you run
langshakeit --build "npm run build"and also have a postbuild script that runs LangshakeIt, it will cause LangshakeIt to run twice for every build. Use only one approach.
Tip: Setting
POSTBUILD=1in your postbuild script will suppress the build warning from LangshakeIt, since the build has already run.
- Scans your built HTML for all pages in the
--inputdirectory. - Extracts all JSON-LD and writes per-page files.
- Builds a global
.well-known/llm.jsonwith module URLs, Merkle root, and metadata. - Auto-detects your public base URL by checking (in order):
robots.txt(Sitemap:line)sitemap.xml(first<loc>)- JSON-LD in your home page
--base-urlconfig/CLI option
- LLM/AI discovery is handled solely via the standard
.well-known/llm.jsonfile.- This approach follows web standards (RFC 8615) and has zero SEO impact.
/
βββ src/
β βββ cli/
β β βββ index.js
β βββ core/
β β βββ scanPages.js
β β βββ generateSchema.js
β β βββ writeJsonLD.js
β β βββ buildLLMIndex.js
β β βββ cache.js
βββ public/
β βββ langshake/
β βββ .well-known/llm.json
βββ tests/
β βββ fixtures/
β β βββ simple-site/
β β β βββ src/pages/
β β β βββ about.html
β β β βββ contact.mdx
β β βββ html/
β β βββ benchmark.html
β βββ scanPages.test.js
β βββ generateSchema.test.js
β βββ writeJsonLD.test.js
β βββ cache.test.js
β βββ buildLLMIndex.test.js
β βββ integration.cli.test.js
βββ .langshake-cache.json
βββ langshake.config.json
βββ package.json
βββ README.md
Run all tests with:
npx vitestIntegration tests use a real Next.js fixture site to ensure realistic, end-to-end extraction. All temp files and node_modules are cleaned up after each run.
- Only built
.htmlfiles are processed (no.jsx/.mdxsource parsing) - Extraction is robust to missing or malformed files
- The CLI never overwrites or deletes user files except the generated json files
- Uses
zodfor data validation - Enforces code style with ESLint and Prettier
- All code is modular, with clear separation of CLI, core logic, and tests
LangShake is a dual-layer micro-standard for machine-readable web content:
- .well-known/llm.json: Declares site-wide structured data modules & metadata
- Modular JSON files: Contain pure, schema.org-compliant JSON-LD arrays with checksums
- Merkle root validation: Ensures integrity across modules
Learn more: whitepaper
After generating your .well-known/llm.json and per-page JSON-LD modules with LangshakeIt, you can verify, validate, and benchmark them using Shakeproof CLI β the official LangShake protocol testing suite.
Shakeproof ensures your structured data is not only well-formed, but also trustworthy, accurate, and LLM-ready.
- Compares your LangShake modules with traditional HTML-extracted Schema.org data
- Verifies checksums and recalculates the Merkle root to ensure data integrity
- Benchmarks structured data extraction speed, accuracy, and trustworthiness
- Reports errors, mismatches, and trust failures across your entire site
- Provides both CLI and programmatic SDK usage
Use LangshakeIt to generate trusted structured data.
Use Shakeproof to validate and prove it.
GitHub Repo: github.com/langshake/shakeproof
LangShake is fully open source (MIT) and community-driven.
We welcome:
- Web developers who want to expose AI-friendly content
- Toolmakers who want to integrate LangShake support
- Contributors to help expand crawler compatibility or reporting
GitHub: github.com/langshake
MIT β Free to use, fork, improve, and adapt.
This project was inspired by the growing need for verifiable, trustworthy, and machine-optimized content delivery. We believe LangShake can be the robots.txt of the AI era.