Curate and index documentation from any website into collections like tailwind/, horses/, etc. Reference collection indexes in your AI chats (e.g. @tailwind/INDEX.xml what's a utility?) so that only relevant docs are analysed. Much cleaner than a web-fetch and more focussed than a web-search. Keep your AI context sharp.
Available collections in this repo:
| Collection | Collection Index | Description | Scraped | Source |
|---|---|---|---|---|
| 📦 biome/ | 📄 biome/INDEX.xml | Fast linter/formatter | 2025-11-04 | Official |
| 📦 claudecode/ | 📄 claudecode/INDEX.xml | Anthropic Claude Code | 2026-01-07 | Official |
| 📦 claudeplat/ | 📄 claudeplat/INDEX.xml | Anthropic Claude Platform | 2026-01-07 | Official |
| 📦 clerk/ | 📄 clerk/INDEX.xml | Authentication | 2025-12-03 | Official |
| 📦 convex/ | 📄 convex/INDEX.xml | Reactive database | 2026-01-07 | Official |
| 🪝 lefthook/ | 📄 lefthook/INDEX.xml | Git hooks manager | 2025-11-24 | Official |
| 📦 marimo/ | 📄 marimo/INDEX.xml | Reactive Python notebooks | 2025-11-11 | Official |
| 📦 nextjs/ | 📄 nextjs/INDEX.xml | React framework | 2025-12-02 | Official |
| 📦 playwright/ | 📄 playwright/INDEX.xml | Browser testing | 2025-11-07 | Official |
| 📦 shadcn/ | 📄 shadcn/INDEX.xml | React UI components | 2025-12-16 | Official, Guide |
| 📦 shiny/ | 📄 shiny/INDEX.xml | Python web apps | 2025-11-02 | Official |
| 📦 tailwind/ | 📄 tailwind/INDEX.xml | CSS framework | 2025-10-15 | Official |
| 📦 tailwindplus/ | 📄 tailwindplus/INDEX.xml | Paid UI components | 2025-11-16 | Official |
| 📦 uv/ | 📄 uv/INDEX.xml | Python projects | 2026-01-16 | Official |
| 📦 vercel/ | 📄 vercel/INDEX.xml | Deployment platform | 2025-10-20 | Official |
| 📦 vitest/ | 📄 vitest/INDEX.xml | Testing framework | 2025-11-05 | Official |
| 📦 zustand/ | 📄 zustand/INDEX.xml | State management | 2026-01-03 | Official |
Curate your own collections. The lefthook collection is non-standard: its docs were downloaded directly from GitHub. For Anthropic docs, use this tool.
```bash
# 1. Install UV
# 👉 https://docs.astral.sh/uv/getting-started/installation/

# 2. Clone repository
git clone https://github.com/michellepace/docs-for-ai.git
cd docs-for-ai

# 3. Get free FireCrawl API key
# Visit: https://www.firecrawl.dev/app/api-keys

# 4. Add to your shell profile
echo 'export API_KEY_MCP_FIRECRAWL=your-api-key-here' >> ~/.zshrc
source ~/.zshrc  # Use ~/.bashrc if that's your shell
```

> [!IMPORTANT]
> Edit the paths in `.claude/commands/ask-docs.md` to match your local setup. To use from anywhere, move it to `~/.claude/commands/`.
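If the key is not exported, the scraping scripts cannot authenticate. A quick way to verify the variable is visible to Python (a minimal sketch; `quick_check.py` is hypothetical, not part of this repo):

```python
# quick_check.py (hypothetical helper, not part of this repo).
# Confirms the API_KEY_MCP_FIRECRAWL variable from step 4 is set.
import os

key = os.environ.get("API_KEY_MCP_FIRECRAWL")
if key:
    print(f"FireCrawl key loaded (ends in ...{key[-4:]})")
else:
    print("API_KEY_MCP_FIRECRAWL not set: re-check your shell profile")
```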
| Slash Command | Purpose | .md Files | INDEX `<source>` |
|---|---|---|---|
| `/curate-doc <collection> <url>` | Add new or re-scrape | ✅ Write | ✅ Add/update INDEX.xml |
| `/rescrape-docs <collection>` | Re-scrape all docs | ✅ Write all | ✅ Selective update INDEX.xml |
| `/improve-index-xml <collection>` | Batch improve descriptions | 📖 Read | ✅ Update INDEX.xml |
| `/ask-docs <collection> <question>` | Query any collection | Docs analysed | Relevant docs identified |
Assume `tailwind` is not yet a collection in this repo:
```bash
# Start a new collection
/curate-doc tailwind https://tailwindcss.com/docs/customizing-colors
# → Creates tailwind/ collection directory, with README.md + INDEX.xml, and first curated doc

# Re-scrape existing doc (refresh content from same URL)
/curate-doc tailwind https://tailwindcss.com/docs/customizing-colors
# → Re-scrapes, writes .md file, replaces source in INDEX.xml

# Curate a new doc into collection
/curate-doc tailwind https://tailwindcss.com/docs/styling-with-utility-classes
# → Scrapes page into collection, writes .md file, adds source to INDEX.xml

# Re-scrape all docs in collection
/rescrape-docs tailwind
# → Re-scrapes all URLs in INDEX.xml, writes all .md files, updates descriptions for changed content

# ✨ Use the docs
/ask-docs tailwind Please evaluate my project for correct usage of utility classes?
# → Searches tailwind/INDEX.xml for relevant docs, analyses these, gives you an answer
```

Workflow: Python script scrapes URL → writes .md file → creates INDEX.xml entry with PLACEHOLDER description → Claude Code generates semantic description.
The `/curate-doc` command always regenerates the description, whereas `/rescrape-docs` only regenerates descriptions for files with content changes.
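To make that workflow concrete, here is a minimal sketch under two assumptions: the firecrawl-py SDK exposes `FirecrawlApp.scrape_url` (the exact signature varies between SDK versions), and `curate()`/`slugify()` are hypothetical names rather than the repo's actual script:

```python
# Sketch only: illustrates the workflow above, not the repo's real script.
import os
import re
from datetime import date
from pathlib import Path

from firecrawl import FirecrawlApp  # assumes firecrawl-py v1-style API

def slugify(title: str) -> str:
    """'Hello Document Title' -> 'hello-document-title'."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

def curate(collection: str, url: str) -> None:
    app = FirecrawlApp(api_key=os.environ["API_KEY_MCP_FIRECRAWL"])
    page = app.scrape_url(url, params={"formats": ["markdown"]})

    title = page.metadata.get("title") or "untitled"
    local_file = f"{slugify(title)}.md"
    Path(collection).mkdir(exist_ok=True)
    Path(collection, local_file).write_text(page.markdown)

    # The INDEX.xml entry starts with a PLACEHOLDER description;
    # Claude Code then rewrites it into a semantic summary.
    print(
        "  <source>\n"
        f"    <title>{title}</title>\n"
        "    <description>PLACEHOLDER</description>\n"
        f"    <source_url>{url}</source_url>\n"
        f"    <local_file>{local_file}</local_file>\n"
        f"    <scraped_at>{date.today().isoformat()}</scraped_at>\n"
        "  </source>"
    )
```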
Directory Structure:

```
uv/
├── INDEX.xml           # Index of all docs
├── README.md
├── api-reference.md    # Scraped doc
├── getting-started.md  # Scraped doc
└── ...
```
INDEX.xml Schema:

```xml
<docs_index>
  <source>
    <title>Hello Document Title</title>
    <description>20-30 word dense summary optimised for semantic search...</description>
    <source_url>https://docs.example.com/hello</source_url>
    <local_file>hello-document-title.md</local_file>
    <scraped_at>2025-10-15</scraped_at>
  </source>
  <!-- Multiple <source> entries, one per .md file -->
</docs_index>
```

Scripts use the FireCrawl Python SDK. An MCP server is also configured (`.mcp.json`, `.claude/settings.json`).
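For illustration, a sketch of how an index with this schema can be narrowed to relevant files before analysis; the plain keyword match is a stand-in for the semantic matching `/ask-docs` performs, and `matching_files()` is a hypothetical helper:

```python
# Sketch: filter INDEX.xml down to relevant local files by keyword.
import xml.etree.ElementTree as ET

def matching_files(index_path: str, keyword: str) -> list[str]:
    root = ET.parse(index_path).getroot()
    needle = keyword.lower()
    return [
        source.findtext("local_file", "")
        for source in root.findall("source")
        if needle in (source.findtext("title", "") + " "
                      + source.findtext("description", "")).lower()
    ]

print(matching_files("tailwind/INDEX.xml", "utility"))
```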
Instead of crawling, go to GitHub and automate the download and index creation: the docs are much cleaner than crawled output. Keep `.mdx` files as-is; do not convert them to `.md`. Trade-off: bulk downloads bloat the index, whereas curating individually keeps it focussed.
An instruction given to Claude Code was run successfully on the uv/ directory to update all documents via direct HTTP fetch (a Python script): no scraping, 100% clean content, and no FireCrawl tokens. Noting this here as a reminder to refactor to this method later. (The screenshot mentions curl, but we used Python's urllib.request.)
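A minimal sketch of that direct-fetch approach using `urllib.request`; the `raw.githubusercontent.com` base URL is an assumption about where a project keeps its docs and differs per repo:

```python
# Sketch: refresh a doc via direct HTTP fetch, with no scraping
# and no FireCrawl tokens.
import urllib.request
from pathlib import Path

# Assumed layout of the upstream docs; adjust per project.
RAW_BASE = "https://raw.githubusercontent.com/astral-sh/uv/main/docs/"

def fetch_doc(doc_path: str, dest_dir: str = "uv") -> None:
    url = RAW_BASE + doc_path
    with urllib.request.urlopen(url) as resp:
        text = resp.read().decode("utf-8")
    Path(dest_dir).mkdir(exist_ok=True)
    out = Path(dest_dir, Path(doc_path).name)
    out.write_text(text)
    print(f"updated {out} from {url}")

fetch_doc("getting-started/installation.md")
```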
