Curate and index clean docs for clean AI context to ask questions against docs.

michellepace/docs-for-ai

Curate Docs For AI (with Claude Code)

Curate and index documentation from any website into collections like tailwind/, horses/, etc. Reference collection indexes in your AI chats (e.g. @tailwind/INDEX.xml what's a utility?) so that only relevant docs are analysed. Much cleaner than a web-fetch and more focussed than a web-search. Keep your AI context sharp.

Terminal showing three-step workflow: (1) Running /curate-doc biome command, (2) Curation success output showing scraped documentation and generated INDEX.xml entry, (3) Use /ask-docs to query docs. Handwritten annotations highlight each step.

Complete workflow: curate → auto scrape → "/ask-docs biome Validate my config file please"

📦 Repo Collections

Available collections in this repo:

| Collection | Collection Index | Description | Scraped | Source |
|---|---|---|---|---|
| 📦 biome/ | 📄 biome/INDEX.xml | Fast linter/formatter | 2025-11-04 | Official |
| 📦 claudecode/ | 📄 claudecode/INDEX.xml | Anthropic Claude Code | 2026-01-07 | Official |
| 📦 claudeplat/ | 📄 claudeplat/INDEX.xml | Anthropic Claude Platform | 2026-01-07 | Official |
| 📦 clerk/ | 📄 clerk/INDEX.xml | Authentication | 2025-12-03 | Official |
| 📦 convex/ | 📄 convex/INDEX.xml | Reactive database | 2026-01-07 | Official |
| 🪝 lefthook/ | 📄 lefthook/INDEX.xml | Git hooks manager | 2025-11-24 | Official |
| 📦 marimo/ | 📄 marimo/INDEX.xml | Reactive Python notebooks | 2025-11-11 | Official |
| 📦 nextjs/ | 📄 nextjs/INDEX.xml | React framework | 2025-12-02 | Official |
| 📦 playwright/ | 📄 playwright/INDEX.xml | Browser testing | 2025-11-07 | Official |
| 📦 shadcn/ | 📄 shadcn/INDEX.xml | React UI components | 2025-12-16 | Official, Guide |
| 📦 shiny/ | 📄 shiny/INDEX.xml | Python web apps | 2025-11-02 | Official |
| 📦 tailwind/ | 📄 tailwind/INDEX.xml | CSS framework | 2025-10-15 | Official |
| 📦 tailwindplus/ | 📄 tailwindplus/INDEX.xml | Paid UI Components | 2025-11-16 | Official |
| 📦 uv/ | 📄 uv/INDEX.xml | Python projects | 2026-01-16 | Official |
| 📦 vercel/ | 📄 vercel/INDEX.xml | Deployment platform | 2025-10-20 | Official |
| 📦 vitest/ | 📄 vitest/INDEX.xml | Testing framework | 2025-11-05 | Official |
| 📦 zustand/ | 📄 zustand/INDEX.xml | State management | 2026-01-03 | Official |

Curate your own collections. The lefthook collection is non-standard: its docs were downloaded directly from GitHub rather than scraped. For Anthropic docs use this tool.


🚀 Setup

# 1. Install UV
# 👉 https://docs.astral.sh/uv/getting-started/installation/

# 2. Clone repository
git clone https://github.com/michellepace/docs-for-ai.git
cd docs-for-ai

# 3. Get free FireCrawl API key
# Visit: https://www.firecrawl.dev/app/api-keys

# 4. Add to your shell profile (use ~/.bashrc instead if Bash is your shell)
echo 'export API_KEY_MCP_FIRECRAWL=your-api-key-here' >> ~/.zshrc
source ~/.zshrc
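
If a script fails straight after setup, the usual suspect is the environment variable not being visible in the current shell. A minimal sketch of a check, assuming a hypothetical helper named `load_firecrawl_key` (it is not part of this repo); only the variable name `API_KEY_MCP_FIRECRAWL` comes from the setup steps above:

```python
import os


def load_firecrawl_key(env=os.environ):
    """Return the FireCrawl API key, or raise with a hint if it is missing.

    Hypothetical helper for illustration; the repo's scripts may read the
    variable differently.
    """
    key = env.get("API_KEY_MCP_FIRECRAWL", "").strip()
    # Treat the placeholder from the setup snippet as "not configured".
    if not key or key == "your-api-key-here":
        raise RuntimeError(
            "API_KEY_MCP_FIRECRAWL is not set; add it to your shell profile "
            "and open a new terminal."
        )
    return key
```

Passing the environment in as an argument keeps the check easy to test without touching your real shell settings.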

📖 Usage via Slash Commands

Important

Edit the paths in .claude/commands/ask-docs.md to match your local setup. To use from anywhere, move it to ~/.claude/commands/.

| Slash Command | Purpose | .md Files | INDEX <source> |
|---|---|---|---|
| /curate-doc <collection> <url> | Add new or re-scrape | ✅ Write | ✅ Add/update INDEX.xml |
| /rescrape-docs <collection> | Re-scrape all docs | ✅ Write all | ✅ Selective update INDEX.xml |
| /improve-index-xml <collection> | Batch improve descriptions | 📖 Read | ✅ Update INDEX.xml |
| /ask-docs <collection> <question> | Query any collection | Docs analysed | Relevant docs identified |

💡 Usage Example

Assume tailwind was not already a collection in this repo:

# Start a new collection
/curate-doc tailwind https://tailwindcss.com/docs/customizing-colors
# → Creates tailwind/ collection directory, with README.md + INDEX.xml, and first curated doc

# Re-scrape existing doc (refresh content from same URL)
/curate-doc tailwind https://tailwindcss.com/docs/customizing-colors
# → Re-scrapes, writes .md file, replaces source in INDEX.xml

# Curate a new doc into collection
/curate-doc tailwind https://tailwindcss.com/docs/styling-with-utility-classes
# → Scrapes page into collection, writes .md file, adds source to INDEX.xml

# Re-scrape all docs in collection
/rescrape-docs tailwind
# → Re-scrapes all URLs in INDEX.xml, writes all .md files, updates descriptions for changed content

# ✨ Use the docs
/ask-docs tailwind Please evaluate my project for correct usage of utility classes?
# → Searches tailwind/INDEX.xml for relevant docs, analyses these, gives you an answer
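
In practice `/ask-docs` leaves the relevance judgement to Claude Code, which reads the index descriptions itself. To make the idea of "only relevant docs are analysed" concrete, here is a deliberately naive stand-in that ranks index entries by keyword overlap with the question. The function names, the sample index, and its descriptions are all made up for illustration; this is not how the command is implemented.

```python
import re
import xml.etree.ElementTree as ET

# Invented sample data in the INDEX.xml shape; not taken from a real collection.
SAMPLE_INDEX = """<docs_index>
  <source>
    <title>Customizing Colors</title>
    <description>Configure the colour palette and theme variables.</description>
    <source_url>https://docs.example.com/colors</source_url>
    <local_file>customizing-colors.md</local_file>
    <scraped_at>2025-10-15</scraped_at>
  </source>
  <source>
    <title>Styling With Utility Classes</title>
    <description>Build designs by combining single-purpose utility classes in markup.</description>
    <source_url>https://docs.example.com/utility</source_url>
    <local_file>styling-with-utility-classes.md</local_file>
    <scraped_at>2025-10-15</scraped_at>
  </source>
</docs_index>"""


def words(text):
    """Lowercase a string and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_docs(index_xml, question, top_n=3):
    """Return local_file names ordered by keyword overlap with the question."""
    query = words(question)
    scored = []
    for source in ET.fromstring(index_xml).findall("source"):
        haystack = words(
            source.findtext("title", "") + " " + source.findtext("description", "")
        )
        score = len(query & haystack)
        if score:  # skip entries that share no words with the question
            scored.append((score, source.findtext("local_file")))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:top_n]]


print(rank_docs(SAMPLE_INDEX, "correct usage of utility classes"))
```

The point of the sketch is the shape of the step: the index is small enough to scan in full, and only the files it points to need to be opened afterwards.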

🏗️ How This Repo Works

Workflow: Python script scrapes URL → writes .md file → creates INDEX.xml entry with PLACEHOLDER description → Claude Code generates semantic description. The /curate-doc command always regenerates the description, whereas /rescrape-docs only regenerates descriptions for files with content changes.
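
The difference between the two commands comes down to one decision: does this file need a fresh description? A minimal sketch of that decision, assuming change detection by content hash (the repo may well compare content another way; `content_digest` and `needs_new_description` are hypothetical names):

```python
import hashlib


def content_digest(markdown_text):
    """Hash scraped markdown, ignoring trailing-whitespace differences."""
    normalised = "\n".join(
        line.rstrip() for line in markdown_text.strip().splitlines()
    )
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def needs_new_description(old_markdown, new_markdown, always=False):
    """Decide whether the INDEX.xml description should be regenerated.

    always=True mirrors /curate-doc (always regenerate);
    always=False mirrors /rescrape-docs (regenerate only on content change).
    old_markdown is None when the doc has not been scraped before.
    """
    if always or old_markdown is None:
        return True
    return content_digest(old_markdown) != content_digest(new_markdown)
```

Normalising whitespace before hashing is a design choice for the sketch: it avoids regenerating descriptions when a re-scrape differs only in trailing spaces or a final newline.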

Directory Structure:

uv/
├── INDEX.xml               # Index of all docs
├── README.md
├── api-reference.md        # Scraped doc
├── getting-started.md      # Scraped doc
└── ...

INDEX.xml Schema:

<docs_index>
  <source>
    <title>Hello Document Title</title>
    <description>20-30 word dense summary optimised for semantic search...</description>
    <source_url>https://docs.example.com/hello</source_url>
    <local_file>hello-document-title.md</local_file>
    <scraped_at>2025-10-15</scraped_at>
  </source>
  <!-- Multiple <source> entries, one per .md file -->
</docs_index>
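
To show how an entry in this schema might be added or refreshed, here is a sketch using the standard library's `xml.etree.ElementTree`. It assumes the local filename is a slug of the title (inferred from the example above, not confirmed) and that entries are matched on `source_url`; `slugify` and `upsert_source` are hypothetical helpers, not the repo's scripts.

```python
import re
import xml.etree.ElementTree as ET
from datetime import date

FIELDS = ("title", "description", "source_url", "local_file", "scraped_at")


def slugify(title):
    """Turn a title into a filename, e.g. 'Hello Document Title' -> 'hello-document-title.md'."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-") + ".md"


def upsert_source(root, title, source_url, scraped_at=None):
    """Add a <source> for source_url, or refresh the existing one in place."""
    values = {
        "title": title,
        "description": "PLACEHOLDER",  # to be replaced by a generated summary
        "source_url": source_url,
        "local_file": slugify(title),
        "scraped_at": scraped_at or date.today().isoformat(),
    }
    # Reuse the entry with the same URL if there is one, else append a new one.
    for source in root.findall("source"):
        if source.findtext("source_url") == source_url:
            break
    else:
        source = ET.SubElement(root, "source")
    for field in FIELDS:
        child = source.find(field)
        if child is None:
            child = ET.SubElement(source, field)
        child.text = values[field]
    return source


root = ET.fromstring("<docs_index></docs_index>")
upsert_source(root, "Hello Document Title", "https://docs.example.com/hello", "2025-10-15")
# Same URL again: the existing entry is updated rather than duplicated.
upsert_source(root, "Hello Document Title", "https://docs.example.com/hello", "2025-10-16")
ET.indent(root)  # pretty-printing helper, available from Python 3.9
print(ET.tostring(root, encoding="unicode"))
```

Resetting the description to PLACEHOLDER on every upsert matches the `/curate-doc` behaviour described above; a `/rescrape-docs`-style update would leave the description alone unless the content changed.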

The scripts use the FireCrawl Python SDK. A FireCrawl MCP server is also configured (.mcp.json, .claude/settings.json).


👉 Notes to Improve later

Old Idea

Instead of crawling, download the docs directly from GitHub and automate index creation. Docs taken from the repository are much cleaner than crawled pages. Keep .mdx files as-is; do not convert to .md. Trade-off: bulk downloads bloat the index; curating individually keeps focus.

New Idea (2026.01.16) — use llms.txt + direct fetch

An instruction was given to Claude Code and run successfully on the uv/ directory, updating all documents via direct HTTP fetch (a Python script). The result: no scraping, 100% clean content, and no FireCrawl tokens used.

Claude Code terminal showing user prompt to assess llms.txt approach: explains that instead of FireCrawl scraping (which isn't always clean), match INDEX.xml source_url entries to llms.txt markdown URLs and curl content directly. Shows Claude reading README.md, uv/llms.txt, and uv/INDEX.xml files.

Refactor to use llms.txt + direct fetch

Adding this as a note for later to refactor to this method. (The screenshot mentions curl but we used Python's urllib.request.)
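
A rough sketch of what that refactor could look like, using only the standard library. It assumes llms.txt lists pages as markdown links ending in `.md`, and that a page's markdown URL is either `<source_url>.md` or `<source_url>/index.md`; both are assumptions that need checking per site, and all function names and sample data here are hypothetical.

```python
import re
import urllib.request

# Matches markdown links such as [Title](https://example.com/page.md)
LINK = re.compile(r"\[([^\]]+)\]\((https?://[^)\s]+)\)")


def markdown_urls(llms_txt):
    """Collect every linked URL ending in .md from an llms.txt body."""
    return [url for _, url in LINK.findall(llms_txt) if url.endswith(".md")]


def match_markdown_url(source_url, candidates):
    """Find the llms.txt URL for a page: try '<page>.md', then '<page>/index.md'."""
    base = source_url.rstrip("/")
    for guess in (base + ".md", base + "/index.md"):
        if guess in candidates:
            return guess
    return None  # no match: the caller might fall back to scraping


def fetch_markdown(url, timeout=30):
    """Download the markdown directly over HTTP(S); no scraping service involved."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Invented sample in the llms.txt style; not copied from a real site.
SAMPLE_LLMS_TXT = """# Example Docs
- [Hello](https://docs.example.com/hello.md): Greeting guide
- [Guides](https://docs.example.com/guides/index.md): Overview
- [Blog](https://docs.example.com/blog)
"""

urls = markdown_urls(SAMPLE_LLMS_TXT)
print(match_markdown_url("https://docs.example.com/hello", urls))
print(match_markdown_url("https://docs.example.com/guides/", urls))
```

For each `<source>` in INDEX.xml, the flow would be: match its `source_url`, call `fetch_markdown` on the result, and overwrite the local .md file. Entries with no match need a fallback, which is left open here.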
