examples[pdf]: limit context length #270
Merged
PDF viewer with PDF.js featuring:
- Chunked binary loading with progress bar
- Text extraction for AI context
- arXiv paper support (fetch by ID)
- Page navigation with keyboard shortcuts
- Zoom controls (including Ctrl+0 reset)
- Fullscreen mode support
- Horizontal swipe for page changes (disabled when zoomed)
- Page persistence in localStorage
- Text selection via PDF.js TextLayer
- Clickable title link to source URL
- Rounded corners and subtle border styling

- Accept any HTTP(S) URL instead of arXiv only
- Use HTTP Range requests for chunked binary loading
- Remove arXiv-specific code (arxiv.ts, metadata fetching)
- Remove CLAUDE.md index generation
- Flatten the hierarchical folder structure to a simple entries list
- Remove dead code: getPdfSummary, httpFileSizes
- Simplify base64 encoding using Buffer
- Simplify chunk extraction using slice()
- Consolidate the DEFAULT_PDF_URL constant

The server now works with any PDF URL, not just arXiv papers. HTTP Range requests stream chunks on demand when the remote server supports them.
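A minimal sketch of Range-based chunk loading, assuming Node 18+ (global `fetch` and `Buffer`). The helper names `planChunks` and `fetchChunkBase64` are hypothetical and not taken from the PR. A server that honours the `Range` header answers `206 Partial Content`; one that ignores it sends the whole file with `200`, so the sketch slices locally in that case.

```typescript
// Hypothetical helpers illustrating chunked loading over HTTP Range requests.
// Assumes Node 18+ (global fetch and Buffer).

interface ByteRange {
  start: number; // inclusive
  end: number; // inclusive, matching the HTTP Range header syntax
}

// Split a file of totalSize bytes into consecutive ranges of at most chunkSize bytes.
function planChunks(totalSize: number, chunkSize: number): ByteRange[] {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive");
  const ranges: ByteRange[] = [];
  for (let start = 0; start < totalSize; start += chunkSize) {
    ranges.push({ start, end: Math.min(start + chunkSize, totalSize) - 1 });
  }
  return ranges;
}

// Fetch one range and return it base64-encoded, since JSON tool results carry text.
async function fetchChunkBase64(url: string, range: ByteRange): Promise<string> {
  const res = await fetch(url, {
    headers: { Range: `bytes=${range.start}-${range.end}` },
  });
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
  const body = Buffer.from(await res.arrayBuffer());
  // 206: only the requested bytes were sent.
  // 200: the server ignored Range and sent everything, so slice locally.
  const bytes = res.status === 206 ? body : body.subarray(range.start, range.end + 1);
  return bytes.toString("base64");
}
```

Planning the ranges up front keeps each tool response below a fixed size, which is the point of chunking when the transport limits message length.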
- Add pdfTitle to the updateModelContext structuredContent
- Include the selection position (text, start, end) when text is selected
- Add a debounced selectionchange listener to update the context when the selection changes

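A debounced `selectionchange` listener waits until the user stops dragging before reporting the selection, so the model context is not rebuilt on every intermediate event. A minimal sketch; `debounce`, `watchSelection`, the `onSelection` callback and the 300 ms delay are all assumptions for illustration.

```typescript
// Hypothetical debounce helper: fn runs once, waitMs after the last call.
function debounce<A extends unknown[]>(fn: (...args: A) => void, waitMs: number): (...args: A) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: A) => {
    if (timer !== undefined) clearTimeout(timer); // restart the wait on every call
    timer = setTimeout(() => {
      timer = undefined;
      fn(...args);
    }, waitMs);
  };
}

// Browser wiring (not executed here): report the selected text after the user pauses.
// onSelection is a hypothetical callback, e.g. one that rebuilds the model context.
function watchSelection(onSelection: (text: string) => void, waitMs = 300): void {
  const handler = debounce(() => {
    const text = document.getSelection()?.toString() ?? "";
    onSelection(text);
  }, waitMs);
  document.addEventListener("selectionchange", handler);
}
```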
The UI needs the default value in the schema to show it properly.
- Remove hard-coded test paths from main()
- Remove unused resources: pdfs://metadata/{pdfId}, pdfs://content/{pdfId}
- Remove unused metadata fields: subject, creator, producer, creationDate, modDate
- Remove unused entry fields: relativePath, estimatedTextSize
- Remove filterEntriesByFolder and folder filter from list_pdfs
- Remove redundant output schema validation (trust typed returns)
- Simplify scanDirectory and createLocalEntry signatures
Total: 1836 → 1666 lines (-170 lines, -9%)
Simplified the example to focus on key MCP Apps SDK patterns:
- Chunked data through size-limited tool calls
- Model context updates (page text + selection)
- Display modes (fullscreen vs. inline)
- External links (openLink)

Changes:
- Remove local file support (HTTP URLs only)
- Restrict dynamic URLs to arxiv.org for security
- Simplify types: url instead of sourcePath/sourceType
- Simplify indexer: 168 → 44 lines
- Simplify loader: 318 → 171 lines
- Simplify server: 337 → 233 lines
- Fix selection text normalization
- Rewrite README with a didactic focus

Total: 1836 → 1236 lines (-33%)
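On the viewer side, "chunked data through size-limited tool calls" comes down to a loop that requests the next slice until the server signals the end. A sketch, assuming a hypothetical `readChunk` callback whose result has `bytes` (base64), `totalBytes` and `hasMore` fields; the real tool's field names may differ.

```typescript
// Hypothetical shape of one chunk returned by a byte-reading tool call.
interface ChunkResult {
  bytes: string; // base64-encoded slice
  totalBytes: number; // size of the whole file
  hasMore: boolean; // false on the last slice
}

type ReadChunk = (offset: number, byteCount: number) => Promise<ChunkResult>;

// Decode base64 with atob, available in browsers and Node 16+.
function decodeBase64(b64: string): Uint8Array {
  const bin = atob(b64);
  const out = new Uint8Array(bin.length);
  for (let i = 0; i < bin.length; i++) out[i] = bin.charCodeAt(i);
  return out;
}

// Call readChunk repeatedly, report progress, and join the slices into one buffer.
async function loadAllBytes(
  readChunk: ReadChunk,
  chunkSize: number,
  onProgress?: (loaded: number, total: number) => void,
): Promise<Uint8Array> {
  const parts: Uint8Array[] = [];
  let loaded = 0;
  for (;;) {
    const chunk = await readChunk(loaded, chunkSize);
    const bytes = decodeBase64(chunk.bytes);
    parts.push(bytes);
    loaded += bytes.length;
    onProgress?.(loaded, chunk.totalBytes); // e.g. drive the progress bar
    if (!chunk.hasMore || bytes.length === 0) break; // stop at the end or on an empty slice
  }
  const result = new Uint8Array(loaded);
  let offset = 0;
  for (const part of parts) {
    result.set(part, offset);
    offset += part.length;
  }
  return result;
}
```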
- Local paths are converted to file:// URLs on startup
- file:// URLs must be in the initial list (strict validation)
- Dynamic URLs are still restricted to arxiv.org
- Updated README with local file examples

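The validation rules above can be sketched as a single predicate plus a path converter. `toFileUrl` and `isAllowedUrl` are hypothetical names, and accepting only `https:` for dynamic arXiv URLs is an assumption.

```typescript
import { resolve } from "node:path";
import { pathToFileURL } from "node:url";

// Convert a local path from the startup list into an absolute file:// URL.
function toFileUrl(localPath: string): string {
  return pathToFileURL(resolve(localPath)).href;
}

// Accept a URL only if it was in the startup list, or if it is an https arxiv.org URL.
// file:// URLs are never accepted dynamically (strict validation).
function isAllowedUrl(url: string, initialUrls: ReadonlySet<string>): boolean {
  if (initialUrls.has(url)) return true;
  let parsed: URL;
  try {
    parsed = new URL(url);
  } catch {
    return false; // not a valid absolute URL
  }
  if (parsed.protocol !== "https:") return false;
  return parsed.hostname === "arxiv.org" || parsed.hostname.endsWith(".arxiv.org");
}
```

Matching the hostname exactly (or as a dot-prefixed suffix) avoids accepting look-alikes such as `notarxiv.org`.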
- Add logging to the selectionchange handler to verify it fires
- Add fallback matching without spaces (TextLayer spans may lack spaces)
- Log selection detection success/failure for debugging

The issue: the PDF.js TextLayer renders text as positioned spans without space characters between them. When selecting across spans:
- pageText has spaces (items joined with ' ')
- sel.toString() may not have spaces
- indexOf fails to match

The fix tries an exact match first, then falls back to spaceless matching.
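A sketch of the exact-then-spaceless matching, using the `findSelectionInText(pageText, selectedText)` signature described further down; the body is an illustration, not the PR's code. So that positions still refer to the original `pageText`, the fallback records where each non-whitespace character of the stripped copy came from.

```typescript
interface TextRange {
  start: number; // inclusive index into pageText
  end: number; // exclusive index into pageText
}

function findSelectionInText(pageText: string, selectedText: string): TextRange | undefined {
  if (selectedText.trim() === "") return undefined;

  // 1. Exact match.
  const exact = pageText.indexOf(selectedText);
  if (exact !== -1) return { start: exact, end: exact + selectedText.length };

  // 2. Spaceless match: drop whitespace from both strings, remembering the
  //    original index of every character kept from pageText.
  let stripped = "";
  const originalIndex: number[] = [];
  for (let i = 0; i < pageText.length; i++) {
    if (!/\s/.test(pageText[i])) {
      stripped += pageText[i];
      originalIndex.push(i);
    }
  }
  const needle = selectedText.replace(/\s+/g, "");
  const hit = stripped.indexOf(needle);
  if (hit === -1) return undefined;
  return { start: originalIndex[hit], end: originalIndex[hit + needle.length - 1] + 1 };
}
```

For example, a selection reported as `isall` on the page text `Attention is all you need` maps back to the span `is all`.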
Model context now looks like:

```markdown
---
url: https://arxiv.org/pdf/...
page: 5/144
---
Page text with <pdf-selection>selected text</pdf-selection> inline.
```

This is cleaner for the model to parse and includes the source URL.
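The front-matter format can be produced by a small builder. A sketch with a hypothetical `buildModelContext`; the exact whitespace (no blank line after the closing `---`) is an assumption.

```typescript
interface TextRange {
  start: number; // inclusive
  end: number; // exclusive
}

// Hypothetical builder for the front-matter context: metadata block, then page text
// with the selection (if any) wrapped inline in <pdf-selection> tags.
function buildModelContext(
  url: string,
  page: number,
  totalPages: number,
  pageText: string,
  selection?: TextRange,
): string {
  const body = selection
    ? pageText.slice(0, selection.start) +
      "<pdf-selection>" +
      pageText.slice(selection.start, selection.end) +
      "</pdf-selection>" +
      pageText.slice(selection.end)
    : pageText;
  return ["---", `url: ${url}`, `page: ${page}/${totalPages}`, "---", body].join("\n");
}
```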
Added two well-designed helpers:
formatPageContent(text, maxLength, selection?)
- Centers truncation window around selection if present
- Adds <truncated-content/> markers at elision points
- Wraps selection in <pdf-selection> tags
- Allocates 60% of the context to text before the selection and 40% to text after, for readability
findSelectionInText(pageText, selectedText)
- Tries exact match first
- Falls back to spaceless match for TextLayer quirks
- Returns { start, end } or undefined
Example output with selection:
```
<truncated-content/>
...context before... <pdf-selection>selected text</pdf-selection> ...context after...
<truncated-content/>
```
When the selection is too large for the budget:

`<truncated-content/><pdf-selection><truncated-content/>start...end<truncated-content/></pdf-selection><truncated-content/>`

This keeps the selection structure intact while showing its beginning and end.
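A sketch of the truncation logic behind `formatPageContent(text, maxLength, selection?)`, written under stated assumptions: the 60/40 split applies to whatever budget remains after the selection, budget one side cannot use is handed to the other, and an oversized selection keeps its first and last halves with a single marker in the middle. That last point is a simplification and does not reproduce the exact markup shown above.

```typescript
const TRUNCATED = "<truncated-content/>";

interface TextRange {
  start: number; // inclusive
  end: number; // exclusive
}

function formatPageContent(text: string, maxLength: number, selection?: TextRange): string {
  if (!selection) {
    return text.length <= maxLength ? text : text.slice(0, maxLength) + TRUNCATED;
  }
  const before = text.slice(0, selection.start);
  const selected = text.slice(selection.start, selection.end);
  const after = text.slice(selection.end);

  // Selection alone is over budget: keep its first and last halves, drop surrounding context.
  if (selected.length > maxLength) {
    const half = Math.floor(maxLength / 2);
    const inner = selected.slice(0, half) + TRUNCATED + selected.slice(selected.length - half);
    return (before ? TRUNCATED : "") + `<pdf-selection>${inner}</pdf-selection>` + (after ? TRUNCATED : "");
  }

  // Split the remaining budget 60% before / 40% after (integer math avoids float rounding),
  // then give any share one side cannot use to the other side.
  const remaining = maxLength - selected.length;
  let beforeBudget = Math.floor((remaining * 3) / 5);
  let afterBudget = remaining - beforeBudget;
  if (before.length < beforeBudget) {
    afterBudget += beforeBudget - before.length;
    beforeBudget = before.length;
  }
  if (after.length < afterBudget) {
    beforeBudget = Math.min(before.length, beforeBudget + afterBudget - after.length);
    afterBudget = after.length;
  }

  // Keep the text nearest the selection; mark each elision point.
  const head = before.length > beforeBudget ? TRUNCATED + before.slice(before.length - beforeBudget) : before;
  const tail = after.length > afterBudget ? after.slice(0, afterBudget) + TRUNCATED : after;
  return `${head}<pdf-selection>${selected}</pdf-selection>${tail}`;
}
```

Centering the window on the selection keeps the text the user is asking about, plus its immediate surroundings, inside the host's size limit.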
…r as default
- Remove read_pdf_text tool (the viewer extracts text client-side with pdfjs)
- Remove PdfTextChunk and ReadPdfTextInput types
- Remove loadPdfTextChunk from pdf-loader
- Change default PDF to 'Attention Is All You Need' (1706.03762)
- Update README with modest language

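Client-side extraction in pdf.js goes through `page.getTextContent()`, whose `items` include text runs carrying a `str` field as well as marked-content entries without one. A sketch that joins the runs with single spaces, matching the "items joined with ' '" behaviour noted earlier. To stay runnable without `pdfjs-dist`, it uses hypothetical structural types (`PageLike`, `TextItemLike`) instead of importing the library.

```typescript
// Structural stand-ins for the pdf.js types, so the sketch has no dependency.
interface TextItemLike {
  str?: string; // present on text runs, absent on marked-content entries
}
interface PageLike {
  getTextContent(): Promise<{ items: TextItemLike[] }>;
}

// Keep only items that carry text and join them with single spaces.
function joinTextItems(items: TextItemLike[]): string {
  return items
    .filter((item): item is { str: string } => typeof item.str === "string")
    .map((item) => item.str)
    .join(" ");
}

async function extractPageText(page: PageLike): Promise<string> {
  const { items } = await page.getTextContent();
  return joinTextItems(items);
}
```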
…isplay_pdf

Major simplifications:
- Use the URL directly as the identifier (no hashing)
- Remove displayName; show an elided URL with the full URL as a tooltip
- Rename view_pdf to display_pdf with a better description
- Update all references from pdfId to url
- Simplify the storage key and model context

The tool description now explains that it displays an interactive viewer in the chat.
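Showing an elided URL with the full URL as a tooltip can be done with a middle-elision helper and the element's `title` attribute. A sketch; `elideUrl`, `renderTitle` and the 60-character default are hypothetical.

```typescript
// Hypothetical middle-elision: keep the start and end of a long URL.
function elideUrl(url: string, maxLength: number): string {
  if (url.length <= maxLength) return url;
  if (maxLength < 2) return "…".slice(0, maxLength);
  const keep = maxLength - 1; // one character is spent on the ellipsis
  const head = Math.ceil(keep / 2);
  const tail = keep - head;
  return url.slice(0, head) + "…" + url.slice(url.length - tail);
}

// Browser wiring (not executed here): short text, full URL on hover.
function renderTitle(el: HTMLElement, url: string, maxLength = 60): void {
  el.textContent = elideUrl(url, maxLength);
  el.title = url;
}
```

Keeping the end of the URL visible preserves the most distinctive part, such as an arXiv ID or file name.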
Rewrite arxiv.org/abs/... URLs to arxiv.org/pdf/.... Applied both at startup and when loading dynamic URLs.
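The abstract-to-PDF rewrite is a path substitution on arXiv hosts. A sketch with a hypothetical `normalizeArxivUrl`; treating `www.arxiv.org` the same as `arxiv.org` is an assumption.

```typescript
// Hypothetical normaliser: rewrite arXiv abstract pages to their PDF URLs.
// Other URLs are returned unchanged (apart from WHATWG URL serialization).
function normalizeArxivUrl(input: string): string {
  const url = new URL(input);
  const isArxiv = url.hostname === "arxiv.org" || url.hostname === "www.arxiv.org";
  if (isArxiv && url.pathname.startsWith("/abs/")) {
    url.pathname = "/pdf/" + url.pathname.slice("/abs/".length);
  }
  return url.toString();
}
```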
Account for devicePixelRatio when rendering the canvas:
- Scale canvas dimensions by dpr
- Scale the context by dpr
- Keep the CSS size in logical pixels

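The three steps above map onto a standard HiDPI canvas setup. A sketch with hypothetical `backingStoreSize` and `setupHiDpiCanvas` helpers; only the size arithmetic runs outside a browser.

```typescript
// Backing-store size, in device pixels, for a canvas shown at cssWidth x cssHeight logical pixels.
function backingStoreSize(cssWidth: number, cssHeight: number, dpr: number): { width: number; height: number } {
  return { width: Math.floor(cssWidth * dpr), height: Math.floor(cssHeight * dpr) };
}

// Browser wiring (not executed here).
function setupHiDpiCanvas(
  canvas: HTMLCanvasElement,
  cssWidth: number,
  cssHeight: number,
): CanvasRenderingContext2D | null {
  const dpr = window.devicePixelRatio || 1;
  const { width, height } = backingStoreSize(cssWidth, cssHeight, dpr);
  canvas.width = width; // 1. scale canvas dimensions by dpr (this also resets the context state)
  canvas.height = height;
  canvas.style.width = `${cssWidth}px`; // 3. keep CSS size in logical pixels
  canvas.style.height = `${cssHeight}px`;
  const ctx = canvas.getContext("2d");
  ctx?.scale(dpr, dpr); // 2. scale the context so drawing code keeps using logical units
  return ctx;
}
```

Because assigning `canvas.width` resets the transform, calling `scale` once after resizing does not compound across re-renders.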
Fixes 'PDF not found' error when server restarts between display_pdf (which adds the entry) and read_pdf_bytes (which previously only looked up existing entries). Now read_pdf_bytes mirrors display_pdf's logic and dynamically adds arxiv URLs to the index.
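The fix amounts to routing both tools through one lookup-or-add step. A sketch with a hypothetical `getOrAddEntry` and a pared-down `PdfEntry`; the https-only arXiv check is an assumption carried over from the URL restrictions described earlier.

```typescript
interface PdfEntry {
  url: string;
}

// Hypothetical shared lookup for display_pdf and read_pdf_bytes: return the indexed
// entry, or add an allowed arXiv URL on the fly so a restarted server still finds it.
function getOrAddEntry(index: Map<string, PdfEntry>, url: string): PdfEntry {
  const existing = index.get(url);
  if (existing) return existing;

  let parsed: URL | undefined;
  try {
    parsed = new URL(url);
  } catch {
    parsed = undefined; // invalid URL: rejected below
  }
  const allowed =
    parsed !== undefined &&
    parsed.protocol === "https:" &&
    (parsed.hostname === "arxiv.org" || parsed.hostname.endsWith(".arxiv.org"));
  if (!allowed) throw new Error(`PDF not found: ${url}`);

  const entry: PdfEntry = { url };
  index.set(url, entry);
  return entry;
}
```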
antonpk1 previously approved these changes (Jan 14, 2026)
@modelcontextprotocol/ext-apps
@modelcontextprotocol/server-basic-react
@modelcontextprotocol/server-basic-vanillajs
@modelcontextprotocol/server-budget-allocator
@modelcontextprotocol/server-cohort-heatmap
@modelcontextprotocol/server-customer-segmentation
@modelcontextprotocol/server-map
@modelcontextprotocol/server-pdf
@modelcontextprotocol/server-scenario-modeler
@modelcontextprotocol/server-shadertoy
@modelcontextprotocol/server-sheet-music
@modelcontextprotocol/server-system-monitor
@modelcontextprotocol/server-threejs
@modelcontextprotocol/server-transcript
@modelcontextprotocol/server-video-resource
@modelcontextprotocol/server-wiki-explorer
Some hosts have stringent limits