diff --git a/README.md b/README.md index f8d60acb..0e87fb1f 100644 --- a/README.md +++ b/README.md @@ -51,9 +51,27 @@ A powerful, feature-rich command-line interface for interacting with Model Conte - **Session Persistence**: Save/load/list conversation sessions with auto-save every 10 turns (`/sessions`) - **Conversation Export**: Export conversations as Markdown or JSON with metadata (`/export`) +### Dashboard (Real-Time Browser UI) +- **`--dashboard` flag**: Launch a real-time browser dashboard alongside chat mode +- **Agent Terminal**: Live conversation view with message bubbles, streaming tokens, and attachment rendering +- **Activity Stream**: Tool call/result pairs, reasoning steps, and user attachment events +- **Plan Viewer**: Visual execution plan progress with DAG rendering +- **Tool Registry**: Browse discovered tools, trigger execution from the browser +- **Config Panel**: View and switch providers, models, and system prompt +- **File Attachments**: "+" button for browser file upload, drag-and-drop, and clipboard paste + +### Multi-Modal Attachments +- **`/attach` command**: Stage files for the next message — images, text/code, and audio (aliases: `/file`, `/image`) +- **`--attach` CLI flag**: Attach files to the first message (repeatable: `--attach img.png --attach code.py`) +- **Inline `@file:` references**: Mention `@file:path/to/file` anywhere in a message to attach it +- **Image URL detection**: HTTP/HTTPS image URLs in messages are automatically sent as vision content +- **Supported formats**: PNG, JPEG, GIF, WebP, HEIC (images), MP3, WAV (audio), plus 25+ text/code extensions +- **Dashboard rendering**: Image thumbnails, expandable text previews, audio players, file badges +- **Browser upload**: "+" button in dashboard chat input with drag-and-drop and clipboard paste support + ### Code Quality - **Core/UI Separation**: Core modules use `logging` only — no UI imports -- **3,800+ tests**: Comprehensive test suite with branch coverage, integration 
tests, and 60% minimum threshold +- **4,300+ tests**: Comprehensive test suite with branch coverage, integration tests, and 60% minimum threshold - **15 Architecture Principles**: Documented and enforced (see [architecture.md](architecture.md)) - **Full [Roadmap](roadmap.md)**: Tiers 1-6 complete, Tiers 7-12 planned (traces, memory scopes, skills, scheduling, multi-agent) @@ -82,6 +100,7 @@ The MCP CLI is built on a modular architecture with clean separation of concerns - **Performance Metrics**: Response timing, words/second, and execution statistics - **Rich Formatting**: Markdown rendering, syntax highlighting, and progress indicators - **Token Usage Tracking**: Per-turn and cumulative API token usage with `/usage` command +- **Multi-Modal Attachments**: Attach images, text files, and audio to messages via `/attach`, `--attach`, `@file:` refs, or browser upload - **Session Persistence**: Auto-save and manual save/load of conversation sessions - **Conversation Export**: Export to Markdown or JSON with metadata and token usage @@ -161,6 +180,8 @@ Comprehensive documentation is available in the `docs/` directory: ### Specialized Documentation - **[Execution Plans](docs/PLANNING.md)** - Plan creation, parallel execution, variable resolution, checkpointing, guards, and re-planning +- **[Dashboard](docs/DASHBOARD.md)** - Real-time browser UI with agent terminal, activity stream, and file uploads +- **[Attachments](docs/ATTACHMENTS.md)** - Multi-modal file attachments: images, text, audio, and browser upload - **[MCP Apps](docs/MCP_APPS.md)** - Interactive browser UIs served by MCP servers (SEP-1865) - **[OAuth Authentication](docs/OAUTH.md)** - OAuth flows, storage backends, and MCP server integration - **[Streaming Integration](docs/STREAMING.md)** - Real-time response streaming architecture @@ -290,6 +311,9 @@ Global options available for all modes and commands: - `--vm`: [Experimental] Enable AI virtual memory for context management - `--vm-budget`: Token budget 
for conversation events in VM mode (default: 128000, on top of system prompt) - `--vm-mode`: VM mode — `passive` (default), `relaxed`, or `strict` +- `--dashboard`: Launch a real-time browser dashboard UI alongside chat mode +- `--attach`: Attach files to the first message (repeatable: `--attach img.png --attach code.py`) +- `--plan-tools`: Enable model-driven planning — the LLM autonomously creates and executes multi-step plans ### Environment Variables @@ -332,6 +356,12 @@ mcp-cli --server sqlite --model qwen2.5-coder # Switch to cloud providers (requires API keys) mcp-cli chat --server sqlite --provider openai --model gpt-5 mcp-cli chat --server sqlite --provider anthropic --model claude-4-5-sonnet + +# Launch with real-time browser dashboard +mcp-cli --server sqlite --dashboard + +# Attach files to the first message +mcp-cli --server sqlite --attach image.png --attach data.csv ``` ### 2. Interactive Mode @@ -527,6 +557,23 @@ mcp-cli --server sqlite --provider anthropic --model claude-4-5-opus **Note**: Servers added via `/server add` are stored in `~/.mcp-cli/preferences.json` and persist across sessions. Project servers remain in `server_config.json`. +#### Multi-Modal Attachments +```bash +/attach image.png # Stage an image for the next message +/attach code.py # Stage a text file +/attach list # Show currently staged files +/attach clear # Clear staged files +/file data.csv # Alias for /attach +/image screenshot.heic # Alias for /attach + +# Inline file references (in any message) +@file:screenshot.png describe what you see +@file:data.csv summarize this data + +# Image URLs are auto-detected +https://example.com/photo.jpg what is in this image? 
+``` + #### Conversation Management ```bash /conversation # Show conversation history @@ -613,6 +660,14 @@ See [Token Management Guide](docs/TOKEN_MANAGEMENT.md) for comprehensive documen - Verbose and compact display modes - Complete execution history and timing +#### Multi-Modal Attachments +- Attach images, text files, and audio to any message +- `/attach` command with staging, list, and clear (aliases: `/file`, `/image`) +- Inline `@file:path` references in any message +- `--attach` CLI flag for first-message attachments +- Browser "+" button with drag-and-drop and clipboard paste (with `--dashboard`) +- Dashboard renders thumbnails, text previews, and audio players + #### Provider Integration - Seamless switching between providers - Model-specific optimizations diff --git a/docs/ATTACHMENTS.md b/docs/ATTACHMENTS.md new file mode 100644 index 00000000..a87de92b --- /dev/null +++ b/docs/ATTACHMENTS.md @@ -0,0 +1,201 @@ +# Multi-Modal Attachments + +MCP CLI supports attaching images, text files, and audio to messages. Attachments are converted to content blocks that multimodal LLMs can process (vision, text analysis, audio understanding). + +## Quick Start + +```bash +# Attach files to the first message via CLI flag +mcp-cli --server sqlite --attach photo.png --attach data.csv + +# In chat, use the /attach command +/attach screenshot.png +/attach code.py +Tell me what you see and review the code + +# Or use inline @file: references +@file:image.png describe what's in this image +``` + +## Three Ways to Attach + +### 1. `/attach` Command (Chat Mode) + +Stage files before sending a message: + +```bash +/attach photo.png # Stage an image +/attach src/main.py # Stage a code file +/attach recording.mp3 # Stage an audio file +``` + +Aliases: `/file`, `/image` + +**Manage staging:** +```bash +/attach list # Show currently staged files +/attach clear # Clear all staged files +``` + +Staged files are sent with your next message and automatically cleared. + +### 2. 
`--attach` CLI Flag + +Attach files to the first message when starting chat: + +```bash +mcp-cli --server sqlite --attach image.png +mcp-cli --server sqlite --attach img.png --attach data.csv --attach code.py +``` + +The flag is repeatable — use it multiple times for multiple files. + +### 3. Inline `@file:` References + +Reference files directly in any message: + +```bash +@file:screenshot.png describe what you see +@file:report.txt @file:data.csv compare these two files +Look at @file:diagram.png and explain the architecture +``` + +The `@file:` prefix is removed from the message text before sending. + +### Image URL Detection + +HTTP/HTTPS image URLs in messages are automatically detected and sent as vision content: + +``` +https://example.com/chart.png what does this chart show? +``` + +Supported URL patterns: `.png`, `.jpg`, `.jpeg`, `.gif`, `.webp` + +## Supported File Types + +### Images +| Extension | MIME Type | +|-----------|-----------| +| `.png` | `image/png` | +| `.jpg`, `.jpeg` | `image/jpeg` | +| `.gif` | `image/gif` | +| `.webp` | `image/webp` | +| `.heic` | `image/heic` | + +Images are base64-encoded and sent as `image_url` content blocks with configurable detail level. + +### Audio +| Extension | MIME Type | +|-----------|-----------| +| `.mp3` | `audio/mpeg` | +| `.wav` | `audio/wav` | + +Audio is base64-encoded and sent as `input_audio` content blocks. 
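The image and audio encoding described above can be sketched roughly as follows. This is a minimal illustration, not the MCP CLI implementation: the function name `build_content_block`, the two lookup tables, and the `"auto"` default for `detail` are assumptions, and the block shapes follow the OpenAI-style `image_url` and `input_audio` content parts.

```python
import base64
from pathlib import Path

# Hypothetical lookup tables mirroring the image and audio tables above.
IMAGE_MIME_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".gif": "image/gif",
    ".webp": "image/webp",
    ".heic": "image/heic",
}
AUDIO_FORMATS = {".mp3": "mp3", ".wav": "wav"}


def build_content_block(path: str, data: bytes, detail: str = "auto") -> dict:
    """Turn raw file bytes into an OpenAI-style content block (illustrative only)."""
    ext = Path(path).suffix.lower()
    encoded = base64.b64encode(data).decode("ascii")

    if ext in IMAGE_MIME_TYPES:
        # Images travel as a base64 data URL inside an image_url block.
        url = f"data:{IMAGE_MIME_TYPES[ext]};base64,{encoded}"
        return {"type": "image_url", "image_url": {"url": url, "detail": detail}}

    if ext in AUDIO_FORMATS:
        # Audio travels as raw base64 plus a short format name.
        return {
            "type": "input_audio",
            "input_audio": {"data": encoded, "format": AUDIO_FORMATS[ext]},
        }

    raise ValueError(f"Unsupported attachment type: {ext or path}")
```

Text and code files are not handled here because they are sent as labeled text blocks rather than base64 payloads.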
+ ### Text & Code +| Extension | MIME Type | +|-----------|-----------| +| `.txt` | `text/plain` | +| `.md` | `text/markdown` | +| `.csv` | `text/csv` | +| `.json` | `application/json` | +| `.html` | `text/html` | +| `.xml` | `text/xml` | +| `.yaml`, `.yml` | `text/yaml` | +| `.py` | `text/plain` | +| `.js`, `.jsx` | `text/plain` | +| `.ts`, `.tsx` | `text/plain` | +| `.sh`, `.bash` | `text/plain` | +| `.rs` | `text/plain` | +| `.go` | `text/plain` | +| `.java` | `text/plain` | +| `.c`, `.cpp` | `text/plain` | +| `.h`, `.hpp` | `text/plain` | +| `.rb` | `text/plain` | +| `.swift` | `text/plain` | +| `.kt` | `text/plain` | +| `.sql` | `text/plain` | +| `.toml` | `text/plain` | +| `.ini`, `.cfg` | `text/plain` | +| `.env` | `text/plain` | +| `.log` | `text/plain` | + +Text files are read as UTF-8 (with Latin-1 fallback) and wrapped in labeled text blocks. + +## Size Limits + +| Limit | Value | +|-------|-------| +| Maximum file size | 20 MB | +| Maximum attachments per message | 10 | + +These defaults are configured in `src/mcp_cli/config/defaults.py`. + +## Browser Upload (Dashboard) + +When using `--dashboard`, the agent terminal provides browser-based file attachment: + +### "+" Button +Click the "+" button next to the chat input to open a file picker. Select one or more files to stage them. + +### Drag and Drop +Drag files from your file manager onto the chat area. A drop overlay appears to confirm the target. + +### Clipboard Paste +Paste images directly into the chat input (Ctrl/Cmd+V). Screenshots and copied images are automatically staged. + +### Staging Strip +Staged files appear as removable badges above the chat input: +- Image files show a small thumbnail preview +- All files show the filename with an "x" remove button +- Click "x" to remove a file before sending + +Files are sent when you press Enter or click Send. The staging strip clears automatically. 
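The size limits and the UTF-8 with Latin-1 fallback described in the sections above can be sketched as follows. The constant and function names are hypothetical and are not taken from `src/mcp_cli/config/defaults.py`; the sketch also assumes that "20 MB" means 20 × 1024 × 1024 bytes, which the source does not specify.

```python
# Hypothetical constants; the real defaults live in src/mcp_cli/config/defaults.py.
MAX_FILE_SIZE_BYTES = 20 * 1024 * 1024  # assumption: "20 MB" read as 20 MiB
MAX_ATTACHMENTS_PER_MESSAGE = 10


def decode_text(data: bytes) -> str:
    """Decode as UTF-8, falling back to Latin-1 when the bytes are not valid UTF-8."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a code point, so this fallback never fails.
        return data.decode("latin-1")


def validate_staged(file_sizes: list[int]) -> None:
    """Raise ValueError if the staged files exceed either limit."""
    if len(file_sizes) > MAX_ATTACHMENTS_PER_MESSAGE:
        raise ValueError(
            f"Too many attachments: {len(file_sizes)} > {MAX_ATTACHMENTS_PER_MESSAGE}"
        )
    for size in file_sizes:
        if size > MAX_FILE_SIZE_BYTES:
            raise ValueError(f"File too large: {size} bytes > {MAX_FILE_SIZE_BYTES}")
```

Because Latin-1 accepts any byte sequence, the fallback always yields a string, although binary files decoded this way will not be meaningful text.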
+ +## Dashboard Rendering + +When messages with attachments appear in the dashboard, they render as: + +- **Small images (<100KB)**: Inline thumbnail previews (max 200x150px) +- **Large images (>100KB)**: Metadata badge showing filename and size +- **URL images**: Thumbnail loaded from the URL +- **Text files**: Expandable preview showing the first 2000 characters +- **Audio files**: HTML5 `