Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
57 changes: 56 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -51,9 +51,27 @@ A powerful, feature-rich command-line interface for interacting with Model Conte
- **Session Persistence**: Save/load/list conversation sessions with auto-save every 10 turns (`/sessions`)
- **Conversation Export**: Export conversations as Markdown or JSON with metadata (`/export`)

### Dashboard (Real-Time Browser UI)
- **`--dashboard` flag**: Launch a real-time browser dashboard alongside chat mode
- **Agent Terminal**: Live conversation view with message bubbles, streaming tokens, and attachment rendering
- **Activity Stream**: Tool call/result pairs, reasoning steps, and user attachment events
- **Plan Viewer**: Visual execution plan progress with DAG rendering
- **Tool Registry**: Browse discovered tools, trigger execution from the browser
- **Config Panel**: View and switch providers, models, and system prompt
- **File Attachments**: "+" button for browser file upload, drag-and-drop, and clipboard paste

### Multi-Modal Attachments
- **`/attach` command**: Stage files for the next message — images, text/code, and audio (aliases: `/file`, `/image`)
- **`--attach` CLI flag**: Attach files to the first message (repeatable: `--attach img.png --attach code.py`)
- **Inline `@file:` references**: Mention `@file:path/to/file` anywhere in a message to attach it
- **Image URL detection**: HTTP/HTTPS image URLs in messages are automatically sent as vision content
- **Supported formats**: PNG, JPEG, GIF, WebP, HEIC (images), MP3, WAV (audio), plus 25+ text/code extensions
- **Dashboard rendering**: Image thumbnails, expandable text previews, audio players, file badges
- **Browser upload**: "+" button in dashboard chat input with drag-and-drop and clipboard paste support

### Code Quality
- **Core/UI Separation**: Core modules use `logging` only — no UI imports
- **3,800+ tests**: Comprehensive test suite with branch coverage, integration tests, and 60% minimum threshold
- **4,300+ tests**: Comprehensive test suite with branch coverage, integration tests, and 60% minimum threshold
- **15 Architecture Principles**: Documented and enforced (see [architecture.md](architecture.md))
- **Full [Roadmap](roadmap.md)**: Tiers 1-6 complete, Tiers 7-12 planned (traces, memory scopes, skills, scheduling, multi-agent)

Expand Down Expand Up @@ -82,6 +100,7 @@ The MCP CLI is built on a modular architecture with clean separation of concerns
- **Performance Metrics**: Response timing, words/second, and execution statistics
- **Rich Formatting**: Markdown rendering, syntax highlighting, and progress indicators
- **Token Usage Tracking**: Per-turn and cumulative API token usage with `/usage` command
- **Multi-Modal Attachments**: Attach images, text files, and audio to messages via `/attach`, `--attach`, `@file:` refs, or browser upload
- **Session Persistence**: Auto-save and manual save/load of conversation sessions
- **Conversation Export**: Export to Markdown or JSON with metadata and token usage

Expand Down Expand Up @@ -161,6 +180,8 @@ Comprehensive documentation is available in the `docs/` directory:

### Specialized Documentation
- **[Execution Plans](docs/PLANNING.md)** - Plan creation, parallel execution, variable resolution, checkpointing, guards, and re-planning
- **[Dashboard](docs/DASHBOARD.md)** - Real-time browser UI with agent terminal, activity stream, and file uploads
- **[Attachments](docs/ATTACHMENTS.md)** - Multi-modal file attachments: images, text, audio, and browser upload
- **[MCP Apps](docs/MCP_APPS.md)** - Interactive browser UIs served by MCP servers (SEP-1865)
- **[OAuth Authentication](docs/OAUTH.md)** - OAuth flows, storage backends, and MCP server integration
- **[Streaming Integration](docs/STREAMING.md)** - Real-time response streaming architecture
Expand Down Expand Up @@ -290,6 +311,9 @@ Global options available for all modes and commands:
- `--vm`: [Experimental] Enable AI virtual memory for context management
- `--vm-budget`: Token budget for conversation events in VM mode (default: 128000, on top of system prompt)
- `--vm-mode`: VM mode — `passive` (default), `relaxed`, or `strict`
- `--dashboard`: Launch a real-time browser dashboard UI alongside chat mode
- `--attach`: Attach files to the first message (repeatable: `--attach img.png --attach code.py`)
- `--plan-tools`: Enable model-driven planning — the LLM autonomously creates and executes multi-step plans

### Environment Variables

Expand Down Expand Up @@ -332,6 +356,12 @@ mcp-cli --server sqlite --model qwen2.5-coder
# Switch to cloud providers (requires API keys)
mcp-cli chat --server sqlite --provider openai --model gpt-5
mcp-cli chat --server sqlite --provider anthropic --model claude-4-5-sonnet

# Launch with real-time browser dashboard
mcp-cli --server sqlite --dashboard

# Attach files to the first message
mcp-cli --server sqlite --attach image.png --attach data.csv
```

### 2. Interactive Mode
Expand Down Expand Up @@ -527,6 +557,23 @@ mcp-cli --server sqlite --provider anthropic --model claude-4-5-opus

**Note**: Servers added via `/server add` are stored in `~/.mcp-cli/preferences.json` and persist across sessions. Project servers remain in `server_config.json`.

#### Multi-Modal Attachments
```bash
/attach image.png # Stage an image for the next message
/attach code.py # Stage a text file
/attach list # Show currently staged files
/attach clear # Clear staged files
/file data.csv # Alias for /attach
/image screenshot.heic # Alias for /attach

# Inline file references (in any message)
@file:screenshot.png describe what you see
@file:data.csv summarize this data

# Image URLs are auto-detected
https://example.com/photo.jpg what is in this image?
```

#### Conversation Management
```bash
/conversation # Show conversation history
Expand Down Expand Up @@ -613,6 +660,14 @@ See [Token Management Guide](docs/TOKEN_MANAGEMENT.md) for comprehensive documen
- Verbose and compact display modes
- Complete execution history and timing

#### Multi-Modal Attachments
- Attach images, text files, and audio to any message
- `/attach` command with staging, list, and clear (aliases: `/file`, `/image`)
- Inline `@file:path` references in any message
- `--attach` CLI flag for first-message attachments
- Browser "+" button with drag-and-drop and clipboard paste (with `--dashboard`)
- Dashboard renders thumbnails, text previews, and audio players

#### Provider Integration
- Seamless switching between providers
- Model-specific optimizations
Expand Down
201 changes: 201 additions & 0 deletions docs/ATTACHMENTS.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,201 @@
# Multi-Modal Attachments

MCP CLI supports attaching images, text files, and audio to messages. Attachments are converted to content blocks that multimodal LLMs can process (vision, text analysis, audio understanding).

## Quick Start

```bash
# Attach files to the first message via CLI flag
mcp-cli --server sqlite --attach photo.png --attach data.csv

# In chat, use the /attach command
/attach screenshot.png
/attach code.py
Tell me what you see and review the code

# Or use inline @file: references
@file:image.png describe what's in this image
```

## Three Ways to Attach

### 1. `/attach` Command (Chat Mode)

Stage files before sending a message:

```bash
/attach photo.png # Stage an image
/attach src/main.py # Stage a code file
/attach recording.mp3 # Stage an audio file
```

Aliases: `/file`, `/image`

**Manage staging:**
```bash
/attach list # Show currently staged files
/attach clear # Clear all staged files
```

Staged files are sent with your next message and automatically cleared.

### 2. `--attach` CLI Flag

Attach files to the first message when starting chat:

```bash
mcp-cli --server sqlite --attach image.png
mcp-cli --server sqlite --attach img.png --attach data.csv --attach code.py
```

The flag is repeatable — use it multiple times for multiple files.

### 3. Inline `@file:` References

Reference files directly in any message:

```bash
@file:screenshot.png describe what you see
@file:report.txt @file:data.csv compare these two files
Look at @file:diagram.png and explain the architecture
```

The `@file:` prefix is removed from the message text before sending.

### Image URL Detection

HTTP/HTTPS image URLs in messages are automatically detected and sent as vision content:

```
https://example.com/chart.png what does this chart show?
```

Supported URL patterns: `.png`, `.jpg`, `.jpeg`, `.gif`, `.webp`

## Supported File Types

### Images
| Extension | MIME Type |
|-----------|-----------|
| `.png` | `image/png` |
| `.jpg`, `.jpeg` | `image/jpeg` |
| `.gif` | `image/gif` |
| `.webp` | `image/webp` |
| `.heic` | `image/heic` |

Images are base64-encoded and sent as `image_url` content blocks with configurable detail level.

### Audio
| Extension | MIME Type |
|-----------|-----------|
| `.mp3` | `audio/mpeg` |
| `.wav` | `audio/wav` |

Audio is base64-encoded and sent as `input_audio` content blocks.

### Text & Code
| Extension | MIME Type |
|-----------|-----------|
| `.txt` | `text/plain` |
| `.md` | `text/markdown` |
| `.csv` | `text/csv` |
| `.json` | `application/json` |
| `.html` | `text/html` |
| `.xml` | `text/xml` |
| `.yaml`, `.yml` | `text/yaml` |
| `.py` | `text/plain` |
| `.js`, `.jsx` | `text/plain` |
| `.ts`, `.tsx` | `text/plain` |
| `.sh`, `.bash` | `text/plain` |
| `.rs` | `text/plain` |
| `.go` | `text/plain` |
| `.java` | `text/plain` |
| `.c`, `.cpp` | `text/plain` |
| `.h`, `.hpp` | `text/plain` |
| `.rb` | `text/plain` |
| `.swift` | `text/plain` |
| `.kt` | `text/plain` |
| `.sql` | `text/plain` |
| `.toml` | `text/plain` |
| `.ini`, `.cfg` | `text/plain` |
| `.env` | `text/plain` |
| `.log` | `text/plain` |

Text files are read as UTF-8 (with Latin-1 fallback) and wrapped in labeled text blocks.

## Size Limits

| Limit | Value |
|-------|-------|
| Maximum file size | 20 MB |
| Maximum attachments per message | 10 |

These defaults are configured in `src/mcp_cli/config/defaults.py`.

## Browser Upload (Dashboard)

When using `--dashboard`, the agent terminal provides browser-based file attachment:

### "+" Button
Click the "+" button next to the chat input to open a file picker. Select one or more files to stage them.

### Drag and Drop
Drag files from your file manager onto the chat area. A drop overlay appears to confirm the target.

### Clipboard Paste
Paste images directly into the chat input (Ctrl/Cmd+V). Screenshots and copied images are automatically staged.

### Staging Strip
Staged files appear as removable badges above the chat input:
- Image files show a small thumbnail preview
- All files show the filename with a "x" remove button
- Click "x" to remove a file before sending

Files are sent when you press Enter or click Send. The staging strip clears automatically.

## Dashboard Rendering

When messages with attachments appear in the dashboard, they render as:

- **Small images (<100KB)**: Inline thumbnail previews (max 200x150px)
- **Large images (>100KB)**: Metadata badge showing filename and size
- **URL images**: Thumbnail loaded from the URL
- **Text files**: Expandable preview showing the first 2000 characters
- **Audio files**: HTML5 `<audio>` player with playback controls

The activity stream shows attachment events as badge cards with a paperclip icon, filenames, and total size.

## How It Works

### Content Block Construction

Each file type produces specific OpenAI-compatible content blocks:

- **Images** → `{"type": "image_url", "image_url": {"url": "data:image/png;base64,...", "detail": "auto"}}`
- **Audio** → `{"type": "input_audio", "input_audio": {"data": "...", "format": "mp3"}}`
- **Text** → `{"type": "text", "text": "--- filename ---\n...\n--- end filename ---"}`

When attachments are present, the user message `content` field becomes a list of content blocks (multimodal format) instead of a plain string.

### Attachment Staging

The `AttachmentStaging` class on `ChatContext` manages the staging lifecycle:

1. Files are staged via `/attach`, `--attach`, or browser upload
2. The chat loop calls `drain()` to collect and clear staged files
3. Combined with inline `@file:` refs and detected image URLs
4. `build_multimodal_content()` assembles the final content block list
5. If no attachments exist, the message stays as a plain string (backward compatible)

### Dashboard Descriptors

To avoid sending large base64 payloads over WebSocket, the dashboard uses lightweight **attachment descriptors**:

- `display_name`, `size_bytes`, `mime_type`, `kind` (image/text/audio/unknown)
- `preview_url`: data URI for small images, HTTP URL for URL images, `None` for large files
- `text_preview`: first 2000 characters for text files
- `audio_data_uri`: data URI for small audio files

These thresholds are configured in `src/mcp_cli/config/defaults.py`:
- `DEFAULT_DASHBOARD_INLINE_IMAGE_THRESHOLD`: 100 KB
- `DEFAULT_DASHBOARD_TEXT_PREVIEW_CHARS`: 2000
44 changes: 44 additions & 0 deletions docs/COMMANDS.md
Original file line number Diff line number Diff line change
Expand Up @@ -377,6 +377,50 @@ Save and restore conversation sessions:

Sessions are stored as JSON in `~/.mcp-cli/sessions/`. Auto-save triggers every 10 turns by default.

### Multi-Modal Attachments

Stage files to include in your next message. Supports images, text/code files, and audio.

**Stage Files:**
```bash
/attach photo.png # Stage an image
/attach code.py # Stage a text/code file
/attach clip.mp3 # Stage an audio file
/file data.csv # Alias for /attach
/image screenshot.heic # Alias for /attach
```

**Manage Staging:**
```bash
/attach list # Show currently staged files
/attach clear # Clear all staged files
```

**Inline References (in any message):**
```bash
@file:screenshot.png describe what you see in this image
@file:report.txt @file:data.csv compare these two files
```

Image URLs in messages are automatically detected and sent as vision content.

**Supported File Types:**
- **Images:** `.png`, `.jpg`, `.jpeg`, `.gif`, `.webp`, `.heic`
- **Audio:** `.mp3`, `.wav`
- **Text/Code:** `.txt`, `.md`, `.csv`, `.json`, `.html`, `.xml`, `.yaml`, `.yml`, `.py`, `.js`, `.ts`, `.jsx`, `.tsx`, `.sh`, `.bash`, `.rs`, `.go`, `.java`, `.c`, `.cpp`, `.h`, `.hpp`, `.rb`, `.swift`, `.kt`, `.sql`, `.toml`, `.ini`, `.cfg`, `.env`, `.log`

**Size Limits:**
- Maximum file size: 20 MB per file
- Maximum attachments per message: 10

**CLI Flag:**
```bash
# Attach files to the first message (repeatable)
mcp-cli --server sqlite --attach image.png --attach data.csv
```

See [ATTACHMENTS.md](./ATTACHMENTS.md) for comprehensive attachments documentation.

### Conversation Export

Export conversations in structured formats:
Expand Down
Loading
Loading