Quickstart · Architecture · Features · API · Testing
An intelligent desktop assistant that observes your Windows activity in real time, classifies what you're doing, and can autonomously execute tasks like drafting emails, managing windows, and automating workflows — all running locally with zero cloud dependencies.
"Everyone else suggests. We execute." — myVISION.md
git clone https://github.com/nxtg-ai/DesktopAI.git
cd DesktopAI
python3 -m venv .venv && source .venv/bin/activate
pip install -r backend/requirements.txt
# Start the backend
make backend-dev
Open http://localhost:8000 — you should see the glassmorphism UI with dark mode, chat, and notification bell.
# Install cross-compilation toolchain (WSL2, one-time)
sudo apt-get update && sudo apt-get install -y mingw-w64
rustup target add x86_64-pc-windows-gnu
# Build the collector
make collector-build
# Output: collector/target/x86_64-pc-windows-gnu/release/desktopai-collector.exe
Copy desktopai-collector.exe to Windows and run:
$env:BACKEND_WS_URL = "ws://localhost:8000/ingest"
$env:BACKEND_HTTP_URL = "http://localhost:8000/api/events"
$env:IDLE_ENABLED = "1"
$env:UIA_ENABLED = "0"
./desktopai-collector.exe
When connected, you'll see a notification: "DesktopAI can now see and control your desktop."
Tip: Windows → WSL2 `localhost` forwarding is typically automatic. If not, use the WSL2 VM IP from `wsl hostname -I`.
# Run tests to verify everything works
make backend-test # 423 Python tests
cd collector && cargo test  # 70 Rust tests
# Install Ollama for AI-powered responses
curl -fsSL https://ollama.com/install.sh | sh
ollama pull qwen2.5:7b
ollama pull qwen2.5vl:7b
# Backend auto-detects Ollama at localhost:11434
API_TOKEN=my-secret-token make backend-dev
# All /api/* endpoints now require: Authorization: Bearer my-secret-token
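A minimal sketch of an authenticated client call using only the Python standard library. The base URL and token value are examples from this guide, not defaults:

```python
import urllib.request

BASE_URL = "http://localhost:8000"   # example backend address
TOKEN = "my-secret-token"            # whatever you passed as API_TOKEN

def authed_request(path: str) -> urllib.request.Request:
    """Build a GET request carrying the Bearer token the backend expects."""
    req = urllib.request.Request(BASE_URL + path)
    req.add_header("Authorization", f"Bearer {TOKEN}")
    return req

# With the backend running, urllib.request.urlopen(authed_request("/api/state"))
# would return the current window state; here we just inspect the header.
req = authed_request("/api/state")
print(req.get_header("Authorization"))
```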
| Component | Description |
|---|---|
| Collector (Rust) | Windows-native observer (9 modules, 70 tests). Hooks Win32 and UI Automation APIs for foreground windows, idle/active state, recursive UIA trees, and desktop screenshots. 9 desktop commands (click, type, scroll, etc.). Heartbeat ping/pong. Ships events over WebSocket with exponential backoff reconnection. |
| Backend (Python) | Python brain running in WSL2 (423 unit tests). State management, SQLite persistence, activity classification, multi-turn chat with desktop context, notification engine (4 rules), 3 automation recipes, autonomy orchestration with approval gates, and bridge command execution to the collector. |
| UI | Glassmorphism UI with dark mode, multi-turn chat with conversation persistence, notification bell, recipe chips, keyboard shortcuts, agent vision panel, autonomy controls. Tauri native overlay for 3D avatar with state-driven animations. |
Key architectural decision: Rust owns the desktop (collector, Tauri, Win32/COM/UIA). Python owns the brain (LLM, chat, planning, orchestration). This is locked in — Python is I/O bound on LLM calls, Rust provides platform depth.
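The collector's reconnect behavior can be sketched as an exponential backoff schedule: double the wait after each failed attempt, up to a cap. The base, cap, and attempt count below are illustrative, not the collector's actual values:

```python
def backoff_delays(base: float = 1.0, cap: float = 60.0, attempts: int = 8) -> list[float]:
    """Illustrative exponential backoff: delay doubles per retry, capped.
    Real implementations usually add jitter to avoid reconnect stampedes."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(attempts)]

print(backoff_delays())  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```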
| Feature | Description |
|---|---|
| Closed-Loop Autonomy | Observe → Plan → Execute → Verify. 9 desktop commands over WebSocket bridge. Vision agent with confidence gating. |
| Multi-Turn Chat | Conversational AI with desktop context, screenshot inclusion, recipe matching, and action intent detection. |
| 3 Personality Modes | Co-pilot (calm), Assistant (proactive), Operator (silent execution). User selects or auto-adapts. |
| Zero Cloud Dependencies | Everything local. Ollama for LLM. Cloud FM is opt-in additive, never required. |
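The Observe → Plan → Execute → Verify cycle can be sketched as a retry loop; the function and parameter names here are illustrative, not DesktopAI's internals:

```python
def run_closed_loop(observe, plan, execute, verify, max_iterations: int = 3):
    """Repeat the loop until verify() accepts the result or attempts run out."""
    for _ in range(max_iterations):
        observation = observe()          # capture desktop context
        steps = plan(observation)        # produce a step list
        result = execute(steps)          # run the steps
        if verify(result):               # confidence-gated check
            return result
    return None                          # give up after max_iterations
```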
All capabilities
| Capability | Description |
|---|---|
| 3-Tier Autonomy | Supervised (every action pauses), Guided (routine free, novel pauses), Autonomous (full execution) |
| Kill Switch | Ctrl+Shift+X hotkey, UI button, API cancel. Instant halt mid-execution. |
| Session Greeting | Notification when collector connects: "DesktopAI can now see and control your desktop." |
| Heartbeat | Ping/pong between backend and collector (30s). Detects stale connections. |
| Context Insights | Detects app-toggle patterns ("switching between Outlook and Excel for 20 min") and deep focus |
| Notification Engine | 4 rules: idle detection, rapid app switching, session milestones, context insights |
| Desktop Recipes | 3 built-in automation recipes with keyword matching from chat |
| Trajectory Learning | Error lessons extracted from failed runs, fed back into planning |
| Security Hardening | Token auth, rate limiting, security headers, WebSocket connection limits |
| Dark Mode | CSS custom property toggle with localStorage persistence |
| Keyboard Shortcuts | / focus chat, Escape blur, Ctrl+Enter send, Ctrl+Shift+N new chat |
| Voice Control | Browser-native STT/TTS with live transcript |
| Ollama Integration | Vision + structured JSON output, auto-fallback, runtime model hot-swap |
| Browser Automation | Playwright via CDP for web-based task execution |
| Screenshot Capture | GDI-based with JPEG encoding, configurable downscaling |
| UI Telemetry | Frontend journey telemetry with session artifacts |
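The 3-tier approval behavior described above can be sketched as a small gate function; the name and arguments are hypothetical, not DesktopAI's API:

```python
def needs_approval(tier: str, action_is_novel: bool) -> bool:
    """Supervised pauses on every action; Guided pauses only on novel
    actions; Autonomous executes without pausing."""
    if tier == "supervised":
        return True
    if tier == "guided":
        return action_is_novel
    if tier == "autonomous":
        return False
    raise ValueError(f"unknown tier: {tier}")
```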
57 endpoints — click to expand
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/chat | Multi-turn chat with desktop context and screenshot |
| GET | /api/chat/conversations | List conversations |
| GET | /api/chat/conversations/{id} | Get conversation messages |
| DELETE | /api/chat/conversations/{id} | Delete conversation |
| GET | /api/notifications | List notifications |
| GET | /api/notifications/count | Unread count |
| POST | /api/notifications/{id}/read | Mark as read |
| DELETE | /api/notifications/{id} | Delete notification |
| GET | /api/recipes | Context-filtered automation recipes |
| POST | /api/recipes/{id}/run | Execute a recipe |
| GET | /api/state | Current window state |
| GET | /api/state/snapshot | Desktop context as JSON |
| POST | /api/events | Ingest event (HTTP) |
| GET | /api/events | Recent events |
| GET | /api/collector | Collector connection status |
| POST | /api/tasks | Create task |
| GET | /api/tasks | List tasks |
| GET | /api/tasks/{id} | Get task |
| POST | /api/tasks/{id}/plan | Attach plan steps |
| POST | /api/tasks/{id}/run | Execute plan |
| POST | /api/tasks/{id}/approve | Approve irreversible step |
| POST | /api/tasks/{id}/pause | Pause |
| POST | /api/tasks/{id}/resume | Resume |
| POST | /api/tasks/{id}/cancel | Cancel |
| POST | /api/autonomy/runs | Start autonomous run |
| GET | /api/autonomy/runs | List runs |
| GET | /api/autonomy/runs/{id} | Get run state/log |
| POST | /api/autonomy/runs/{id}/approve | Approve next step |
| POST | /api/autonomy/runs/{id}/cancel | Cancel run |
| GET | /api/autonomy/planner | Planner mode status |
| POST | /api/autonomy/planner | Set planner mode |
| DELETE | /api/autonomy/planner | Reset to default |
| POST | /api/agent/run | Start vision agent run |
| GET | /api/agent/bridge | Bridge connection status |
| GET | /api/readiness/status | Readiness summary |
| POST | /api/readiness/gate | One-shot gate |
| POST | /api/readiness/matrix | Multi-objective matrix |
| GET | /api/executor | Executor runtime status |
| GET | /api/executor/preflight | Executor readiness |
| GET | /api/health | Health check (always public) |
| GET | /api/selftest | Backend self-test |
| GET | /api/ollama | Ollama diagnostics |
| GET | /api/ollama/models | Installed models |
| POST | /api/ollama/model | Set model override |
| DELETE | /api/ollama/model | Clear override |
| POST | /api/ollama/probe | Generate probe |
| POST | /api/classify | Classify event |
| POST | /api/summarize | Context summary |
| POST | /api/ui-telemetry | Ingest UI telemetry |
| GET | /api/ui-telemetry | List telemetry events |
| GET | /api/ui-telemetry/sessions | List sessions |
| POST | /api/ui-telemetry/reset | Clear telemetry |
| GET | /api/runtime-logs | Runtime logs |
| GET | /api/runtime-logs/correlate | Correlate to session |
| POST | /api/runtime-logs/reset | Clear logs |
| WS | /ingest | Collector event stream + heartbeat |
| WS | /ws | UI real-time updates |
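As a sketch, a POST /api/chat body might be built like this. The field names (`message`, `conversation_id`, `include_screenshot`) are assumptions for illustration only; check the backend's schema for the real contract:

```python
import json

# Hypothetical request body for POST /api/chat — field names are assumed.
payload = {
    "message": "Summarize what I've been working on",
    "conversation_id": None,       # None to start a new conversation
    "include_screenshot": False,   # screenshots are opt-in
}
body = json.dumps(payload).encode("utf-8")

# With the backend running, this body would be sent via urllib.request.Request(
#   "http://localhost:8000/api/chat", data=body,
#   headers={"Content-Type": "application/json"})
print(len(body), "bytes ready to send")
```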
# Python backend (423 unit tests)
make backend-test # Fast: excludes integration
make backend-test-integration # Requires running Ollama
# Linting & type checking
ruff check backend/app/ backend/tests/ # 0 errors expected
pyright backend/app/ # 0 errors, 7 warnings (pre-existing)
# Rust collector (70 tests)
cd collector && cargo test
cd collector && cargo clippy --all-targets -- -D warnings
# UI (Playwright)
make ui-test # Headless browser tests
make ui-test-headed  # Watch the browser journey
Backend environment variables
| Variable | Default | Description |
|---|---|---|
| BACKEND_HOST | 0.0.0.0 | Server bind address |
| BACKEND_PORT | 8000 | Server port |
| API_TOKEN | (empty) | Bearer token (empty = no auth) |
| BACKEND_DB_PATH | backend/data/desktopai.db | SQLite database path |
| OLLAMA_URL | http://localhost:11434 | Ollama API URL |
| OLLAMA_MODEL | qwen2.5:7b | Default chat model |
| OLLAMA_VISION_MODEL | qwen2.5vl:7b | Default vision model |
| ACTION_EXECUTOR_MODE | auto | auto / bridge / simulated / playwright |
| RATE_LIMIT_PER_MINUTE | 60 | API rate limit per IP |
| WS_MAX_CONNECTIONS | 50 | Max WebSocket connections |
| COLLECTOR_HEARTBEAT_INTERVAL_S | 30 | Heartbeat ping interval |
| NOTIFICATIONS_ENABLED | true | Enable notification engine |
| AUTONOMY_PLANNER_MODE | auto | deterministic / auto / ollama_required |
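Settings like these typically resolve environment-first, falling back to the documented default. A minimal sketch (the `env` helper is illustrative; variable names and defaults match the table above):

```python
import os

def env(name: str, default: str) -> str:
    """Return the environment value if set, otherwise the documented default."""
    return os.environ.get(name, default)

backend_port = int(env("BACKEND_PORT", "8000"))
ollama_url = env("OLLAMA_URL", "http://localhost:11434")
print(backend_port, ollama_url)
```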
Collector environment variables
| Variable | Default | Description |
|---|---|---|
| BACKEND_WS_URL | ws://localhost:8000/ingest | WebSocket endpoint |
| BACKEND_HTTP_URL | http://localhost:8000/api/events | HTTP fallback |
| IDLE_ENABLED | 1 | Enable idle/active events |
| IDLE_THRESHOLD_MS | 60000 | Idle timeout |
| UIA_ENABLED | 0 | Enable UI Automation snapshots |
| ENABLE_SCREENSHOT | 0 | Enable desktop screenshots |
| COMMAND_ENABLED | 1 | Enable remote command execution |
DesktopAI is local-first and privacy-preserving:
- No keystrokes are captured
- Screenshots are opt-in and disabled by default
- No data leaves your network in the core path
- Ollama runs locally — cloud LLM is opt-in additive
- UIA snapshots are optional, throttled, and depth-limited
Built by NXTG.AI — "We don't play with the cutting edge. We shape the bleeding edge."