Skip to content

nxtg-ai/DesktopAI

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

130 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

DesktopAI

Quickstart  ·  Architecture  ·  Features  ·  API  ·  Testing

Rust Python FastAPI Tauri SQLite WSL2

Backend Tests Rust Tests Tests Zero cloud deps License


An intelligent desktop assistant that observes your Windows activity in real time, classifies what you're doing, and can autonomously execute tasks like drafting emails, managing windows, and automating workflows — all running locally with zero cloud dependencies.

"Everyone else suggests. We execute."myVISION.md


Quickstart

1. Backend (WSL2) — 2 minutes

git clone https://github.com/nxtg-ai/DesktopAI.git
cd DesktopAI
python3 -m venv .venv && source .venv/bin/activate
pip install -r backend/requirements.txt

# Start the backend
make backend-dev

Open http://localhost:8000 — you should see the glassmorphism UI with dark mode, chat, and notification bell.

2. Windows Collector — 3 minutes

# Install cross-compilation toolchain (WSL2, one-time)
sudo apt-get update && sudo apt-get install -y mingw-w64
rustup target add x86_64-pc-windows-gnu

# Build the collector
make collector-build
# Output: collector/target/x86_64-pc-windows-gnu/release/desktopai-collector.exe

Copy desktopai-collector.exe to Windows and run:

$env:BACKEND_WS_URL = "ws://localhost:8000/ingest"
$env:BACKEND_HTTP_URL = "http://localhost:8000/api/events"
$env:IDLE_ENABLED = "1"
$env:UIA_ENABLED = "0"
./desktopai-collector.exe

When connected, you'll see a notification: "DesktopAI can now see and control your desktop."

Tip: Windows → WSL2 localhost forwarding is typically automatic. If not, use the WSL2 VM IP from wsl hostname -I.

3. Verify — 30 seconds

# Run tests to verify everything works
make backend-test     # 423 Python tests
cd collector && cargo test   # 70 Rust tests

4. Optional: Local LLM

# Install Ollama for AI-powered responses
curl -fsSL https://ollama.com/install.sh | sh
ollama pull qwen2.5:7b
ollama pull qwen2.5vl:7b

# Backend auto-detects Ollama at localhost:11434

5. Optional: Authentication

API_TOKEN=my-secret-token make backend-dev
# All /api/* endpoints now require: Authorization: Bearer my-secret-token

Architecture

Rust Collector

Windows-native observer (9 modules, 70 tests). Hooks Win32 and UI Automation APIs for foreground windows, idle/active state, recursive UIA trees, and desktop screenshots. 9 desktop commands (click, type, scroll, etc.). Heartbeat ping/pong. Ships events over WebSocket with exponential backoff reconnection.

FastAPI Backend

Python brain running in WSL2 (423 unit tests). State management, SQLite persistence, activity classification, multi-turn chat with desktop context, notification engine (4 rules), 3 automation recipes, autonomy orchestration with approval gates, and bridge command execution to the collector.

Web UI + Tauri Avatar

Glassmorphism UI with dark mode, multi-turn chat with conversation persistence, notification bell, recipe chips, keyboard shortcuts, agent vision panel, autonomy controls. Tauri native overlay for 3D avatar with state-driven animations.

Key architectural decision: Rust owns the desktop (collector, Tauri, Win32/COM/UIA). Python owns the brain (LLM, chat, planning, orchestration). This is locked in — Python is I/O bound on LLM calls, Rust provides platform depth.


Features


Closed-Loop Autonomy
Observe → Plan → Execute → Verify. 9 desktop commands over WebSocket bridge. Vision agent with confidence gating.


Multi-Turn Chat
Conversational AI with desktop context, screenshot inclusion, recipe matching, and action intent detection.


3 Personality Modes
Co-pilot (calm), Assistant (proactive), Operator (silent execution). User selects or auto-adapts.


Zero Cloud Dependencies
Everything local. Ollama for LLM. Cloud FM is opt-in additive, never required.

All capabilities
Capability Description
3-Tier Autonomy Supervised (every action pauses), Guided (routine free, novel pauses), Autonomous (full execution)
Kill Switch Ctrl+Shift+X hotkey, UI button, API cancel. Instant halt mid-execution.
Session Greeting Notification when collector connects: "DesktopAI can now see and control your desktop."
Heartbeat Ping/pong between backend and collector (30s). Detects stale connections.
Context Insights Detects app-toggle patterns ("switching between Outlook and Excel for 20 min") and deep focus
Notification Engine 4 rules: idle detection, rapid app switching, session milestones, context insights
Desktop Recipes 3 built-in automation recipes with keyword matching from chat
Trajectory Learning Error lessons extracted from failed runs, fed back into planning
Security Hardening Token auth, rate limiting, security headers, WebSocket connection limits
Dark Mode CSS custom property toggle with localStorage persistence
Keyboard Shortcuts / focus chat, Escape blur, Ctrl+Enter send, Ctrl+Shift+N new chat
Voice Control Browser-native STT/TTS with live transcript
Ollama Integration Vision + structured JSON output, auto-fallback, runtime model hot-swap
Browser Automation Playwright via CDP for web-based task execution
Screenshot Capture GDI-based with JPEG encoding, configurable downscaling
UI Telemetry Frontend journey telemetry with session artifacts

API Reference

57 endpoints — click to expand
Method Endpoint Description
POST /api/chat Multi-turn chat with desktop context and screenshot
GET /api/chat/conversations List conversations
GET /api/chat/conversations/{id} Get conversation messages
DELETE /api/chat/conversations/{id} Delete conversation
GET /api/notifications List notifications
GET /api/notifications/count Unread count
POST /api/notifications/{id}/read Mark as read
DELETE /api/notifications/{id} Delete notification
GET /api/recipes Context-filtered automation recipes
POST /api/recipes/{id}/run Execute a recipe
GET /api/state Current window state
GET /api/state/snapshot Desktop context as JSON
POST /api/events Ingest event (HTTP)
GET /api/events Recent events
GET /api/collector Collector connection status
POST /api/tasks Create task
GET /api/tasks List tasks
GET /api/tasks/{id} Get task
POST /api/tasks/{id}/plan Attach plan steps
POST /api/tasks/{id}/run Execute plan
POST /api/tasks/{id}/approve Approve irreversible step
POST /api/tasks/{id}/pause Pause
POST /api/tasks/{id}/resume Resume
POST /api/tasks/{id}/cancel Cancel
POST /api/autonomy/runs Start autonomous run
GET /api/autonomy/runs List runs
GET /api/autonomy/runs/{id} Get run state/log
POST /api/autonomy/runs/{id}/approve Approve next step
POST /api/autonomy/runs/{id}/cancel Cancel run
GET /api/autonomy/planner Planner mode status
POST /api/autonomy/planner Set planner mode
DELETE /api/autonomy/planner Reset to default
POST /api/agent/run Start vision agent run
GET /api/agent/bridge Bridge connection status
GET /api/readiness/status Readiness summary
POST /api/readiness/gate One-shot gate
POST /api/readiness/matrix Multi-objective matrix
GET /api/executor Executor runtime status
GET /api/executor/preflight Executor readiness
GET /api/health Health check (always public)
GET /api/selftest Backend self-test
GET /api/ollama Ollama diagnostics
GET /api/ollama/models Installed models
POST /api/ollama/model Set model override
DELETE /api/ollama/model Clear override
POST /api/ollama/probe Generate probe
POST /api/classify Classify event
POST /api/summarize Context summary
POST /api/ui-telemetry Ingest UI telemetry
GET /api/ui-telemetry List telemetry events
GET /api/ui-telemetry/sessions List sessions
POST /api/ui-telemetry/reset Clear telemetry
GET /api/runtime-logs Runtime logs
GET /api/runtime-logs/correlate Correlate to session
POST /api/runtime-logs/reset Clear logs
WS /ingest Collector event stream + heartbeat
WS /ws UI real-time updates

Testing

# Python backend (423 unit tests)
make backend-test                              # Fast: excludes integration
make backend-test-integration                  # Requires running Ollama

# Linting & type checking
ruff check backend/app/ backend/tests/         # 0 errors expected
pyright backend/app/                           # 0 errors, 7 warnings (pre-existing)

# Rust collector (70 tests)
cd collector && cargo test
cd collector && cargo clippy --all-targets -- -D warnings

# UI (Playwright)
make ui-test                                   # Headless browser tests
make ui-test-headed                            # Watch the browser journey

Configuration

Backend environment variables
Variable Default Description
BACKEND_HOST 0.0.0.0 Server bind address
BACKEND_PORT 8000 Server port
API_TOKEN (empty) Bearer token (empty = no auth)
BACKEND_DB_PATH backend/data/desktopai.db SQLite database path
OLLAMA_URL http://localhost:11434 Ollama API URL
OLLAMA_MODEL qwen2.5:7b Default chat model
OLLAMA_VISION_MODEL qwen2.5vl:7b Default vision model
ACTION_EXECUTOR_MODE auto auto / bridge / simulated / playwright
RATE_LIMIT_PER_MINUTE 60 API rate limit per IP
WS_MAX_CONNECTIONS 50 Max WebSocket connections
COLLECTOR_HEARTBEAT_INTERVAL_S 30 Heartbeat ping interval
NOTIFICATIONS_ENABLED true Enable notification engine
AUTONOMY_PLANNER_MODE auto deterministic / auto / ollama_required
Collector environment variables
Variable Default Description
BACKEND_WS_URL ws://localhost:8000/ingest WebSocket endpoint
BACKEND_HTTP_URL http://localhost:8000/api/events HTTP fallback
IDLE_ENABLED 1 Enable idle/active events
IDLE_THRESHOLD_MS 60000 Idle timeout
UIA_ENABLED 0 Enable UI Automation snapshots
ENABLE_SCREENSHOT 0 Enable desktop screenshots
COMMAND_ENABLED 1 Enable remote command execution

Privacy

DesktopAI is local-first and privacy-preserving:

  • No keystrokes are captured
  • Screenshots are opt-in and disabled by default
  • No data leaves your network in the core path
  • Ollama runs locally — cloud LLM is opt-in additive
  • UIA snapshots are optional, throttled, and depth-limited

Built by NXTG.AI"We don't play with the cutting edge. We shape the bleeding edge."

About

Intelligent desktop assistant that observes, classifies, and autonomously acts on Windows activity. Rust collector + FastAPI backend + live web UI. 100% local-first.

Topics

Resources

Code of conduct

Contributing

Security policy

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors