██╗ ██╗ ██╗███╗ ███╗███████╗███╗ ██╗███╗ ██╗ ██████╗ ███╗ ██╗
██║ ██║ ██║████╗ ████║██╔════╝████╗ ██║████╗ ████║██╔═══██╗████╗ ██║
██║ ██║ ██║██╔████╔██║█████╗ ██╔██╗ ██║██╔████╔██║██║ ██║██╔██╗ ██║
██║ ██║ ██║██║╚██╔╝██║██╔══╝ ██║╚██╗██║██║╚██╔╝██║██║ ██║██║╚██╗██║
███████╗╚██████╔╝██║ ╚═╝ ██║███████╗██║ ╚████║██║ ╚═╝ ██║╚██████╔╝██║ ╚████║
╚══════╝ ╚═════╝ ╚═╝ ╚═╝╚══════╝╚═╝ ╚═══╝╚═╝ ╚═╝ ╚═════╝ ╚═╝ ╚═══╝
Lightweight system monitoring with MQTT transport. Pure push architecture — agents send metrics, the console never reaches into your systems. Sets up as a Docker server in 60 seconds; add clients with one magic command in 10 seconds to start monitoring. No dashboard config, no hassle.
Console (central dashboard):
curl -sSL https://raw.githubusercontent.com/chriopter/lumenmon/main/console/install.sh | bash

Agent (on each monitored host):
curl -sSL https://raw.githubusercontent.com/chriopter/lumenmon/main/agent/install.sh | bash
lumenmon-agent register '<invite-url>'
lumenmon-agent start

┌─────────────┐                ┌─────────────┐
│    Agent    │──────────────► │   Console   │
├─────────────┤   MQTT/TLS     ├─────────────┤
│ Collectors  │──────────────► │ • MQTT 8884 │──► Web :8080
│ (see below) │                │ • SQLite    │
└─────────────┘                │ • Flask     │
 (bare metal)                  └─────────────┘
                                  (Docker)
| Rhythm | Interval | Typical collectors |
|---|---|---|
| PULSE | 1s | cpu, heartbeat |
| BREATHE | 60s | memory, disk |
| CYCLE | 5m | mail, proxmox_* |
| REPORT | 1h | hostname, lumenmon, version, debian_updates |
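Collector scripts reference these rhythms as shell variables (see the template below). The interval values implied by the table can be sketched as follows; the authoritative definitions live in the agent's core scripts:

```shell
# Rhythm names mapped to sleep intervals in seconds, as per the table above.
# (Illustrative; the agent defines these centrally, not per collector.)
PULSE=1        # every second
BREATHE=60     # every minute
CYCLE=300      # every 5 minutes
REPORT=3600    # every hour
echo "$PULSE $BREATHE $CYCLE $REPORT"
```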
All collectors are plain Bash scripts under agent/collectors/.
Read this section as:
- Collector: script name.
- Publishes: metric names sent to MQTT.
- Interval: update cadence.
- Failure behavior: when/why a metric marks host state as degraded.
| Collector | Publishes | Interval | Failure behavior |
|---|---|---|---|
| cpu | generic_cpu | 1s (PULSE) | stale only |
| memory | generic_memory | 60s (BREATHE) | stale only |
| disk | generic_disk | 60s (BREATHE) | stale only |
| heartbeat | generic_heartbeat | 1s (PULSE) | stale only (drives online/offline) |
| hostname | generic_hostname | 1h (REPORT) | stale only |
| lumenmon | generic_sys_os, generic_sys_kernel, generic_sys_uptime | 1h (REPORT) | stale only |
| version | generic_agent_version | 1h (REPORT) | stale only (UI may show update warning) |
| mail | mail_message | 5m (CYCLE) | informational (message stream) |
| zpool | generic_zpool_total, generic_zpool_degraded | 5m (CYCLE) | fails when any non-Proxmox pool is degraded |
| Collector | Publishes | Interval | Failure behavior |
|---|---|---|---|
| updates | debian_updates_total, debian_updates_security, debian_updates_release, debian_updates_age | 1h (REPORT) | total/security only warn after updates are pending for >=24h; release >0 stays critical |
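The grace-period rule above can be sketched as a small function (a hypothetical helper, not the collector's actual code; thresholds taken from the table):

```shell
# Hedged sketch of the Debian updates rule: a pending release upgrade is
# always critical; pending total/security updates only warn once they have
# been pending for at least 24 hours.
updates_state() { # args: total security release age_hours
  local total=$1 security=$2 release=$3 age=$4
  if [ "$release" -gt 0 ]; then
    echo critical
  elif [ $((total + security)) -gt 0 ] && [ "$age" -ge 24 ]; then
    echo warning
  else
    echo ok
  fi
}
```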
| Collector | Publishes | Interval | Failure behavior |
|---|---|---|---|
| vms | proxmox_vms_* | 5m (CYCLE) | stale only |
| containers | proxmox_containers_* | 5m (CYCLE) | stale only |
| storage | proxmox_storage_* | 5m (CYCLE) | stale + bounds if configured |
| zfs | proxmox_zfs_* | 5m (CYCLE) | stale + bounds (online drives vs total) |
| zpool_health | proxmox_zpool_* | 5m (CYCLE) | degraded and upgrade-needed flags (max=0) |
| Collector | Publishes | Interval | Failure behavior |
|---|---|---|---|
| datastore_count | pbs_datastore_count | 5m (CYCLE) | fails when no datastore is detected |
| task_failures | pbs_task_failures_24h | 5m (CYCLE) | fails when errors/failures > 0 in last 24h |
| backup_age | pbs_backup_age_hours | 5m (CYCLE) | fails when age exceeds 24h |
| verify_age | pbs_verify_age_hours | 5m (CYCLE) | fails when age exceeds 168h |
| sync_age | pbs_sync_age_hours | 5m (CYCLE) | fails when age exceeds 168h |
| gc_age | pbs_gc_age_hours | 5m (CYCLE) | fails when age exceeds 168h |
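Each PBS age check reduces to a single comparison against its limit (a hedged sketch with a hypothetical helper name; thresholds from the table: backup 24h, verify/sync/gc 168h):

```shell
# Hedged sketch of the PBS age checks: a job age passes while it is at or
# below its limit, and fails once it exceeds it.
age_ok() { # args: age_hours limit_hours
  [ "$1" -le "$2" ]
}
age_ok 12 24 && echo "backup fresh"
age_ok 200 168 || echo "verify overdue"
```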
| Collector | Publishes | Interval | Failure behavior |
|---|---|---|---|
| temp | hardware_temp_* | 5m (CYCLE) | fails on temperature thresholds; negative sensor glitches are clamped to 0 |
| pcie_errors | hardware_pcie_* | 1h (REPORT) | fails on PCIe/AER errors |
| intel_gpu | hardware_intel_gpu_* | 5m (CYCLE) | fails on Intel GPU utilization thresholds |
| vram | hardware_gpu_vram_* | 5m (CYCLE) | fails on VRAM usage thresholds |
| smart_values | hardware_smart_* | 1h (REPORT) | fails on SMART health/temp/wear thresholds |
| ssd_samsung | hardware_samsung_* | 1h (REPORT) | inventory/firmware visibility for Samsung SSDs |
Note: on virtualized guests, hardware collectors stay disabled by default. If GPU passthrough is detected, hardware_intel_gpu and hardware_vram are enabled automatically.
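One way such a gate can be implemented is with systemd-detect-virt (a hedged sketch; the agent's actual detection logic may differ):

```shell
# Hedged sketch: hardware collectors run only on bare metal.
# systemd-detect-virt prints "none" on physical hosts; if the tool is
# missing we assume bare metal here (illustrative fallback only).
virt=$(systemd-detect-virt 2>/dev/null || echo none)
if [ "$virt" = "none" ]; then
  hardware_enabled=1
else
  hardware_enabled=0
fi
echo "hardware_enabled=$hardware_enabled"
```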
| Collector | Publishes | Interval | Failure behavior |
|---|---|---|---|
| mullvad_active | optional_mullvad_active | opt-in | stale/bounds depend on local config |
Collector health is computed from:
- metric stale timeout (interval exceeded), and/or
- min/max bounds violations (min_value, max_value).
Entity (host) health rolls up from metric health. If any metric fails, entity status becomes degraded.
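The per-metric evaluation feeding that roll-up can be sketched as follows (a hypothetical helper; the console's real logic lives in its backend, and this sketch treats any age beyond the expected interval as stale):

```shell
# Hedged sketch of per-metric health: a metric is failed when its last
# update is older than its expected interval (stale) or its value falls
# outside min/max. interval=0 means "never stale" (one-time metrics).
metric_failed() { # args: now last_ts interval value min max
  local now=$1 last=$2 interval=$3 value=$4 min=$5 max=$6
  if [ "$interval" -gt 0 ] && [ $((now - last)) -gt "$interval" ]; then
    return 0   # stale -> failed
  fi
  # awk handles decimal comparison; empty min/max means "no bound"
  if [ -n "$min" ] && awk -v v="$value" -v m="$min" 'BEGIN{exit !(v<m)}'; then return 0; fi
  if [ -n "$max" ] && awk -v v="$value" -v m="$max" 'BEGIN{exit !(v>m)}'; then return 0; fi
  return 1     # healthy
}
```

The entity roll-up is then just "failed if any metric is failed".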
Non-collector checks also exist in the console API/UI layer:
- Mail staleness endpoint: /api/messages/staleness (default 14 days / 336h, warning)
- Alerting status endpoint: /api/alerts/status
Console (lumenmon):
lumenmon # Show status
lumenmon invite # Generate agent invite
lumenmon logs # View logs
lumenmon update # Update container
lumenmon uninstall # Remove everything

Agent (lumenmon-agent):
lumenmon-agent # Show status
lumenmon-agent debug # Run all collectors once (test output)
lumenmon-agent register # Register with invite URL
lumenmon-agent start/stop # Control service
lumenmon-agent logs # View logs
lumenmon-agent uninstall # Remove agent

Console
Docker container running MQTT broker (Mosquitto), SQLite database, and web dashboard (Flask + Caddy).
Install: Downloads docker-compose.yml, pulls image from GitHub Container Registry, starts container.
Update: lumenmon update pulls latest image and restarts container. Data in ~/.lumenmon/console/data/ is preserved.
Uninstall: lumenmon uninstall stops container, removes image and all data.
Agent
Pure bash scripts that collect metrics and publish via mosquitto_pub over TLS. No Docker, no compiled binaries.
Supported Platforms:
| Platform | Install Path | Service |
|---|---|---|
| Debian/Ubuntu | /opt/lumenmon/ | systemd |
| Proxmox VE | /opt/lumenmon/ | systemd |
Requirements: mosquitto-clients (apt install mosquitto-clients)
Install:
- Downloads scripts to /opt/lumenmon/
- Creates systemd service lumenmon-agent.service
- Creates CLI /usr/local/bin/lumenmon-agent
Update: lumenmon-agent update fetches and checks out the latest release tag. Credentials preserved. The console dashboard shows "UPDATE AVAILABLE" when a newer version exists.
Uninstall: lumenmon-agent uninstall stops service/process, removes files.
Security
- TLS Pinning: Agents verify server certificate fingerprint on first connection
- Per-agent credentials: Each agent gets unique MQTT credentials
- Outbound only: Agents initiate connections, console cannot connect to agents
- Rate limiting: MQTT broker limits connections and message rates
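The fingerprint an agent pins can be reproduced with standard openssl tooling (a hedged sketch; the agent's exact pinning mechanics and file paths are internal to its scripts):

```shell
# Hedged sketch of TLS pinning: derive a certificate's SHA-256 fingerprint
# and compare it against a stored pin on every later connection.
cert_fingerprint() {
  openssl x509 -noout -fingerprint -sha256 -in "$1" | cut -d= -f2
}
# First connection:  pin=$(cert_fingerprint server.crt); store the pin.
# Later connections: [ "$(cert_fingerprint server.crt)" = "$pin" ] || abort.
```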
Mail Forwarding
Two methods to receive system mail - use whichever fits your setup:
Method 1: Local spool (Debian/Ubuntu)
/var/mail/root → agent → MQTT → console
Agent automatically reads local mail spool every 5 minutes. Works out-of-the-box on systems where mail delivers to /var/mail/root.
Method 2: SMTP (Proxmox/PBS)
System notifications → SMTP (port 25) → console
Configure your system to send mail to <agent_id>@<console-host>. Works with Proxmox notification system.
Both methods store mail in the same messages table, displayed per-agent in the web UI.
Data
All data stored in SQLite at /data/metrics.db (inside container).
Retention: Metrics older than 24h auto-deleted every 5 minutes. Most recent value per metric always preserved so offline agents keep their last known status.
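The keep-newest rule can be illustrated in pure bash over a list of timestamps (the real pruning is a SQL job inside the console; this is only a sketch of the policy):

```shell
# Hedged sketch of retention: timestamps older than the cutoff are dropped,
# but the newest timestamp is always kept so the last known value survives.
prune() { # args: cutoff ts...
  local cutoff=$1; shift
  local newest=0 ts out=""
  for ts in "$@"; do [ "$ts" -gt "$newest" ] && newest=$ts; done
  for ts in "$@"; do
    if [ "$ts" -ge "$cutoff" ] || [ "$ts" -eq "$newest" ]; then
      out="$out $ts"
    fi
  done
  echo "${out# }"
}
prune 100 50 90 150    # prints "150"
prune 100 50 90        # prints "90" (old, but newest -> kept)
```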
Metrics Table (one per agent+metric, e.g. id_abc123_generic_cpu):
| Column | Type | Description |
|---|---|---|
| timestamp | INTEGER | Unix timestamp (primary key) |
| value_real | REAL | Decimal values (CPU %, memory %) |
| value_int | INTEGER | Whole numbers |
| value_text | TEXT | Strings (hostname, version) |
| interval | INTEGER | Expected update interval (seconds) |
| min_value | REAL | Minimum valid value (optional) |
| max_value | REAL | Maximum valid value (optional) |
Messages Table (messages):
| Column | Type | Description |
|---|---|---|
| id | INTEGER | Auto-increment primary key |
| agent_id | TEXT | Agent that received the email |
| mail_from | TEXT | Sender address |
| mail_to | TEXT | Recipient address |
| subject | TEXT | Email subject |
| body | TEXT | Email body |
| received_at | TIMESTAMP | When received |
| read | INTEGER | 0=unread, 1=read |
Writing Custom Collectors
Collectors are bash scripts in agent/collectors/. Standard structure:
#!/bin/bash
# What this collector does.
# Data source and calculation details.
METRIC="generic_example"
TYPE="REAL" # REAL, INTEGER, or TEXT
MIN=0 # Optional hard minimum (fail/red below)
MAX=100 # Optional hard maximum (fail/red above)
WARN_MIN="" # Optional soft minimum (warn/yellow below)
WARN_MAX="" # Optional soft maximum (warn/yellow above)
source "$LUMENMON_HOME/core/mqtt/publish.sh"
while true; do
# IMPORTANT: Use LC_ALL=C for commands that produce localized output
value=$(LC_ALL=C some_command | parse_output)
publish_metric "$METRIC" "$value" "$TYPE" "$BREATHE" "$MIN" "$MAX" "$WARN_MIN" "$WARN_MAX"
[ "${LUMENMON_TEST_MODE:-}" = "1" ] && exit 0 # Support: lumenmon-agent status
sleep $BREATHE
done

Locale handling: Always use an LC_ALL=C prefix for system commands to ensure consistent English output for parsing:
# Good - forces English output
total=$(LC_ALL=C apt list --upgradable 2>/dev/null | grep -c "upgradable from")
usage=$(LC_ALL=C df -P / | tail -1 | awk '{print $5}' | tr -d '%')
# Bad - output varies by system locale
total=$(apt list --upgradable | grep -c "upgradable from") # Fails on German systems

publish_metric signature:
publish_metric "name" "value" "TYPE" interval [min] [max] [warn_min] [warn_max]

Health Detection:
- Values outside min/max show as failed (critical/red).
- Values outside warn_min/warn_max (but inside min/max) show as warning (degraded/yellow).
# Static bounds (percentages)
publish_metric "cpu" "$val" "REAL" "$PULSE" 0 100
# Dynamic bounds (ZFS: online must equal total drives)
publish_metric "zfs_online" "$online" "INTEGER" "$CYCLE" "$total" "$total"
# Warning-only threshold
publish_metric "debian_updates_total" "$total" "INTEGER" "$REPORT" 0 "" "" 0
# One-time metric (interval=0, never stale)
publish_metric "hostname" "$host" "TEXT" 0

Categories:
| Directory | Prefix | Purpose |
|---|---|---|
| collectors/generic/ | generic_ | Universal (CPU, memory, disk) |
| collectors/proxmox/ | proxmox_ | Proxmox (VMs, containers, ZFS) |
| collectors/pbs/ | pbs_ | Proxmox Backup Server checks |
| collectors/hardware/ | hardware_ | Real-hardware telemetry |
| collectors/optional/ | optional_ | Explicitly opt-in checks |
Development
./dev/auto # Full reset and setup with virtual agent
./dev/add3 # Spawn 3 test agents
./dev/check-collectors # Validate collector contract assumptions
./dev/sensor-inventory # List current remote sensors and failed checks
./dev/sandboxer-maintain --once # Run one auto-maintenance pass
./dev/lumenmon-diagnose # End-to-end health/data-flow diagnosis

Use these commands as a complete fast-check list during development and direct deploys.
# Local script sanity
find . -name "*.sh" -type f -exec bash -n {} \;
./dev/check-collectors
# Console image sanity
docker build -t test-console:ci ./console
# E2E tests
cd dev/tests && npm test
cd dev/tests && npx playwright test lumenmon.spec.ts -g "Page Load & Initial State"
# Runtime status (local or remote)
lumenmon
lumenmon-agent
# Direct deploy + smoke checks
./dev/deploy-test agent
./dev/deploy-test console
./dev/deploy-test status
./dev/deploy-test check

Optional collectors are enabled via keys in agent/data/config (or /opt/lumenmon/data/config on the host):
mullvad_active=1
# hardware collectors on virtual hosts (optional override)
hardware_force=0

Console exposes webhook alert configuration status in the GUI and API:
- API: GET /api/alerts/status
- GUI footer: alerts: not configured / alerts: webhook dry-run / alerts: active
Current behavior is status-only scaffolding (no outbound webhook delivery yet).
Mail staleness is evaluated in console backend from messages.received_at:
- API: GET /api/messages/staleness?hours=336
- Used by UI status warnings (MAIL STALE > 14D)
- Treated as warning/degraded (yellow), not critical.
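The 336-hour default reduces to a one-line check (a hedged sketch of the rule with a hypothetical helper name, not the console's code):

```shell
# Hedged sketch: mail is considered stale when the newest message is older
# than the configured window (default 336h = 14 days) -> yellow warning.
mail_stale() { # args: age_hours [window_hours, default 336]
  [ "$1" -gt "${2:-336}" ]
}
mail_stale 400 && echo "MAIL STALE > 14D"
```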
cd console && npm install # First time setup
cd console && npm run dev # Watch mode - auto-recompile on changes
cd console && npm run build # One-time build

Deploy directly to a test server via SSH (bypasses GitHub Actions):
cp .env.example .env
# set LUMENMON_TEST_HOST in .env (gitignored)
export LUMENMON_TEST_HOST="root@your-test-server" # optional shell override
./dev/deploy-test web # Build CSS + hot reload frontend (~3s)
./dev/deploy-test agent # Deploy agent + restart (~1s)
./dev/deploy-test console # Full console + restart (~5s)
./dev/deploy-test status # Check remote status
./dev/deploy-test check # API/runtime smoke checks./dev/release # Create version tag, triggers GitHub Actions buildGitHub Actions only builds Docker images on version tags (v*), not on every commit. This keeps the dev loop fast.
Made with 🔆 by chriopter