GitHub - chriopter/lumenmon: System monitoring set up in 30s

  ██╗     ██╗   ██╗███╗   ███╗███████╗███╗   ██╗███╗   ██╗ ██████╗ ███╗   ██╗
  ██║     ██║   ██║████╗ ████║██╔════╝████╗  ██║████╗ ████║██╔═══██╗████╗  ██║
  ██║     ██║   ██║██╔████╔██║█████╗  ██╔██╗ ██║██╔████╔██║██║   ██║██╔██╗ ██║
  ██║     ██║   ██║██║╚██╔╝██║██╔══╝  ██║╚██╗██║██║╚██╔╝██║██║   ██║██║╚██╗██║
  ███████╗╚██████╔╝██║ ╚═╝ ██║███████╗██║ ╚████║██║ ╚═╝ ██║╚██████╔╝██║ ╚████║
  ╚══════╝ ╚═════╝ ╚═╝     ╚═╝╚══════╝╚═╝  ╚═══╝╚═╝     ╚═╝ ╚═════╝ ╚═╝  ╚═══╝

Lightweight system monitoring with MQTT transport. Pure push architecture: agents send metrics, and the console never reaches into your systems. The console runs as a Docker container and sets up in about 60 seconds; each client is added with a single command in about 10 seconds. No dashboard config, no hassle.

Quick Start

Console (central dashboard):

curl -sSL https://raw.githubusercontent.com/chriopter/lumenmon/main/console/install.sh | bash

Agent (on each monitored host):

curl -sSL https://raw.githubusercontent.com/chriopter/lumenmon/main/agent/install.sh | bash
lumenmon-agent register '<invite-url>'
lumenmon-agent start

Architecture

┌─────────────┐               ┌─────────────┐
│   Agent     │──────────────►│   Console   │
├─────────────┤  MQTT/TLS     ├─────────────┤
│ Collectors  │──────────────►│ • MQTT 8884 │──► Web :8080
│ (see below) │               │ • SQLite    │
└─────────────┘               │ • Flask     │
  (bare metal)                └─────────────┘
                                 (Docker)

Collection Intervals

Rhythm	Interval	Typical collectors
`PULSE`	1s	`cpu`, `heartbeat`
`BREATHE`	60s	`memory`, `disk`
`CYCLE`	5m	`mail`, `proxmox_*`
`REPORT`	1h	`hostname`, `lumenmon`, `version`, `debian_updates`
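
The collector examples later in this README pass rhythm names such as $BREATHE to publish_metric and sleep. A minimal sketch of how those names map to seconds, using the values from the table above (where and how the agent actually defines them is an assumption here):

```bash
# Assumed definitions; the values come from the Collection Intervals table.
PULSE=1        # 1s
BREATHE=60     # 60s
CYCLE=300      # 5m
REPORT=3600    # 1h
```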

Collectors

All collectors are plain Bash scripts under agent/collectors/.

Read this section as:

Collector: script name.
Publishes: metric names sent to MQTT.
Interval: update cadence.
Failure behavior: when/why a metric marks host state as degraded.

Generic (all Linux systems)

Collector	Publishes	Interval	Failure behavior
`cpu`	`generic_cpu`	1s (`PULSE`)	stale only
`memory`	`generic_memory`	60s (`BREATHE`)	stale only
`disk`	`generic_disk`	60s (`BREATHE`)	stale only
`heartbeat`	`generic_heartbeat`	1s (`PULSE`)	stale only (drives online/offline)
`hostname`	`generic_hostname`	1h (`REPORT`)	stale only
`lumenmon`	`generic_sys_os`, `generic_sys_kernel`, `generic_sys_uptime`	1h (`REPORT`)	stale only
`version`	`generic_agent_version`	1h (`REPORT`)	stale only (UI may show update warning)
`mail`	`mail_message`	5m (`CYCLE`)	informational (message stream)
`zpool`	`generic_zpool_total`, `generic_zpool_degraded`	5m (`CYCLE`)	fails when any non-Proxmox pool is degraded

Debian/Ubuntu

Collector	Publishes	Interval	Failure behavior
`updates`	`debian_updates_total`, `debian_updates_security`, `debian_updates_release`, `debian_updates_age`	1h (`REPORT`)	total/security only warn after updates are pending for >=24h; release >0 stays critical

Proxmox VE

Collector	Publishes	Interval	Failure behavior
`vms`	`proxmox_vms_*`	5m (`CYCLE`)	stale only
`containers`	`proxmox_containers_*`	5m (`CYCLE`)	stale only
`storage`	`proxmox_storage_*`	5m (`CYCLE`)	stale + bounds if configured
`zfs`	`proxmox_zfs_*`	5m (`CYCLE`)	stale + bounds (online drives vs total)
`zpool_health`	`proxmox_zpool_*`	5m (`CYCLE`)	degraded and upgrade-needed flags (`max=0`)

Proxmox Backup Server (PBS)

Collector	Publishes	Interval	Failure behavior
`datastore_count`	`pbs_datastore_count`	5m (`CYCLE`)	fails when no datastore is detected
`task_failures`	`pbs_task_failures_24h`	5m (`CYCLE`)	fails when errors/failures > 0 in last 24h
`backup_age`	`pbs_backup_age_hours`	5m (`CYCLE`)	fails when age exceeds 24h
`verify_age`	`pbs_verify_age_hours`	5m (`CYCLE`)	fails when age exceeds 168h
`sync_age`	`pbs_sync_age_hours`	5m (`CYCLE`)	fails when age exceeds 168h
`gc_age`	`pbs_gc_age_hours`	5m (`CYCLE`)	fails when age exceeds 168h

Hardware (real hosts)

Collector	Publishes	Interval	Failure behavior
`temp`	`hardware_temp_*`	5m (`CYCLE`)	fails on temperature thresholds; negative sensor glitches are clamped to 0
`pcie_errors`	`hardware_pcie_*`	1h (`REPORT`)	fails on PCIe/AER errors
`intel_gpu`	`hardware_intel_gpu_*`	5m (`CYCLE`)	fails on Intel GPU utilization thresholds
`vram`	`hardware_gpu_vram_*`	5m (`CYCLE`)	fails on VRAM usage thresholds
`smart_values`	`hardware_smart_*`	1h (`REPORT`)	fails on SMART health/temp/wear thresholds
`ssd_samsung`	`hardware_samsung_*`	1h (`REPORT`)	inventory/firmware visibility for Samsung SSDs

Note: on virtualized guests, hardware collectors stay disabled by default. If GPU passthrough is detected, hardware_intel_gpu and hardware_vram are enabled automatically.

Optional

Collector	Publishes	Interval	Failure behavior
`mullvad_active`	`optional_mullvad_active`	opt-in	stale/bounds depend on local config

Quick policy note

Collector health is computed from:

metric stale timeout (interval exceeded), and/or
min/max bounds violations (min_value, max_value).

Entity (host) health rolls up from metric health. If any metric fails, entity status becomes degraded.
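
The stale check and the roll-up described above can be sketched in a few lines of Bash. Both function names are hypothetical and the console's real logic may differ (for example, it may allow a grace period beyond the interval); the "ok" status word is also an assumption, since only "degraded" is named here.

```bash
# Hypothetical helpers for illustration, not the console's actual code.

metric_is_stale() {
    # Usage: metric_is_stale last_timestamp now interval_seconds
    local last_ts=$1 now=$2 interval=$3
    # interval=0 marks a one-time metric that never goes stale.
    [ "$interval" -gt 0 ] && [ $(( now - last_ts )) -gt "$interval" ]
}

entity_status() {
    # Arguments: one health word per metric ("ok" or "failed").
    local h
    for h in "$@"; do
        if [ "$h" = "failed" ]; then
            echo "degraded"
            return
        fi
    done
    echo "ok"
}
```

For example, `entity_status ok failed ok` prints `degraded`, because a single failed metric is enough to degrade the host.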

Non-collector checks also exist in the console API/UI layer:

Mail staleness endpoint: /api/messages/staleness (default 14 days / 336h, warning)
Alerting status endpoint: /api/alerts/status

Commands

Console (lumenmon):

lumenmon            # Show status
lumenmon invite     # Generate agent invite
lumenmon logs       # View logs
lumenmon update     # Update container
lumenmon uninstall  # Remove everything

Agent (lumenmon-agent):

lumenmon-agent              # Show status
lumenmon-agent debug        # Run all collectors once (test output)
lumenmon-agent register     # Register with invite URL
lumenmon-agent start/stop   # Control service
lumenmon-agent logs         # View logs
lumenmon-agent uninstall    # Remove agent

Console

Docker container running MQTT broker (Mosquitto), SQLite database, and web dashboard (Flask + Caddy).

Install: Downloads docker-compose.yml, pulls image from GitHub Container Registry, starts container.

Update: lumenmon update pulls latest image and restarts container. Data in ~/.lumenmon/console/data/ is preserved.

Uninstall: lumenmon uninstall stops container, removes image and all data.

Agent

Pure bash scripts that collect metrics and publish via mosquitto_pub over TLS. No Docker, no compiled binaries.

Supported Platforms:

Platform	Install Path	Service
Debian/Ubuntu	`/opt/lumenmon/`	systemd
Proxmox VE	`/opt/lumenmon/`	systemd

Requirements: mosquitto-clients (apt install mosquitto-clients)

Install:

Downloads scripts to /opt/lumenmon/
Creates systemd service lumenmon-agent.service
Creates CLI /usr/local/bin/lumenmon-agent

Update: lumenmon-agent update fetches and checks out the latest release tag. Credentials are preserved. The console dashboard shows "UPDATE AVAILABLE" when a newer version exists.

Uninstall: lumenmon-agent uninstall stops service/process, removes files.

Security

TLS Pinning: Agents verify server certificate fingerprint on first connection
Per-agent credentials: Each agent gets unique MQTT credentials
Outbound only: Agents initiate connections, console cannot connect to agents
Rate limiting: MQTT broker limits connections and message rates

Mail Forwarding

Two methods to receive system mail - use whichever fits your setup:

Method 1: Local spool (Debian/Ubuntu)

/var/mail/root → agent → MQTT → console

The agent automatically reads the local mail spool every 5 minutes. Works out of the box on systems where mail is delivered to /var/mail/root.

Method 2: SMTP (Proxmox/PBS)

System notifications → SMTP (port 25) → console

Configure your system to send mail to <agent_id>@<console-host>. Works with Proxmox notification system.

Both methods store mail in the same messages table, displayed per-agent in the web UI.

Data

All data stored in SQLite at /data/metrics.db (inside container).

Retention: Metrics older than 24h are deleted automatically by a cleanup pass that runs every 5 minutes. The most recent value per metric is always preserved, so offline agents keep their last known status.

Metrics Table (one per agent+metric, e.g. id_abc123_generic_cpu):

Column	Type	Description
timestamp	INTEGER	Unix timestamp (primary key)
value_real	REAL	Decimal values (CPU %, memory %)
value_int	INTEGER	Whole numbers
value_text	TEXT	Strings (hostname, version)
interval	INTEGER	Expected update interval (seconds)
min_value	REAL	Minimum valid value (optional)
max_value	REAL	Maximum valid value (optional)
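
Because each agent+metric pair has its own table, reading the latest value means querying the table named after both. A minimal sketch, assuming the table name is the agent ID and the metric name joined by an underscore (as in the id_abc123_generic_cpu example) and that the sqlite3 CLI is available where the database lives; latest_value_sql is a hypothetical helper, not part of Lumenmon:

```bash
# Hypothetical helper: builds a "latest row" query for one agent+metric table.
latest_value_sql() {
    local agent_id=$1 metric=$2
    printf 'SELECT timestamp, value_real, value_int, value_text FROM "%s_%s" ORDER BY timestamp DESC LIMIT 1;\n' \
        "$agent_id" "$metric"
}

# Example invocation (not run here; requires the sqlite3 CLI):
#   sqlite3 /data/metrics.db "$(latest_value_sql id_abc123 generic_cpu)"
```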

Messages Table (messages):

Column	Type	Description
id	INTEGER	Auto-increment primary key
agent_id	TEXT	Agent that received the email
mail_from	TEXT	Sender address
mail_to	TEXT	Recipient address
subject	TEXT	Email subject
body	TEXT	Email body
received_at	TIMESTAMP	When received
read	INTEGER	0=unread, 1=read

Writing Custom Collectors

Collectors are bash scripts in agent/collectors/. Standard structure:

#!/bin/bash
# What this collector does.
# Data source and calculation details.

METRIC="generic_example"
TYPE="REAL"            # REAL, INTEGER, or TEXT
MIN=0                  # Optional hard minimum (fail/red below)
MAX=100                # Optional hard maximum (fail/red above)
WARN_MIN=""           # Optional soft minimum (warn/yellow below)
WARN_MAX=""           # Optional soft maximum (warn/yellow above)

source "$LUMENMON_HOME/core/mqtt/publish.sh"

while true; do
    # IMPORTANT: Use LC_ALL=C for commands that produce localized output
    value=$(LC_ALL=C some_command | parse_output)

    publish_metric "$METRIC" "$value" "$TYPE" "$BREATHE" "$MIN" "$MAX" "$WARN_MIN" "$WARN_MAX"
    [ "${LUMENMON_TEST_MODE:-}" = "1" ] && exit 0  # Support: lumenmon-agent status

    sleep "$BREATHE"
done

Locale handling: Always use LC_ALL=C prefix for system commands to ensure consistent English output parsing:

# Good - forces English output
total=$(LC_ALL=C apt list --upgradable 2>/dev/null | grep -c "upgradable from")
usage=$(LC_ALL=C df -P / | tail -1 | awk '{print $5}' | tr -d '%')

# Bad - output varies by system locale
total=$(apt list --upgradable | grep -c "upgradable from")  # Fails on German systems

publish_metric signature:

publish_metric "name" "value" "TYPE" interval [min] [max] [warn_min] [warn_max]

Health Detection:

Values outside min/max show as failed (critical/red).
Values outside warn_min/warn_max (but inside min/max) show as warning (degraded/yellow).

# Static bounds (percentages)
publish_metric "cpu" "$val" "REAL" "$PULSE" 0 100

# Dynamic bounds (ZFS: online must equal total drives)
publish_metric "zfs_online" "$online" "INTEGER" "$CYCLE" "$total" "$total"

# Warning-only threshold
publish_metric "debian_updates_total" "$total" "INTEGER" "$REPORT" 0 "" "" 0

# One-time metric (interval=0, never stale)
publish_metric "hostname" "$host" "TEXT" 0
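
To make the bounds rules concrete, here is a sketch of how a value could be classified against the four optional bounds. classify_value is a hypothetical function written for this README; the console's actual evaluation is not shown here and may differ. An empty string means the bound is not set, and awk is used so decimal values compare correctly.

```bash
# Hypothetical classifier for illustration, not the console's actual code.
classify_value() {
    # Usage: classify_value value min max warn_min warn_max
    awk -v v="$1" -v min="$2" -v max="$3" -v wmin="$4" -v wmax="$5" 'BEGIN {
        # Hard bounds first: outside min/max is a failure.
        if ((min != "" && v+0 < min+0) || (max != "" && v+0 > max+0)) { print "failed"; exit }
        # Soft bounds next: outside warn_min/warn_max is a warning.
        if ((wmin != "" && v+0 < wmin+0) || (wmax != "" && v+0 > wmax+0)) { print "warning"; exit }
        print "ok"
    }'
}
```

With the bounds from the examples above, `classify_value 3 0 "" "" 0` prints `warning` (three pending updates against warn_max=0), and `classify_value 3 4 4 "" ""` prints `failed` (three of four ZFS drives online).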

Categories:

Directory	Prefix	Purpose
`collectors/generic/`	`generic_`	Universal (CPU, memory, disk)
`collectors/proxmox/`	`proxmox_`	Proxmox (VMs, containers, ZFS)
`collectors/pbs/`	`pbs_`	Proxmox Backup Server checks
`collectors/hardware/`	`hardware_`	Real-hardware telemetry
`collectors/optional/`	`optional_`	Explicitly opt-in checks
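
As a worked example of the structure above, the following sketches the parsing half of a hypothetical collector for the 1-minute load average. Neither parse_load1 nor the generic_load1 metric exists in Lumenmon; they are illustrative names. The publish step is left as comments because it depends on the agent's publish.sh being sourced.

```bash
#!/bin/bash
# Hypothetical collector sketch: 1-minute load average.
# Data source: /proc/loadavg, e.g. "0.42 0.37 0.30 1/523 12345".

parse_load1() {
    # The first field of /proc/loadavg is the 1-minute load average.
    # An alternative file can be passed as $1 for testing.
    awk '{print $1}' "${1:-/proc/loadavg}"
}

# Inside the standard collector loop, this would be used roughly as:
#   value=$(LC_ALL=C parse_load1)
#   publish_metric "generic_load1" "$value" "REAL" "$BREATHE" 0
```

Keeping the parsing in a small function makes it easy to check against a sample file before wiring it into the publish loop.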

Development

Local Development

./dev/auto         # Full reset and setup with virtual agent
./dev/add3         # Spawn 3 test agents
./dev/check-collectors  # Validate collector contract assumptions
./dev/sensor-inventory  # List current remote sensors and failed checks
./dev/sandboxer-maintain --once  # Run one auto-maintenance pass
./dev/lumenmon-diagnose  # End-to-end health/data-flow diagnosis

Operational Checks (Current)

Use these commands as a complete fast-check list during development and direct deploy.

# Local script sanity
find . -name "*.sh" -type f -exec bash -n {} \;
./dev/check-collectors

# Console image sanity
docker build -t test-console:ci ./console

# E2E tests
(cd dev/tests && npm test)
(cd dev/tests && npx playwright test lumenmon.spec.ts -g "Page Load & Initial State")

# Runtime status (local or remote)
lumenmon
lumenmon-agent

# Direct deploy + smoke checks
./dev/deploy-test agent
./dev/deploy-test console
./dev/deploy-test status
./dev/deploy-test check

Optional Collector Config

Optional collectors are enabled via keys in agent/data/config (or /opt/lumenmon/data/config on host):

mullvad_active=1

# hardware collectors on virtual hosts (optional override)
hardware_force=0
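
A collector or wrapper script could check such a key with a simple line match. This is a sketch under the assumption that the config file holds plain key=value lines as shown above; config_enabled is a hypothetical helper, not the agent's actual loader.

```bash
# Hypothetical helper: succeeds only if the file contains the exact line "key=1".
config_enabled() {
    # Usage: config_enabled key [config_file]
    local key=$1 file=${2:-/opt/lumenmon/data/config}
    grep -qx "${key}=1" "$file" 2>/dev/null
}
```

For example, `config_enabled mullvad_active && echo "mullvad collector on"` prints the message only when mullvad_active=1 is present in the config file.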

Alerting (Webhook Status Only)

Console exposes webhook alert configuration status in GUI and API:

API: GET /api/alerts/status
GUI footer: alerts: not configured / alerts: webhook dry-run / alerts: active

Current behavior is status-only scaffolding (no outbound webhook delivery yet).

Mail Staleness (Server-side)

Mail staleness is evaluated in console backend from messages.received_at:

API: GET /api/messages/staleness?hours=336
Used by UI status warnings (MAIL STALE > 14D)
Treated as warning/degraded (yellow), not critical.

CSS (Tailwind)

cd console && npm install   # First time setup
cd console && npm run dev   # Watch mode - auto-recompile on changes
cd console && npm run build # One-time build

Remote Test Server

Deploy directly to a test server via SSH (bypasses GitHub Actions):

cp .env.example .env
# set LUMENMON_TEST_HOST in .env (gitignored)

export LUMENMON_TEST_HOST="root@your-test-server"  # optional shell override
./dev/deploy-test web      # Build CSS + hot reload frontend (~3s)
./dev/deploy-test agent    # Deploy agent + restart (~1s)
./dev/deploy-test console  # Full console + restart (~5s)
./dev/deploy-test status   # Check remote status
./dev/deploy-test check    # API/runtime smoke checks

Releases

./dev/release      # Create version tag, triggers GitHub Actions build

GitHub Actions only builds Docker images on version tags (v*), not on every commit. This keeps the dev loop fast.

Made with 🔆 by chriopter
