

feat: add observability tooling for Flashbox images #93

Open
MoeMahhouk wants to merge 10 commits into main from moe/flashbox-observability



MoeMahhouk (Member) commented on Feb 6, 2026

This pull request introduces a comprehensive observability and monitoring stack to the project, centered around Prometheus and its exporters. It adds Prometheus, node-exporter, and process-exporter as services, configures them for system and container-level metrics collection, and sets up recording rules for aggregated metrics. The changes also include dynamic configuration improvements, firewall adjustments for metrics endpoints, and new helper scripts for environment-specific configuration.

Observability & Monitoring Integration

  • Added Prometheus, node-exporter, and process-exporter as systemd services, including installation, configuration, and service enablement for system and container monitoring (prometheus.service, node-exporter.service, process-exporter.service).
  • Introduced Prometheus configuration templates and recording rules for aggregated CPU, memory, disk, network, and container health metrics (prometheus.yml.tmpl, recording_rules.yml, process-exporter.yml).
  • Added gomplate as a build dependency to render dynamic Prometheus configuration from templates.
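For reviewers less familiar with Prometheus recording rules, here is a minimal sketch of the kind of content recording_rules.yml might hold. The group name, the rule names, the 5m windows, and the use of `num_procs > 0` as a container-health signal are all assumptions chosen for illustration. They are not the rules in this PR. The underlying metric names are the standard ones exported by node-exporter and process-exporter. `searcher-container` is the group name from the process-exporter config in this PR.

```yaml
groups:
  # Hypothetical group name; Prometheus evaluates every rule in the group at this interval
  - name: flashbox-aggregates
    interval: 30s
    rules:
      # Host CPU utilisation (0..1), averaged across all cores
      - record: instance:node_cpu_utilisation:rate5m
        expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))
      # Fraction of memory in use
      - record: instance:node_memory_utilisation:ratio
        expr: 1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes
      # Root filesystem usage fraction
      - record: instance:node_filesystem_usage:ratio
        expr: 1 - node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}
      # Network throughput in bytes/s, excluding loopback
      - record: instance:node_network_receive_bytes:rate5m
        expr: sum by (instance) (rate(node_network_receive_bytes_total{device!="lo"}[5m]))
      - record: instance:node_network_transmit_bytes:rate5m
        expr: sum by (instance) (rate(node_network_transmit_bytes_total{device!="lo"}[5m]))
      # Searcher container CPU usage in cores (user + system), from process-exporter
      - record: groupname:namedprocess_cpu_seconds:rate5m
        expr: sum by (groupname) (rate(namedprocess_namegroup_cpu_seconds_total{groupname="searcher-container"}[5m]))
      # Searcher container resident memory in bytes
      - record: groupname:namedprocess_memory_resident:bytes
        expr: namedprocess_namegroup_memory_bytes{groupname="searcher-container", memtype="resident"}
      # Assumed health proxy: 1 while at least one process in the group is alive, else 0
      - record: groupname:namedprocess_up:bool
        expr: namedprocess_namegroup_num_procs{groupname="searcher-container"} > bool 0
```

Prometheus stores each recorded series as a new time series. Dashboards and any downstream receiver can then query these cheap precomputed series instead of re-evaluating the raw expressions. You can check the file's syntax with `promtool check rules recording_rules.yml`.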

Firewall & Networking

  • Updated firewall scripts in both L1 and L2 to dynamically allow outgoing traffic to the observability metrics endpoint, using the METRICS_ENDPOINT variable loaded from configuration.
  • Adjusted searcher-firewall.service dependencies to ensure correct ordering with configuration fetching.

Dynamic Configuration

  • Added project-specific dynamic configuration scripts for bob-l1 and bob-l2, supporting both QEMU development and Vault-based production environments. These scripts generate environment-specific config files based on mode and available secrets.

Miscellaneous

  • Ensured correct ownership of Prometheus data directories after installation to avoid permission issues.

Together, these changes let the image monitor both the host system and its key containers, and they lay the groundwork for further observability work.

MoeMahhouk force-pushed the moe/flashbox-observability branch from 8f75a8f to b0557e3 on February 12, 2026 17:42
MoeMahhouk marked this pull request as ready for review on February 13, 2026 12:45
@@ -0,0 +1,5 @@
process_names:
# Monitor the searcher container (conmon + all children via --children flag)
- name: "searcher-container"
Contributor review comment on this hunk:

Should we also monitor Lighthouse in bob-l1?
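If Lighthouse runs as a host process on bob-l1, one way to cover it would be an extra entry in the process-exporter config, sketched below. This rests on two assumptions. First, the entry name `lighthouse` is hypothetical. Second, the process's `comm` is assumed to be `lighthouse`. If Lighthouse instead runs inside a container, it may need to be matched the way the searcher is, via `cmdline` or its conmon parent.

```yaml
process_names:
  # (existing "searcher-container" entry omitted from this excerpt)
  # Hypothetical entry: group the Lighthouse process on bob-l1 under one name.
  # Assumes the kernel comm (executable name, truncated to 15 chars) is "lighthouse".
  - name: "lighthouse"
    comm:
      - lighthouse
```

process-exporter assigns each process to the first `process_names` entry that matches it. This entry therefore only needs to avoid overlapping with the searcher's matcher.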



3 participants