Python-based monitoring toolset designed to collect real-time performance and health metrics from a SCALE Computing HyperCore (HC3) cluster. It fetches data via the cluster’s REST API, formats it, and pushes it into InfluxDB for analysis and visualization.

Nels2/Scale-Api-Metrics

SCALE API Metrics

This project provides automated tools for collecting performance and health metrics from a SCALE Computing HC3 cluster via its REST API. Metrics are ingested into InfluxDB for real-time analysis, visualization, or alerting. I've also included a few other sample scripts, such as get_nodeUsage.py, which has been updated to use session-based login instead of Basic Auth.

Overview

  • get_stats.py: Collects per-VM metrics (CPU, disk I/O, network usage).
  • get_ClusterStats.py: Collects per-node metrics (CPU, memory, drive health and temperature).
  • get_stats-basicauth.py / get_ClusterStats-basicauth.py: Variants of the main metric collectors that use Basic Auth instead of session tokens.
  • start_metric_fetch.sh: Entrypoint script that handles session refresh and runs both metric collectors in sequence.
  • gen_sessionID.py: Generates a session token and stores it at session/sessionLogin.p.
  • kill_sessionID.py: Invalidates the current session.
  • run_GenDevSession.sh: Shell wrapper to generate a dev session token.
  • run_KillDevSession.sh: Shell wrapper to kill a dev session.
  • get_nodeUsage.py: Collects node-level CPU/memory metrics using session-based login.
  • run_getSnapshotReportAll.sh: Optional utility for fetching snapshot-related data or reports (custom functionality).
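
The difference between the Basic Auth variants and the session-based scripts comes down to what is sent with each request. The following is a minimal sketch, not code from this repository: the function names are hypothetical, and the cookie name `sessionID` is an assumption about how the session token is passed back to the API.

```python
import base64


def basic_auth_headers(username: str, password: str) -> dict:
    """Basic Auth: the encoded credentials travel with every request."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def session_headers(session_id: str, cookie_name: str = "sessionID") -> dict:
    """Session auth: after one login, only the session token is sent.

    cookie_name is an assumption, not taken from this repository.
    """
    return {"Cookie": f"{cookie_name}={session_id}"}
```

With session-based login the password is only needed once, when the token is generated, which is presumably why the main collectors prefer it.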

Requirements

  • Python 3.9+
  • InfluxDB 2.x
  • Virtual environment set up in vfx/
  • SCALE HyperCore API access
  • Session token stored in session/sessionLogin.p (generated by gen_sessionID.py)

Installation

# Clone this repo
cd /Projects/
git clone https://github.com/Nels2/Scale-Api-Metrics.git
cd Scale-Api-Metrics

# Set up virtual environment
python3 -m venv vfx
source vfx/bin/activate
pip install -r requirements.txt

Usage

To start metric collection manually:

bash start_metric_fetch.sh

This script:

  • Checks if your session token is older than 12 hours
  • Kills and regenerates session ID if needed
  • Runs get_stats.py (VM metrics)
  • Runs get_ClusterStats.py (Node metrics)
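
The steps above can be sketched in Python as follows. This is an illustration only: the real entrypoint is a shell script, the helper names are hypothetical, and using the file's modification time as the session age is an assumption about how the 12-hour check works.

```python
import os
import time

MAX_SESSION_AGE_SECONDS = 12 * 60 * 60  # 12 hours, as described above


def session_is_stale(path, now=None, max_age=MAX_SESSION_AGE_SECONDS):
    """Return True if the session file is missing or older than max_age."""
    now = time.time() if now is None else now
    try:
        # Assumption: the file's mtime marks when the session was created.
        return now - os.path.getmtime(path) > max_age
    except OSError:
        # Assumption: a missing or unreadable file means "no valid session".
        return True


def plan_steps(path="session/sessionLogin.p", now=None):
    """List the scripts to run, in order, for one collection cycle."""
    steps = []
    if session_is_stale(path, now):
        steps += ["kill_sessionID.py", "gen_sessionID.py"]
    steps += ["get_stats.py", "get_ClusterStats.py"]
    return steps
```

Calling `plan_steps()` with a fresh session file yields only the two collectors; with a stale or missing file, the kill and generate scripts come first.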

Output

Metrics are written to InfluxDB:

  • VM stats (measurement: vm_metrics)
  • VM disk stats (measurement: disk_stats)
  • Node CPU (measurement: cpu)
  • Node memory (measurement: memory)
  • Disk health/temperature (measurement: disk)
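
To give a sense of what a written point looks like, here is a minimal sketch that formats one record as InfluxDB line protocol using only the standard library. The measurement name `cpu` comes from the list above; the tag name `node` and the field names are hypothetical, and the actual scripts may build points differently (for example with a client library).

```python
def _escape(text: str) -> str:
    """Escape commas, equals signs, and spaces in tag/field keys and tag values."""
    return text.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def format_field(value) -> str:
    """Render a field value using line protocol type rules."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integers carry an 'i' suffix
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'  # strings are double-quoted


def to_line(measurement: str, tags: dict, fields: dict, timestamp_ns: int) -> str:
    """Build one line: measurement,tags fields timestamp."""
    tag_part = "".join(
        f",{_escape(k)}={_escape(str(v))}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(f"{_escape(k)}={format_field(v)}" for k, v in fields.items())
    return f"{measurement}{tag_part} {field_part} {timestamp_ns}"


# Hypothetical node CPU sample
print(to_line("cpu", {"node": "node 1"}, {"usage": 12.5, "cores": 8}, 1700000000000000000))
```

Tags are sorted by key here because InfluxDB recommends that ordering for write performance; it does not change what is stored.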

Notes

  • SSL verification is disabled in the scripts. Remove that flag if you need certificate verification.
  • Requires a running InfluxDB instance with a pre-created bucket and org. You can also swap InfluxDB out for an alternative datastore.
  • Ensure the SCALE REST API is reachable at the hostname/IP set in the host variable in both scripts.

Session Management

  • Sessions are stored in: session/sessionLogin.p
  • If older than 12 hours, a new session is generated automatically

Cron Example

To run every two minutes, add this to your crontab:

*/2 * * * * /bin/bash /Projects/Scale-Api-Metrics/start_metric_fetch.sh

License: MIT. Feel free to use this in your own projects.
