nightshiftco · tensor-ninja · Feb 26, 2026 · Feb 26, 2026 · Feb 27, 2026 · Feb 27, 2026
diff --git a/.gitignore b/.gitignore
@@ -1,9 +1,12 @@
 __pycache__/
+.DS_Store
 *.pyc
 *.pyo
+*.png
 *.pyd
 .Python
 env/
 build/
 dist/
 infra/.deploy-state
+loadtest/results/
diff --git a/loadtest/README.md b/loadtest/README.md
@@ -0,0 +1,77 @@
+# Load Testing
+
+Ramps concurrent Nightshift VMs in waves to find the concurrency ceiling of a node. Collects per-run timing metrics and host resource samples, then produces a scaling report with zone classifications.
+
+## Quick start
+
+```bash
+# Run load test (deploys agent, launches waves, writes results)
+uv run python loadtest/run.py --waves 5,10,20,30,50,50,50 --duration 180
+
+# Visualize the most recent result
+uv run --with matplotlib --with numpy python loadtest/run.py --vis
+
+# Visualize a specific result
+uv run --with matplotlib --with numpy python loadtest/run.py --vis loadtest/results/loadtest-20260226-114607.json
+```
+
+## How it works
+
+1. **Deploy** — `run.py` packages `agent.py` into a tarball and deploys it via `POST /api/agents`
+2. **Wave launch** — The VMs in each wave launch concurrently, and waves start `--wave-interval` seconds apart (e.g. 5, then 10, then 20 VMs). Each wave starts while earlier waves are still running, so active VMs accumulate
+3. **SSE streaming** — Each run connects to the event stream and tracks cold start, heartbeats, and completion
+4. **Monitor** (optional) — If `--host` and `--key` are provided, `monitor.py` connects to the host over SSH every 5 seconds and samples CPU, memory, disk, Firecracker VM count, TAP devices, file descriptors, and iptables rules
+5. **Report** — Results are saved to JSON and a scaling report is printed with GREEN/YELLOW/RED zone classifications
+
+## Options
+
+| Flag | Default | Description |
+|---|---|---|
+| `--api-url` | `https://api.nightshift.sh` | Nightshift API base URL |
+| `--api-key` | `12345` | API key |
+| `--waves` | `5,10,20,30,50,50,50` | Comma-separated wave sizes |
+| `--duration` | `180` | Run duration per agent (seconds) |
+| `--wave-interval` | `30` | Seconds between wave launches |
+| `--host` | — | SSH host for monitoring |
+| `--key` | — | SSH key path for monitoring |
+| `--output-dir` | `loadtest/results` | Output directory |
+| `--skip-deploy` | `false` | Reuse existing agent (skip deployment) |
+| `--fail-threshold` | `50` | Wave failure % that stops the test |
+| `--vis` | — | Visualize results instead of running a test |
+
+## Metrics
+
+**Per-run timing:**
+- `submit_duration_s` — time for POST to return 202
+- `cold_start_s` — time from submission to first SSE event (includes VM boot)
+- `total_duration_s` — full run duration
+
+**Per-wave aggregates:**
+- Success/failure rate
+- Avg, P95, max cold start
+
+**Host resources** (from `monitor.json`):
+- CPU load average, memory usage, disk usage
+- Firecracker process count, TAP device count
+- Nightshift file descriptors, iptables rules
+
+## Zone classification
+
+Each wave is classified into a zone based on its metrics:
+
+| Zone | Criteria | Meaning |
+|---|---|---|
+| **GREEN** | failure < 5%, p95 cold start < 15s, avg < 10s | Safe operating range |
+| **YELLOW** | avg cold start > 10s or p95 > 15s | Approaching limits, plan to scale |
+| **RED** | failure > 5% or p95 cold start > 30s | Over capacity, errors occurring |
+
+## Visualization
+
+`--vis` produces a PNG with four panels:
+
+1. **P95 Cold Start** — line chart colored by zone with gradient fill, annotated peak/best values
+2. **Success Rate** — per-wave bar chart with overall pass rate
+3. **Memory Usage** — time series from monitor data with peak/start/end callouts
+4. **Active VMs & System Load** — Firecracker count + load average overlay
+
+Charts use a dark theme with monospace fonts. Monitor panels only appear when `monitor.json` is present alongside the result file.
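
The zone classification table in `loadtest/README.md` can be read as a small decision function. The following is a minimal sketch of one possible reading, not the code in `run.py`: the `WaveStats` class and its field names are hypothetical, and the table does not say which zone wins when criteria overlap or how boundary values (for example a failure rate of exactly 5%) are handled. The sketch assumes RED is checked first and that anything not strictly GREEN falls through to YELLOW.

```python
from dataclasses import dataclass


@dataclass
class WaveStats:
    """Hypothetical per-wave summary; field names are illustrative only."""

    failure_pct: float       # failed runs as a percentage of the wave
    avg_cold_start_s: float  # mean cold start in seconds
    p95_cold_start_s: float  # 95th-percentile cold start in seconds


def classify_zone(stats: WaveStats) -> str:
    """Map one wave's metrics to GREEN, YELLOW, or RED."""
    # Assumption: RED is evaluated first so it takes precedence over YELLOW.
    if stats.failure_pct > 5 or stats.p95_cold_start_s > 30:
        return "RED"
    # GREEN requires all three criteria listed in the table.
    if (
        stats.failure_pct < 5
        and stats.p95_cold_start_s < 15
        and stats.avg_cold_start_s < 10
    ):
        return "GREEN"
    # Assumption: everything else, including boundary values, is YELLOW.
    return "YELLOW"


if __name__ == "__main__":
    print(classify_zone(WaveStats(0.0, 4.0, 8.0)))    # GREEN
    print(classify_zone(WaveStats(0.0, 12.0, 14.0)))  # YELLOW
    print(classify_zone(WaveStats(10.0, 4.0, 8.0)))   # RED
```

If the real implementation treats boundary values differently, only the comparison operators above would need to change.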
diff --git a/loadtest/agent.py b/loadtest/agent.py
@@ -0,0 +1,78 @@
+"""Load-test agent — deployed to Nightshift, runs for a configurable duration.
+
+The prompt should be a number of seconds to run (e.g. "180").
+The agent does light CPU work (prime sieve) and emits periodic heartbeats
+so the load test runner can track liveness.
+
+Usage (deploy manually):
+    uv run nightshift deploy loadtest/agent.py
+
+Or let loadtest/run.py deploy it automatically via the API.
+"""
+
+import asyncio
+import os
+import time
+
+from nightshift import AgentConfig, NightshiftApp
+
+app = NightshiftApp()
+
+
+@app.agent(
+    AgentConfig(
+        vcpu_count=2,
+        mem_size_mib=2048,
+        timeout_seconds=1800,
+        max_concurrent_vms=200,
+        stateful=False,
+    )
+)
+async def loadtest_agent(prompt: str):
+    """Run for the requested duration with periodic heartbeats."""
+    try:
+        duration = int(prompt.strip())
+    except ValueError:
+        duration = 180
+
+    start = time.monotonic()
+    heartbeat_num = 0
+
+    yield {
+        "type": "agent.message",
+        "role": "assistant",
+        "content": f"loadtest: starting, duration={duration}s, pid={os.getpid()}",
+    }
+
+    while time.monotonic() - start < duration:
+        # Light CPU work: count primes below 5000 with a sieve of Eratosthenes
+        sieve = [True] * 5000
+        sieve[0] = sieve[1] = False
+        for i in range(2, int(5000**0.5) + 1):
+            if sieve[i]:
+                for j in range(i * i, 5000, i):
+                    sieve[j] = False
+        prime_count = sum(sieve)
+
+        heartbeat_num += 1
+        elapsed = time.monotonic() - start
+        yield {
+            "type": "agent.message",
+            "role": "assistant",
+            "content": (
+                f"loadtest: heartbeat {heartbeat_num}, "
+                f"elapsed={elapsed:.1f}s, primes={prime_count}"
+            ),
+        }
+
+        # Sleep up to 10s between heartbeats without overshooting the duration
+        remaining = duration - (time.monotonic() - start)
+        if remaining > 0:
+            await asyncio.sleep(min(10, remaining))
+
+    total = time.monotonic() - start
+    yield {
+        "type": "agent.message",
+        "role": "assistant",
+        "content": f"loadtest: completed, total={total:.1f}s, heartbeats={heartbeat_num}",
+    }
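
Because each wave starts while earlier waves are still running, the number of VMs active at once depends on the wave sizes, `--wave-interval`, and `--duration` together. The sketch below estimates that peak under simplifying assumptions that are not taken from `run.py`: every VM in a wave starts at the wave's launch time and runs for exactly `duration_s` seconds, so submit time and cold start are ignored. The function name `peak_concurrent_vms` is hypothetical.

```python
def peak_concurrent_vms(waves, wave_interval_s, duration_s):
    """Estimate the highest number of VMs active at the same time.

    Assumes wave i launches at i * wave_interval_s and each of its VMs
    is active for the half-open interval [launch, launch + duration_s).
    """
    starts = [i * wave_interval_s for i in range(len(waves))]
    peak = 0
    # Concurrency only increases at launch times, so checking those suffices.
    for t in starts:
        active = sum(
            size
            for start, size in zip(starts, waves)
            if start <= t < start + duration_s
        )
        peak = max(peak, active)
    return peak


if __name__ == "__main__":
    default_waves = [5, 10, 20, 30, 50, 50, 50]
    print(peak_concurrent_vms(default_waves, 30, 180))  # 210
```

Under these assumptions the default settings peak at 210 active VMs, and at 215 if cold start keeps the first wave alive past the last launch. That is worth comparing against the `max_concurrent_vms=200` set in `AgentConfig` when interpreting failures in the final waves.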