3 changes: 3 additions & 0 deletions .gitignore
@@ -1,9 +1,12 @@
__pycache__/
.DS_Store
*.pyc
*.pyo
*.png
*.pyd
.Python
env/
build/
dist/
infra/.deploy-state
loadtest/results/
77 changes: 77 additions & 0 deletions loadtest/README.md
@@ -0,0 +1,77 @@
# Load Testing

Ramps concurrent Nightshift VMs in waves to find the concurrency ceiling of a node. Collects per-run timing metrics and host resource samples, then produces a scaling report with zone classifications.

## Quick start

```bash
# Run load test (deploys agent, launches waves, writes results)
uv run python loadtest/run.py --waves 5,10,20,30,50,50,50 --duration 180

# Visualize the most recent result
uv run --with matplotlib --with numpy python loadtest/run.py --vis

# Visualize a specific result
uv run --with matplotlib --with numpy python loadtest/run.py --vis loadtest/results/loadtest-20260226-114607.json
```

## How it works

1. **Deploy** — `run.py` packages `agent.py` into a tarball and deploys it via `POST /api/agents`
2. **Wave launch** — Waves launch in sequence (e.g. 5, then 10, then 20 VMs), `--wave-interval` seconds apart. Each wave starts while earlier waves are still running, so active VMs accumulate
3. **SSE streaming** — Each run connects to the event stream and tracks cold start, heartbeats, and completion
4. **Monitor** (optional) — If `--host` and `--key` are provided, `monitor.py` SSH-es into the host every 5 seconds and samples CPU, memory, disk, Firecracker VM count, TAP devices, file descriptors, and iptables rules
5. **Report** — Results are saved to JSON and a scaling report is printed with GREEN/YELLOW/RED zone classifications
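
The accumulation described in step 2 can be estimated before a run. The sketch below is not taken from `run.py`: `nominal_concurrency` is a hypothetical helper, and it assumes every VM is alive for exactly `--duration` seconds from its wave's launch time, ignoring boot and teardown time.

```python
def nominal_concurrency(waves, wave_interval, duration):
    """Return (launch_time_s, active_vms) at each wave launch.

    Assumption: a VM launched at time s is active during [s, s + duration).
    """
    launches = [(i * wave_interval, size) for i, size in enumerate(waves)]
    timeline = []
    for t, _ in launches:
        # Sum every wave that has started and has not yet run out its duration.
        active = sum(size for s, size in launches if s <= t < s + duration)
        timeline.append((t, active))
    return timeline


if __name__ == "__main__":
    # Wave sizes, interval, and duration from the Quick start command.
    for t, active in nominal_concurrency([5, 10, 20, 30, 50, 50, 50], 30, 180):
        print(f"t={t:>3}s  active={active}")
```

Under these assumptions, real concurrency will likely run somewhat higher than the estimate, because cold start and teardown extend each VM's lifetime beyond the requested duration.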

## Options

| Flag | Default | Description |
|---|---|---|
| `--api-url` | `https://api.nightshift.sh` | Nightshift API base URL |
| `--api-key` | `12345` | API key |
| `--waves` | `5,10,20,30,50,50,50` | Comma-separated wave sizes |
| `--duration` | `180` | Run duration per agent (seconds) |
| `--wave-interval` | `30` | Seconds between wave launches |
| `--host` | — | SSH host for monitoring |
| `--key` | — | SSH key path for monitoring |
| `--output-dir` | `loadtest/results` | Output directory |
| `--skip-deploy` | `false` | Reuse existing agent (skip deployment) |
| `--fail-threshold` | `50` | Wave failure % that stops the test |
| `--vis` | — | Visualize results instead of running a test |

## Metrics

**Per-run timing:**
- `submit_duration_s` — time for POST to return 202
- `cold_start_s` — time from submission to first SSE event (includes VM boot)
- `total_duration_s` — full run duration
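
One way to derive these three values is to record a timestamp at each stage of a run and subtract the submission time. This is a minimal sketch, not the runner's actual data model: the `RunTiming` class and its field names are hypothetical, and it assumes `total_duration_s` is measured from submission like the other two metrics.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunTiming:
    """Timestamps for one run, all read from the same monotonic clock."""

    submitted_at: float                     # just before the POST is sent
    accepted_at: Optional[float] = None     # POST returned 202
    first_event_at: Optional[float] = None  # first SSE event arrived
    finished_at: Optional[float] = None     # run completed or stream closed

    def _since_submit(self, t: Optional[float]) -> Optional[float]:
        # None means the run never reached that stage (e.g. it failed early).
        return None if t is None else t - self.submitted_at

    @property
    def submit_duration_s(self) -> Optional[float]:
        return self._since_submit(self.accepted_at)

    @property
    def cold_start_s(self) -> Optional[float]:
        return self._since_submit(self.first_event_at)

    @property
    def total_duration_s(self) -> Optional[float]:
        # Assumption: measured from submission, like the other two metrics.
        return self._since_submit(self.finished_at)
```

Taking all timestamps from `time.monotonic()` rather than wall-clock time keeps the differences immune to system clock adjustments during a long test.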

**Per-wave aggregates:**
- Success/failure rate
- Avg, P95, max cold start
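
The aggregates above could be computed along these lines. The README does not say which percentile method the report uses, so this sketch assumes nearest-rank; `summarize_wave` and the `ok` / `cold_start_s` keys on each run record are hypothetical names chosen for illustration.

```python
import math
from statistics import mean


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list, with pct in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize_wave(runs):
    """Aggregate one wave.

    runs: list of dicts with 'ok' (bool) and 'cold_start_s' (float or None).
    Runs that never produced an SSE event have cold_start_s set to None and
    are excluded from the cold-start statistics.
    """
    cold = [r["cold_start_s"] for r in runs if r["cold_start_s"] is not None]
    failed = sum(1 for r in runs if not r["ok"])
    return {
        "runs": len(runs),
        "failure_pct": 100.0 * failed / len(runs) if runs else 0.0,
        "avg_cold_start_s": mean(cold) if cold else None,
        "p95_cold_start_s": percentile(cold, 95) if cold else None,
        "max_cold_start_s": max(cold) if cold else None,
    }
```

With small waves (for example 5 runs), the nearest-rank P95 is simply the maximum, so P95 and max only diverge once a wave has at least 20 runs with a measured cold start.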

**Host resources** (from `monitor.json`):
- CPU load average, memory usage, disk usage
- Firecracker process count, TAP device count
- Nightshift file descriptors, iptables rules
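
How `monitor.py` gathers these values is not shown here. As an illustration only, load average and memory usage can be read on a Linux host from `/proc/loadavg` and `/proc/meminfo`; the sketch below parses the text of those two files. The function names and output keys are hypothetical and do not describe the format of `monitor.json`.

```python
def parse_loadavg(text):
    """Parse /proc/loadavg content into the 1-, 5-, and 15-minute averages."""
    one, five, fifteen = (float(x) for x in text.split()[:3])
    return {"load_1m": one, "load_5m": five, "load_15m": fifteen}


def parse_meminfo(text):
    """Parse /proc/meminfo content and report used memory as a percentage."""
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if rest.strip():
            # The first token is the numeric value (kB for memory fields).
            fields[key.strip()] = int(rest.split()[0])
    total = fields["MemTotal"]
    available = fields["MemAvailable"]
    used_pct = round(100 * (total - available) / total, 1)
    return {"mem_total_kb": total, "mem_used_pct": used_pct}
```

Using `MemAvailable` rather than `MemFree` avoids counting reclaimable page cache as used memory, which matters on a host that has been running many VMs.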

## Zone classification

Each wave is classified into a zone based on its metrics:

| Zone | Criteria | Meaning |
|---|---|---|
| **GREEN** | failure < 5%, p95 cold start < 15s, avg < 10s | Safe operating range |
| **YELLOW** | avg cold start > 10s or p95 > 15s | Approaching limits, plan to scale |
| **RED** | failure > 5% or p95 cold start > 30s | Over capacity, errors occurring |
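
The table can be read as a small decision function. The sketch below is an interpretation, not the report's actual code: `classify_zone` is a hypothetical name, RED is assumed to take precedence when a wave matches both RED and YELLOW criteria, and values that sit exactly on a threshold (which the table does not assign to any zone) are assumed to fall into YELLOW.

```python
def classify_zone(failure_pct, avg_cold_start_s, p95_cold_start_s):
    """Classify one wave as GREEN, YELLOW, or RED from its aggregates."""
    # Assumption: RED wins over YELLOW when both sets of criteria match.
    if failure_pct > 5 or p95_cold_start_s > 30:
        return "RED"
    # GREEN requires every criterion in the table's GREEN row to hold.
    if failure_pct < 5 and p95_cold_start_s < 15 and avg_cold_start_s < 10:
        return "GREEN"
    # Everything else, including exact-threshold values, is treated as YELLOW.
    return "YELLOW"
```

For example, a wave with 0% failures, a 12s average, and a 35s P95 cold start matches the YELLOW average criterion but is reported as RED under the precedence assumed here.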

## Visualization

`--vis` produces a PNG with four panels:

1. **P95 Cold Start** — line chart colored by zone with gradient fill, annotated peak/best values
2. **Success Rate** — per-wave bar chart with overall pass rate
3. **Memory Usage** — time series from monitor data with peak/start/end callouts
4. **Active VMs & System Load** — Firecracker count + load average overlay

Charts use a dark theme with monospace fonts. Monitor panels only appear when `monitor.json` is present alongside the result file.
78 changes: 78 additions & 0 deletions loadtest/agent.py
@@ -0,0 +1,78 @@
"""Load-test agent — deployed to Nightshift, runs for a configurable duration.

The prompt should be a number of seconds to run (e.g. "180").
The agent does light CPU work (prime sieve) and emits periodic heartbeats
so the load test runner can track liveness.

Usage (deploy manually):
    uv run nightshift deploy loadtest/agent.py

Or let loadtest/run.py deploy it automatically via the API.
"""

import asyncio
import os
import time

from nightshift import AgentConfig, NightshiftApp

app = NightshiftApp()


@app.agent(
    AgentConfig(
        vcpu_count=2,
        mem_size_mib=2048,
        timeout_seconds=1800,
        max_concurrent_vms=200,
        stateful=False,
    )
)
async def loadtest_agent(prompt: str):
    """Run for the requested duration with periodic heartbeats."""
    try:
        duration = int(prompt.strip())
    except ValueError:
        duration = 180

    start = time.monotonic()
    heartbeat_num = 0

    yield {
        "type": "agent.message",
        "role": "assistant",
        "content": f"loadtest: starting, duration={duration}s, pid={os.getpid()}",
    }

    while time.monotonic() - start < duration:
        # Light CPU work: count primes below 5000 via sieve
        sieve = [True] * 5000
        sieve[0] = sieve[1] = False
        for i in range(2, int(5000**0.5) + 1):
            if sieve[i]:
                for j in range(i * i, 5000, i):
                    sieve[j] = False
        prime_count = sum(sieve)

        heartbeat_num += 1
        elapsed = time.monotonic() - start
        yield {
            "type": "agent.message",
            "role": "assistant",
            "content": (
                f"loadtest: heartbeat {heartbeat_num}, "
                f"elapsed={elapsed:.1f}s, primes={prime_count}"
            ),
        }

        # Sleep between heartbeats (10s intervals)
        remaining = duration - (time.monotonic() - start)
        if remaining > 0:
            await asyncio.sleep(min(10, remaining))

    total = time.monotonic() - start
    yield {
        "type": "agent.message",
        "role": "assistant",
        "content": f"loadtest: completed, total={total:.1f}s, heartbeats={heartbeat_num}",
    }