Merged
6 changes: 6 additions & 0 deletions .github/dependabot.yml
@@ -0,0 +1,6 @@
version: 2
updates:
  - package-ecosystem: "github-actions"
    directory: "/"
    schedule:
      interval: "monthly"
188 changes: 188 additions & 0 deletions .github/workflows/dependency-update.yml
@@ -0,0 +1,188 @@
name: Dependency Update Check

on:
  schedule:
    - cron: '0 9 * * 1' # Every Monday 09:00 UTC
  workflow_dispatch:

concurrency:
  group: dependency-update
  cancel-in-progress: false

permissions:
  contents: write
  pull-requests: write

jobs:
  check-and-pr:
    name: Check dependencies & open PR
    runs-on: ubuntu-latest
    outputs:
      has_updates: ${{ steps.check.outputs.has_updates }}
      pr_branch: ${{ steps.pr.outputs.branch }}
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.x"

      - name: Install tools
        run: |
          sudo apt-get update
          sudo apt-get install -y jq curl bats bc shellcheck dmidecode fio \
            iproute2 lshw util-linux ethtool hwloc pciutils numactl smartmontools
          pip install pre-commit

      - name: Check for updates
        id: check
        run: |
          report=$(bash scripts/check-updates.sh --json 2>/dev/null)
          echo "$report" > update-report.json
          count=$(echo "$report" | jq '.summary.updates_available')
          echo "update_count=$count" >> "$GITHUB_OUTPUT"
          if [ "$count" -gt 0 ]; then
            echo "has_updates=true" >> "$GITHUB_OUTPUT"
            echo "### Dependencies: $count update(s) available" >> "$GITHUB_STEP_SUMMARY"
            jq -r '.dependencies[] | select(.update_available==true) | "- **\(.name)**: \(.current_version) → \(.latest_version)"' update-report.json >> "$GITHUB_STEP_SUMMARY"
          else
            echo "has_updates=false" >> "$GITHUB_OUTPUT"
            echo "All dependencies up to date." >> "$GITHUB_STEP_SUMMARY"
          fi

      - name: Apply updates
        if: steps.check.outputs.has_updates == 'true'
        run: bash scripts/check-updates.sh --apply

      - name: Validate — lint
        if: steps.check.outputs.has_updates == 'true'
        run: make lint

      - name: Validate — unit tests
        if: steps.check.outputs.has_updates == 'true'
        run: make test

      - name: Validate — static checks
        if: steps.check.outputs.has_updates == 'true'
        run: make static-checks

      - name: Validate — smoke run
        if: steps.check.outputs.has_updates == 'true'
        run: sudo bash scripts/run-all.sh --smoke --ci

      - name: Create or update PR
        id: pr
        if: steps.check.outputs.has_updates == 'true'
        env:
          GH_TOKEN: ${{ github.token }}
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "41898282+github-actions[bot]@users.noreply.github.com"

          BRANCH="deps/auto-update-$(date +%Y%m%d)"

          # Check for existing open dependency update PR
          existing_branch=$(gh pr list --label dependencies --state open \
            --json headRefName -q '.[0].headRefName // empty' 2>/dev/null || true)

          if [ -n "$existing_branch" ]; then
            BRANCH="$existing_branch"
            git checkout -B "$BRANCH"
          else
            git checkout -b "$BRANCH"
          fi
          echo "branch=$BRANCH" >> "$GITHUB_OUTPUT"

          git add -A

          # Build commit message
          updates_list=$(jq -r '.dependencies[] | select(.update_available==true) | "- \(.name): \(.current_version) → \(.latest_version)"' update-report.json)
          git commit -m "deps: update dependencies ($(date +%Y-%m-%d))" -m "$updates_list" || true

          git push -u origin "$BRANCH" --force-with-lease

          # Build PR body
          pr_body=$(cat <<'BODY_END'
          ## Automated Dependency Update

          | Dependency | Current | Latest | Category |
          |------------|---------|--------|----------|
          BODY_END
          )
          table_rows=$(jq -r '.dependencies[] | select(.update_available==true) | "| \(.name) | \(.current_version) | \(.latest_version) | \(.category) |"' update-report.json)
          pr_body="${pr_body}
          ${table_rows}

          ### CI Validation
          - [x] Lint (\`make lint\`)
          - [x] Unit tests (\`make test\`)
          - [x] Static checks (\`make static-checks\`)
          - [x] Smoke run (\`run-all.sh --smoke --ci\`)
          - [ ] **GPU validation** — merge only after a quick run on a GPU host

          > Auto-generated by \`scripts/check-updates.sh\` via scheduled CI."

          if [ -n "$existing_branch" ]; then
            pr_number=$(gh pr list --head "$existing_branch" --state open --json number -q '.[0].number' 2>/dev/null || true)
            if [ -n "$pr_number" ]; then
              gh pr edit "$pr_number" --body "$pr_body"
              echo "Updated existing PR #${pr_number}"
            fi
          else
            gh pr create \
              --title "deps: update dependencies ($(date +%Y-%m-%d))" \
              --body "$pr_body" \
              --label "dependencies" || echo "PR creation failed (label may not exist)"
          fi

      - name: Upload update report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: dependency-update-report
          path: update-report.json
          if-no-files-found: ignore

  gpu-validate:
    name: GPU validation (self-hosted)
    needs: check-and-pr
    if: needs.check-and-pr.outputs.has_updates == 'true' && vars.HPC_ENABLE_GPU_CI == '1'
    runs-on: [self-hosted, linux, x64, gpu, nvidia]
    steps:
      - name: Checkout update branch
        uses: actions/checkout@v4
        with:
          ref: ${{ needs.check-and-pr.outputs.pr_branch }}

      - name: Run quick on GPU host
        run: sudo bash scripts/run-all.sh --quick --ci

      - name: Post GPU results to PR
        if: always()
        env:
          GH_TOKEN: ${{ github.token }}
        run: |
          pr_number=$(gh pr list --head "${{ needs.check-and-pr.outputs.pr_branch }}" \
            --state open --json number -q '.[0].number' 2>/dev/null || true)
          result_file="/var/log/hpc-bench/results/run-all.json"
          if [ -n "$pr_number" ] && [ -f "$result_file" ]; then
            acceptance=$(jq -r '.acceptance // "unknown"' "$result_file")
            gh pr comment "$pr_number" --body "### GPU Validation: ${acceptance}
          Quick-mode results from self-hosted GPU runner.
          <details><summary>Full results JSON</summary>

          \`\`\`json
          $(jq '.' "$result_file")
          \`\`\`
          </details>"
          fi

      - name: Upload GPU results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: gpu-validation-results
          path: /var/log/hpc-bench/results
          if-no-files-found: warn
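
Reviewer sketch (not part of this diff): the `Check for updates` step turns the report's `.summary.updates_available` count into the `has_updates` step output. The helper below is a hypothetical refactor of that decision, assuming the count arrives as a string from `jq`. The function name `emit_update_outputs` and the numeric guard are illustrative additions, not code from `scripts/check-updates.sh`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write update_count / has_updates to a GitHub Actions
# output file. Rejects non-numeric counts instead of treating them as zero.
emit_update_outputs() {
    local count="$1" out="$2"
    case "$count" in
        '' | *[!0-9]*)
            echo "invalid update count: '$count'" >&2
            return 1
            ;;
    esac
    echo "update_count=$count" >> "$out"
    if [ "$count" -gt 0 ]; then
        echo "has_updates=true" >> "$out"
    else
        echo "has_updates=false" >> "$out"
    fi
}
```

In the workflow this could be called as `emit_update_outputs "$count" "$GITHUB_OUTPUT"`. The guard is worth considering because `jq` prints `null` when the field is missing, and `[ null -gt 0 ]` inside an `if` simply evaluates as false, so the step as written would likely report `has_updates=false` rather than fail.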
6 changes: 3 additions & 3 deletions .pre-commit-config.yaml
@@ -5,7 +5,7 @@ default_language_version:

 repos:
   - repo: https://github.com/pre-commit/pre-commit-hooks
-    rev: v4.5.0
+    rev: v6.0.0
     hooks:
       - id: trailing-whitespace
       - id: end-of-file-fixer
@@ -23,14 +23,14 @@ repos:
       - id: check-shebang-scripts-are-executable

   - repo: https://github.com/scop/pre-commit-shfmt
-    rev: v3.8.0-1
+    rev: v3.12.0-2
     hooks:
       - id: shfmt
         args: ['-i', '4', '-ci', '-sr']
         exclude: '^src/'

   - repo: https://github.com/shellcheck-py/shellcheck-py
-    rev: v0.9.0.6
+    rev: v0.11.0.1
     hooks:
       - id: shellcheck
         args: ['-s', 'bash', '-S', 'error']
15 changes: 14 additions & 1 deletion AGENTS.md
@@ -9,14 +9,27 @@ This is a pure-Bash HPC benchmarking CLI suite (no web services, databases, or b
| Task | Command |
|------|---------|
| Lint (pre-commit: shfmt + shellcheck) | `make lint` |
-| Unit tests (BATS, 98 tests) | `make test` |
+| Unit tests (BATS, 126 tests) | `make test` |
| All quality gates | `make check` |
| CI static checks | `make static-checks` |
| Smoke run (bootstrap + inventory + report, ~5s) | `sudo bash scripts/run-all.sh --smoke --ci` |
| Quick run (short benchmarks) | `sudo bash scripts/run-all.sh --quick --ci` |
| Check dependency updates | `make check-updates` |
| Preview dependency updates | `bash scripts/check-updates.sh --apply --dry-run` |

See `Makefile` for all targets and `README.md` / `SKILL.md` for full documentation.

### Dependency update system

`scripts/check-updates.sh` tracks 14 external dependencies (container images, NVIDIA packages, upstream repos, pre-commit hooks) via `specs/dependencies.json`. It queries nvcr.io, Docker Hub, GitHub, and NVIDIA apt repos. Key modes:
- `--json` for CI consumption
- `--apply` to update version pins in source files
- `--apply --dry-run` to preview without modifying
- `--category <cat>` to filter (container_image, nvidia_package, pre_commit_hook, upstream_source)

The weekly GitHub Actions workflow (`.github/workflows/dependency-update.yml`) runs this automatically, validates with lint/tests/smoke, and opens a PR. CUDA↔driver compatibility constraints are checked before applying.
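
A minimal sketch of what a version-aware constraint check could look like, assuming GNU `sort -V` is available (it is part of coreutils on Ubuntu). The function names `version_ge` and `check_driver_floor` are hypothetical, and the real logic in `scripts/check-updates.sh` may differ. The version numbers in the usage lines are placeholders, not NVIDIA's compatibility table.

```shell
#!/usr/bin/env bash
# version_ge A B: succeed when version A is greater than or equal to B.
# Uses version sort, so 3.12.0 ranks above 3.8.0 (plain string order would not).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_driver_floor DRIVER MIN_REQUIRED: hypothetical gate to run before
# applying a CUDA update; returns non-zero when the driver is too old.
check_driver_floor() {
    local driver="$1" min_required="$2"
    if version_ge "$driver" "$min_required"; then
        echo "ok: driver $driver >= $min_required"
    else
        echo "blocked: driver $driver < $min_required" >&2
        return 1
    fi
}

# Usage with placeholder versions:
check_driver_floor 550.54.14 525.60.13
```

The same comparison applies to the hook pins in this PR: a string comparison would treat `v3.8.0-1` as newer than `v3.12.0-2`, which is why a version-aware sort is the safer choice for any "is there an update" decision.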

### Gotchas

- `pre-commit` is installed as a user package (`pip install --user`). Ensure `$HOME/.local/bin` is on `PATH` (the update script handles this).
17 changes: 17 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,23 @@ All notable changes to the HPC Bench Suite are documented here. Version number i

## Version History

### Dependency Update System (2026-02-26)

1. **New: `scripts/check-updates.sh`** — Automated dependency update checker.
- Tracks 14 external dependencies across 4 categories: container images (hpc-benchmarks, intel-hpckit), NVIDIA packages (driver, CUDA toolkit, DCGM, NCCL, Fabric Manager, Container Toolkit), upstream benchmark sources (gpu-burn, nccl-tests, nvbandwidth), and pre-commit hooks.
- Queries nvcr.io registry, Docker Hub, GitHub API, and NVIDIA apt repo.
- Modes: `--json` (machine-readable), `--apply` (update source files), `--apply --dry-run` (preview), `--category` (filter).
- CUDA↔driver compatibility constraints validated before applying updates.
- Post-apply validation (`bash -n` on `.sh`, `jq` on `.json`) with auto-revert on failure.
- Update history logged to `specs/update-history.json`.
2. **New: `specs/dependencies.json`** — Version manifest (single source of truth for tracked dependencies).
3. **New: `specs/update-history.json`** — Audit log for dependency updates.
4. **New: `.github/workflows/dependency-update.yml`** — Weekly GitHub Actions workflow (Mondays 09:00 UTC) that checks for updates, applies them, validates with lint/tests/smoke, and opens a PR. Optional GPU validation on self-hosted runner.
5. **New: `.github/dependabot.yml`** — Monthly auto-updates for GitHub Actions versions.
6. **New: `tests/check_updates.bats`** — 28 BATS tests for manifest schema, cross-checks, constraints, and script behavior.
7. **Makefile** — Added `check-updates` target.
8. **`.pre-commit-config.yaml`** — Updated hooks: pre-commit-hooks v4.5.0→v6.0.0, shfmt v3.8.0-1→v3.12.0-2, shellcheck-py v0.9.0.6→v0.11.0.1.
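
The post-apply validation described in item 1 can be pictured with the sketch below. It assumes a copy-based backup and covers only the `bash -n` half of the check; `apply_with_revert` is a hypothetical name, and the actual script may back up, validate, and revert differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: overwrite a shell script with new content, syntax-check
# it with `bash -n`, and restore the previous content if the check fails.
apply_with_revert() {
    local file="$1" new_content="$2" backup
    backup="$(mktemp)" || return 1
    cp -- "$file" "$backup" || { rm -f -- "$backup"; return 1; }

    printf '%s\n' "$new_content" > "$file"

    if bash -n "$file" 2>/dev/null; then
        rm -f -- "$backup"      # change is syntactically valid; keep it
        return 0
    fi

    cp -- "$backup" "$file"     # auto-revert to the pre-update content
    rm -f -- "$backup"
    return 1
}
```

A JSON file could be handled the same way by swapping the `bash -n` call for `jq empty "$file"`, which exits non-zero on malformed input.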

### V1.10 Changes (2026-02-14)

1. **scripts/run-all.sh** — Added CI mode:
12 changes: 11 additions & 1 deletion Makefile
@@ -1,4 +1,4 @@
-.PHONY: help lint shellcheck test static-checks check smoke quick ci-smoke ci-quick report-html version
+.PHONY: help lint shellcheck test static-checks check smoke quick ci-smoke ci-quick report-html check-updates version

# ── Meta ──

@@ -19,6 +19,9 @@ help:
	@echo " make ci-quick Run quick with --ci (compact output for CI)"
	@echo " make report-html Generate optional HTML report from HPC_RESULTS_DIR (default /var/log/hpc-bench/results)"
	@echo ""
	@echo "Maintenance:"
	@echo " make check-updates Check tracked dependencies for newer versions"
	@echo ""
	@echo "Info:"
	@echo " make version Show suite version"
	@echo ""
@@ -66,6 +69,13 @@ report-html:
	if [ ! -d "$$results" ]; then echo "Results dir not found: $$results. Run the suite first or set HPC_RESULTS_DIR."; exit 1; fi; \
	python3 reporting/generate_html_report.py -i "$$results" -o "$$results/report.html"

# ── Maintenance ──

check-updates:
	@command -v jq >/dev/null 2>&1 || { echo "jq not found."; exit 1; }
	@command -v curl >/dev/null 2>&1 || { echo "curl not found."; exit 1; }
	bash scripts/check-updates.sh

# ── Info ──

version:
30 changes: 26 additions & 4 deletions README.md
@@ -191,6 +191,7 @@ HPC_RESULTS_DIR=/path/to/results bash scripts/report.sh
│ ├── bootstrap.sh # Bootstrap and dependency install
│ ├── run-all.sh # Master orchestrator (all phases)
│ ├── report.sh # Report generator
│ ├── check-updates.sh # Dependency update checker (see below)
│ ├── inventory.sh # General / CPU inventory
│ ├── gpu-inventory.sh
│ ├── topology.sh
@@ -211,10 +212,15 @@ HPC_RESULTS_DIR=/path/to/results bash scripts/report.sh
│ ├── filesystem-diag.sh
│ ├── thermal-power.sh
│ └── security-scan.sh
-└── src/ # Bundled benchmark sources
-    ├── stream.c # STREAM memory benchmark
-    ├── gpu-burn/ # GPU burn-in (CUDA)
-    └── nccl-tests/ # Minimal NCCL test binaries
+├── specs/
+│ ├── modules.json # Module manifest (single source of truth)
+│ ├── dependencies.json # Tracked external dependency versions
+│ └── update-history.json # Dependency update audit log
+├── src/ # Bundled benchmark sources
+│ ├── stream.c # STREAM memory benchmark
+│ ├── gpu-burn/ # GPU burn-in (CUDA)
+│ └── nccl-tests/ # Minimal NCCL test binaries
+└── tests/ # BATS unit and integration tests
```

## Linting and pre-commit
@@ -243,6 +249,22 @@ make static-checks
- Static gate runs `scripts/ci-static-checks.sh` (`bash -n` + `pre-commit run --all-files`)
- Ubuntu VM job runs `run-all.sh --smoke --ci` and `run-all.sh --quick --ci`
- Optional GPU job runs on self-hosted runners labeled `self-hosted,linux,x64,gpu,nvidia` and is enabled by repo variable `HPC_ENABLE_GPU_CI=1`
- **Dependency updates:** `.github/workflows/dependency-update.yml` runs weekly (Mondays 09:00 UTC) to check all 14 tracked dependencies for updates, apply them, validate with lint/tests/smoke, and open a PR. Manual trigger: `gh workflow run "Dependency Update Check"`. See **Dependency tracking** below.
- **Dependabot:** `.github/dependabot.yml` auto-updates GitHub Actions versions monthly.

## Dependency tracking

The suite tracks 14 external dependencies (container images, NVIDIA packages, upstream benchmark sources, pre-commit hooks) in `specs/dependencies.json`. Check for updates:

```bash
make check-updates # Human-readable report
bash scripts/check-updates.sh --json # Machine-readable JSON
bash scripts/check-updates.sh --apply --dry-run # Preview what would change
bash scripts/check-updates.sh --apply # Apply updates to source files
bash scripts/check-updates.sh --category nvidia_package # Check one category
```

The checker queries nvcr.io, Docker Hub, GitHub, and the NVIDIA apt repo. It validates CUDA↔driver compatibility constraints before applying, syntax-checks modified files after applying (`bash -n` for `.sh`, `jq` for `.json`) and reverts them on failure, and logs all changes to `specs/update-history.json`.
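
As an illustration of consuming the `--json` output, the sketch below filters a sample report with `jq`. The field names are the ones the workflow's own `jq` filters read (`.summary.updates_available`, `.dependencies[].update_available`, and so on). The sample values, the dependency names in them, and the function name `list_pending_updates` are made up for the example.

```shell
#!/usr/bin/env bash
# Print one "name: current -> latest" line per dependency that has an update.
# Reads a check-updates JSON report on stdin.
list_pending_updates() {
    jq -r '.dependencies[]
           | select(.update_available == true)
           | "\(.name): \(.current_version) -> \(.latest_version)"'
}

# Sample report: field names follow the workflow's filters, values are invented.
sample_report='{
  "summary": {"updates_available": 1},
  "dependencies": [
    {"name": "example-hook", "current_version": "v1.0.0", "latest_version": "v1.1.0",
     "update_available": true, "category": "pre_commit_hook"},
    {"name": "example-image", "current_version": "2.0", "latest_version": "2.0",
     "update_available": false, "category": "container_image"}
  ]
}'

printf '%s\n' "$sample_report" | list_pending_updates
```

Against a real checkout the same function would presumably be fed by `bash scripts/check-updates.sh --json | list_pending_updates`.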

## Concurrency and locking

13 changes: 10 additions & 3 deletions SKILL.md
@@ -48,16 +48,19 @@ bench/
│ ├── security-scan.sh # Phase 4: SSH audit, services, SUID, kernel
│ ├── report.sh # Phase 5: generates report.md from results
│ ├── run-all.sh # Orchestrator (runs all phases)
-│ └── ci-static-checks.sh # CI-only: shellcheck + syntax checks
+│ ├── ci-static-checks.sh # CI-only: shellcheck + syntax checks
+│ └── check-updates.sh # Dependency update checker (not a module)
├── specs/
│ ├── modules.json # Module manifest (single source of truth)
-│ └── hardware-specs.json # GPU spec lookup data
+│ ├── dependencies.json # Tracked external dependency versions
+│ └── update-history.json # Dependency update audit log
├── src/ # Bundled sources (gpu-burn, nccl-tests, STREAM)
├── tests/
│ ├── helpers.bash # Shared BATS test helpers
│ ├── common_helpers.bats # Unit tests for lib/common.sh
│ ├── report_helpers.bats # Unit tests for lib/report-common.sh
-│ └── module_integration.bats # Integration tests (syntax, manifest, source-gate)
+│ ├── module_integration.bats # Integration tests (syntax, manifest, source-gate)
+│ └── check_updates.bats # Tests for dependency checker + manifest
├── .editorconfig # Formatting rules
├── .pre-commit-config.yaml # Pre-commit hooks (shfmt, shellcheck)
├── Makefile # Quality gates: make lint, make test, make smoke
@@ -118,6 +121,7 @@ bench/
- `make check` — runs lint + test + static checks.
- `make smoke` — runs `run-all.sh --smoke` end-to-end.
- `make quick` — runs `run-all.sh --quick` end-to-end.
- `make check-updates` — checks 14 tracked dependencies for upstream updates.

### JSON contract

@@ -180,3 +184,6 @@ Things an AI agent should **never** do:
| Dev conventions, JSON contract, config guide | [docs/DEVELOPMENT.md](docs/DEVELOPMENT.md) |
| Contributor guidelines | [CONTRIBUTING.md](CONTRIBUTING.md) |
| Quality gates | [Makefile](Makefile) |
| Dependency version manifest | [specs/dependencies.json](specs/dependencies.json) |
| Dependency update checker | [scripts/check-updates.sh](scripts/check-updates.sh) |
| Dependency update history | [specs/update-history.json](specs/update-history.json) |