feat: public pricing API + DaemonSet-aware node scoring by Guimove · Pull Request #4 · Guimove/clusterfit

Guimove · 2026-02-12T13:35:24Z

Summary

Replace AWS Pricing API (pricing:GetProducts) with public runs-on.com API — no IAM permissions needed
Remove aws-sdk-go-v2/service/pricing dependency entirely
Favor fewer, larger nodes per AWS EKS best practices
Update default instance families to current gen (m7i, c7i, r7i, m7a, c7a, r7a)
Fix "0 scenarios" log bug

Scoring changes

Per AWS: "fewer, larger instances are better, especially with many DaemonSets"

Nodes	Old Score	New Score
1	20	20
3	60	90
5-15	100	100
30	100	85
40	100	70 minus DS penalty
100	80	55 minus DS penalty

DaemonSet penalty: min(dsCount × nodeCount / 100 × 5, 20) — with 12 DS and 40 nodes, that's a -9.6 point penalty.
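
As a rough illustration, the penalty formula above can be written as a small Go function. This is a minimal sketch, not the merged code: the function name `daemonSetPenalty` and the constant names are hypothetical, and the constants are copied from the formula as written. Evaluated literally, the formula gives min(24, 20) = 20 for 12 DS and 40 nodes, whereas the 9.6 figure quoted above would correspond to a multiplier of 2 rather than 5, so the constants in the actual scorer may differ. They are kept as named values so they are easy to adjust.

```go
package main

import (
	"fmt"
	"math"
)

// Constants copied from the formula as written in the PR text.
// The names are hypothetical; the merged code may use other values.
const (
	penaltyPer   = 100.0 // divisor applied to dsCount × nodeCount
	penaltyScale = 5.0   // points per 100 DaemonSet pods
	penaltyCap   = 20.0  // maximum points removed from the score
)

// daemonSetPenalty evaluates min(dsCount × nodeCount / 100 × 5, 20).
func daemonSetPenalty(dsCount, nodeCount int) float64 {
	if dsCount <= 0 || nodeCount <= 0 {
		return 0
	}
	raw := float64(dsCount) * float64(nodeCount) / penaltyPer * penaltyScale
	return math.Min(raw, penaltyCap)
}

func main() {
	cases := []struct{ ds, nodes int }{{2, 10}, {4, 50}, {12, 40}}
	for _, c := range cases {
		fmt.Printf("ds=%d nodes=%d penalty=%.1f\n",
			c.ds, c.nodes, daemonSetPenalty(c.ds, c.nodes))
	}
}
```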

Test plan

go test -race ./... — all pass
golangci-lint run — 0 issues
Real cluster: 880 workloads + 12 DS, pricing shows correct $/month values

Replace the AWS Pricing API (requires pricing:GetProducts IAM permission) with the public runs-on.com EC2 pricing API. No authentication required, includes both on-demand and spot prices, updated hourly.

- Remove aws-sdk-go-v2/service/pricing dependency entirely
- Simplify AWSProvider: only needs ec2:DescribeInstanceTypes permission
- EnrichWithPricing called automatically in GetInstanceTypes
- Update default instance families to current gen (m7i, c7i, r7i, etc.)
- Fix "0 scenarios" log message (was hardcoded, never updated)
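
The enrichment step described in this commit might look roughly like the sketch below. It assumes a hypothetical `PriceSource` interface standing in for the public pricing endpoint, whose URL and response schema are not given here. The `InstanceType` fields, the placeholder prices, and the 730 hours/month conversion are also assumptions for illustration, not values taken from the PR.

```go
package main

import "fmt"

// hoursPerMonth uses the common 730 h convention (assumption, not from the PR).
const hoursPerMonth = 730.0

// InstanceType is a hypothetical, trimmed-down model of a candidate node type.
type InstanceType struct {
	Name         string
	OnDemandHour float64 // USD per hour
	SpotHour     float64 // USD per hour
}

// PriceSource abstracts where prices come from (hypothetical interface).
type PriceSource interface {
	Prices(name string) (onDemand, spot float64, ok bool)
}

// staticPrices is an in-memory stand-in for a public pricing endpoint.
type staticPrices map[string][2]float64

func (s staticPrices) Prices(name string) (float64, float64, bool) {
	p, ok := s[name]
	return p[0], p[1], ok
}

// enrichWithPricing fills prices in place and returns how many types were priced.
// Types unknown to the source are left untouched.
func enrichWithPricing(types []InstanceType, src PriceSource) int {
	priced := 0
	for i := range types {
		od, spot, ok := src.Prices(types[i].Name)
		if !ok {
			continue
		}
		types[i].OnDemandHour, types[i].SpotHour = od, spot
		priced++
	}
	return priced
}

// monthlyCost converts an hourly price into a $/month figure for a node count.
func monthlyCost(hourly float64, nodes int) float64 {
	return hourly * hoursPerMonth * float64(nodes)
}

func main() {
	types := []InstanceType{{Name: "m7i.large"}, {Name: "unknown.type"}}
	src := staticPrices{"m7i.large": {0.10, 0.04}} // placeholder prices, not real quotes
	n := enrichWithPricing(types, src)
	fmt.Printf("priced %d of %d types\n", n, len(types))
	fmt.Printf("3 x %s on-demand: $%.2f/month\n",
		types[0].Name, monthlyCost(types[0].OnDemandHour, 3))
}
```

Hiding the price lookup behind an interface keeps the provider free of any pricing SDK, which matches the commit's goal of needing only ec2:DescribeInstanceTypes.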

Per AWS docs: "fewer, larger instances are better, especially if you have a lot of DaemonSets". Each DS runs on every node, so more nodes means more wasted resources on DS replicas.

- Refine resilience scoring: sweet spot at 5-15 nodes, progressive penalty above 30 nodes instead of flat 100 for 3-50
- Add DaemonSet overhead penalty: high DS count × high node count reduces the resilience score (up to -20 points)
- Pass DaemonSetCount from orchestrator to scorer

Add cluster-wide P95 CPU/memory and observed min/max node count queries to capture HPA/autoscaler scaling peaks that per-pod instant snapshots miss. Enforce a configurable minimum node count (default 3) as an HA constraint in bin-packing. Compute scaling efficiency per candidate instance type and penalize poor trough utilization in scoring.

- Add ClusterAggregateMetrics and ScalingEfficiency model types
- Replace unused peak replica queries with 4 cluster aggregate PromQL queries
- Add MinNodes to SimulationConfig (default 3) with BFD padding
- Compute trough CPU utilization from scaling ratio
- Penalize resilience score when trough utilization < 30%
- Display cluster P95, node range, min nodes in report headers
- Show [trough: XX%] warning in table/markdown notes
- Update README with full documentation and correct AWS requirements
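
One way to picture the minimum-node constraint and the trough check is the sketch below. The commit does not spell out how trough utilization is derived, so the definition used here is an assumption: the scaling ratio is taken as observed min nodes divided by observed max nodes, trough load is peak CPU times that ratio, and the node count at trough is never allowed below the minimum. All function names are hypothetical; only the default of 3 nodes and the 30% threshold come from the text.

```go
package main

import (
	"fmt"
	"math"
)

// troughThreshold is the 30% cutoff mentioned in the commit message.
const troughThreshold = 0.30

// padToMinNodes enforces the HA floor on a bin-packing result.
func padToMinNodes(packed, minNodes int) int {
	if packed < minNodes {
		return minNodes
	}
	return packed
}

// troughUtilization estimates CPU utilization at the low point of the
// scaling cycle (assumed definition, see the lead-in).
func troughUtilization(peakCPU, scalingRatio, nodeCPU float64, minNodes int) float64 {
	if nodeCPU <= 0 {
		return 0
	}
	troughCPU := peakCPU * scalingRatio
	needed := int(math.Ceil(troughCPU / nodeCPU))
	nodes := padToMinNodes(needed, minNodes)
	if nodes == 0 {
		return 0
	}
	return troughCPU / (float64(nodes) * nodeCPU)
}

// poorTrough reports whether a candidate should receive the trough penalty.
func poorTrough(util float64) bool {
	return util < troughThreshold
}

func main() {
	const peakCPU, ratio, minNodes = 40.0, 0.25, 3 // e.g. 5 to 20 observed nodes
	for _, nodeCPU := range []float64{4, 16} {
		u := troughUtilization(peakCPU, ratio, nodeCPU, minNodes)
		fmt.Printf("node=%2.0f vCPU trough=%.0f%% penalized=%v\n",
			nodeCPU, u*100, poorTrough(u))
	}
}
```

Under these assumptions, large nodes combined with a three-node floor leave most capacity idle at the trough, which is the situation the [trough: XX%] warning is meant to surface.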

Per-pod effective sizing uses max(request, P95_usage) which inflates CPU when pods over-request relative to actual usage. This led to "compute-optimized" classification on a cluster that was actually memory-bound (0.8 vCPU, 9.4 GiB → 11.75 GiB/vCPU). Prefer cluster-level aggregate P95 CPU/memory (from the full metrics window) for classification when available. Fall back to per-pod totals when aggregate metrics are absent.
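
The classification fallback can be sketched as follows. The memory-to-CPU ratio is compared against the 2, 4, and 8 GiB per vCPU shapes of the C, M, and R families. The cutoffs of 3 and 6 GiB/vCPU are assumptions chosen as midpoints, and the type and function names are hypothetical. The per-pod total in the usage example is made up to show inflated CPU requests; only the 0.8 vCPU / 9.4 GiB aggregate comes from the text.

```go
package main

import "fmt"

// WorkloadClass names the three buckets used for family selection.
type WorkloadClass string

const (
	ComputeOptimized WorkloadClass = "compute-optimized"
	GeneralPurpose   WorkloadClass = "general-purpose"
	MemoryOptimized  WorkloadClass = "memory-optimized"
)

// Usage is a CPU/memory pair, either summed per pod or cluster-aggregate P95.
type Usage struct {
	CPU    float64 // vCPU
	MemGiB float64 // GiB
}

// classify prefers aggregate metrics and falls back to per-pod totals
// when the aggregate is absent. Thresholds are illustrative assumptions.
func classify(perPod Usage, aggregate *Usage) WorkloadClass {
	u := perPod
	if aggregate != nil && aggregate.CPU > 0 {
		u = *aggregate
	}
	if u.CPU <= 0 {
		return GeneralPurpose
	}
	ratio := u.MemGiB / u.CPU // GiB per vCPU
	switch {
	case ratio < 3:
		return ComputeOptimized
	case ratio > 6:
		return MemoryOptimized
	default:
		return GeneralPurpose
	}
}

func main() {
	perPod := Usage{CPU: 6.0, MemGiB: 9.4}     // hypothetical inflated requests
	aggregate := &Usage{CPU: 0.8, MemGiB: 9.4} // 11.75 GiB/vCPU, from the text
	fmt.Println("per-pod only:  ", classify(perPod, nil))
	fmt.Println("with aggregate:", classify(perPod, aggregate))
}
```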

When auto-classifying to an extreme (compute or memory-optimized), also include M-series (general-purpose) families. Per-pod requests may skew the bin-packing constraint away from the aggregate classification — M-series provides a balanced middle ground that the scorer evaluates alongside the primary family. Fixes clusters where aggregate usage is memory-heavy but per-pod CPU requests are inflated, causing R-series to be CPU-saturated (97%) with wasted memory (32%).
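
A minimal sketch of that family selection, assuming the class is passed as a plain string and using the default families listed in the summary. The function name `candidateFamilies` is hypothetical.

```go
package main

import "fmt"

// candidateFamilies returns the instance families to simulate for a class.
// Extreme classes also get the M-series so the scorer can compare a
// balanced shape against the primary family.
func candidateFamilies(class string) []string {
	general := []string{"m7i", "m7a"}
	switch class {
	case "compute-optimized":
		return append([]string{"c7i", "c7a"}, general...)
	case "memory-optimized":
		return append([]string{"r7i", "r7a"}, general...)
	default:
		return general
	}
}

func main() {
	for _, c := range []string{"compute-optimized", "memory-optimized", "general-purpose"} {
		fmt.Printf("%-18s -> %v\n", c, candidateFamilies(c))
	}
}
```

Listing the primary family first keeps report ordering stable, while the scorer remains free to pick an M-series type when per-pod requests saturate CPU on R-series nodes.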

Guimove added 5 commits February 12, 2026 14:32

Guimove merged commit 83e1e7c into main Feb 12, 2026
3 checks passed

Guimove deleted the fix/pricing-enrichment branch February 12, 2026 14:42

