Merged
Pull request overview
Adds support for overriding the container image used by the ClusterLoader2 CRI/image-pull benchmark so pipelines can exercise large, multi-layer “customer-like” images.
Changes:
- Plumbs a new `test_image`/`CL2_TEST_IMAGE` parameter from the pipeline step into the CRI override generator (`cri.py`) and CL2 config.
- Updates the deployment template to use a configurable image (AKS/Linux/memory path) and adds a topology spread constraint to improve node distribution.
- Refines the containerd throughput “AvgPerNode” PromQL query to exclude nodes with no pulls.
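The effect of the “AvgPerNode” refinement can be illustrated outside PromQL. The sketch below is not the query from this PR. It assumes a hypothetical mapping of node name to pull throughput and shows how leaving out nodes with no pulls changes the average.

```python
from statistics import mean


def avg_per_active_node(throughput_by_node: dict[str, float]) -> float:
    """Average throughput over nodes that recorded at least one pull."""
    # Keep only nodes with non-zero throughput (the "active" nodes).
    active = [value for value in throughput_by_node.values() if value > 0]
    # With no active nodes, return 0.0 rather than dividing by zero.
    return mean(active) if active else 0.0


# Hypothetical sample data: two nodes pulled images, one stayed idle.
sample = {"node-a": 120.0, "node-b": 80.0, "node-c": 0.0}

naive_average = mean(list(sample.values()))   # idle node drags the value down
active_average = avg_per_active_node(sample)  # idle node is excluded
```

With this sample, the idle node lowers the naive average, while the active-only average reflects just the nodes that did work.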
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| steps/engine/clusterloader2/cri/execute.yml | Passes --test_image into the CRI override step via env/config. |
| modules/python/clusterloader2/cri/cri.py | Adds --test_image CLI arg and writes CL2_TEST_IMAGE into overrides. |
| modules/python/clusterloader2/cri/config/deployment_template.yaml | Introduces TestImage template param, uses it for the AKS/Linux/memory image, and adds topology spread constraints. |
| modules/python/clusterloader2/cri/config/containerd-measurements.yaml | Filters “AvgPerNode” to active nodes in PromQL. |
| modules/python/clusterloader2/cri/config/config.yaml | Wires CL2_TEST_IMAGE into the deployment template fill map. |
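The plumbing summarized in the table can be sketched roughly as follows. This is a minimal illustration, not the actual `cri.py`: the helper names, the default value, and the `KEY: value` override format are assumptions. Only the `--test_image` flag and the `CL2_TEST_IMAGE` key come from the PR.

```python
import argparse

# Assumption: the default matches the image named in the template's `if eq` check.
DEFAULT_TEST_IMAGE = "e2e-test-images/resource-consumer:1.13"


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser exposing only the new --test_image flag."""
    parser = argparse.ArgumentParser(description="Sketch of a CRI override generator")
    parser.add_argument(
        "--test_image",
        default=DEFAULT_TEST_IMAGE,
        help="Container image the benchmark deployment should pull",
    )
    return parser


def override_lines(test_image: str) -> list[str]:
    """Build override entries; the 'KEY: value' layout is an assumption."""
    return [f"CL2_TEST_IMAGE: {test_image}"]


# Example: the pipeline passes a custom image through to the overrides.
args = build_parser().parse_args(["--test_image", "benchmark/customer-replica:v1"])
lines = override_lines(args.test_image)
```

Keeping a default on the flag means pipelines that do not set `test_image` would continue to use the existing image under this sketch.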
Comments suppressed due to low confidence (1)
modules/python/clusterloader2/cri/config/deployment_template.yaml:59
- When `TestImage` is not the default, this template no longer renders an explicit `command`/`args` for the memory container (the `stress` block is gated by `if eq $TestImage ...`). That changes the benchmark from a known long-running workload to whatever the image entrypoint does, which can cause early exits or add non-pull-related variance. Consider keeping a stable long-running command for all images, or making the command configurable alongside `TestImage`.
{{if eq $TestImage "e2e-test-images/resource-consumer:1.13"}}
command:
- stress
args:
- --vm
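One way to read the suggestion above is as a command-selection rule. The sketch below is hypothetical and written in Python rather than the template language: `container_command`, the `custom_command` parameter, and the `sleep infinity` fallback are assumptions for illustration, not part of this PR.

```python
from typing import Optional, Sequence

# Image named in the template's `if eq $TestImage ...` check.
DEFAULT_TEST_IMAGE = "e2e-test-images/resource-consumer:1.13"


def container_command(
    test_image: str, custom_command: Optional[Sequence[str]] = None
) -> list[str]:
    """Pick a command so the container keeps running for every image."""
    if test_image == DEFAULT_TEST_IMAGE:
        # The default image keeps its existing stress workload.
        return ["stress"]
    if custom_command:
        # Hypothetical: a command configured alongside the custom image.
        return list(custom_command)
    # Assumed fallback; requires a `sleep` binary inside the custom image.
    return ["sleep", "infinity"]
```

Either branch for custom images avoids depending on the image entrypoint, which is the source of variance the comment describes.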
jikuma reviewed Feb 12, 2026
jikuma previously approved these changes Feb 12, 2026
liyu-ma reviewed Feb 16, 2026
vittoriasalim previously approved these changes Feb 17, 2026
vittoriasalim approved these changes Feb 17, 2026
Summary
Enables the use of custom container images in image pull benchmarks, allowing more realistic testing with large, multi-layer images that match real customer workloads.
Changes
- `TestImage` parameter to `deployment_template.yaml` to allow overriding the default test image. `TestImage` is intentionally scoped to Linux/memory/AKS because this is the only path used by image-pull scenarios where we need custom image sizes for throughput testing. CPU workloads and Windows use resource-consumer, which has specific stress commands.
- `cri.py` and `execute.yml` to pass through the `test_image` parameter from pipeline configuration
- `benchmark/customer-replica:v1` (10GB, 79-layer image matching customer manifest)
- `operation_timeout: 30m`, `pod_startup_latency_threshold: 600s`
- `max_pods` from 30 to 26 to prevent pod scheduling failures due to node resource constraints
Testing
Validated with 10-node cluster pulling 10GB custom image: