kamalca commented Dec 16, 2025

  1. The CPU class has 67+ tests; running each for 60s takes over an hour (the default) to execute.
  2. Using the `--oom-avoid` flag prevents failures due to OOM kills in the IO class tests.
  3. Make error messages more descriptive: return a list of all failures instead of just the first error, correlate the cmd id with the error, and categorize failures as initialization failures, execution failures, or kernel panics.
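The timing concern in item 1 can be checked with simple arithmetic. The sketch below assumes the tests run one after another and that the default limit is 3600 seconds; both are assumptions made for illustration, since the description only says "over an hour (the default)". The function name is hypothetical.

```python
def total_runtime_seconds(test_count: int, per_test_seconds: int) -> int:
    """Lower bound on wall-clock time when tests run sequentially."""
    return test_count * per_test_seconds


# Assumed values: 67 CPU-class tests at 60s each, compared with a
# one-hour (3600s) limit. The limit value is an assumption.
ASSUMED_DEFAULT_LIMIT = 3600

cpu_total = total_runtime_seconds(67, 60)
print(cpu_total, cpu_total > ASSUMED_DEFAULT_LIMIT)  # 4020 True
```

With 67 tests the run already needs 67 minutes, and "67+" means the real figure can only be higher.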

kamalca force-pushed the kameroncarr/stress-ng branch from 557edd8 to 43818f6 December 17, 2025 19:06
Processes may be running in parallel. This can make it difficult to correlate the exit code error message with the process output logs above.

Including the process ID in the error message makes it easier to tell which command output belongs to the process with an unexpected exit code.
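The idea can be illustrated with a minimal sketch that uses Python's standard `subprocess` module rather than the project's own process wrapper. The function name `wait_and_report` and the message format are hypothetical, not taken from this PR.

```python
import subprocess
import sys
from typing import List, Sequence


def wait_and_report(commands: Sequence[List[str]]) -> List[str]:
    """Start all commands in parallel, then report every non-zero exit
    together with the process ID so it can be matched to the logs."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    errors = []
    for proc in procs:
        code = proc.wait()
        if code != 0:
            # The PID ties this message to the output of one specific process.
            errors.append(
                f"process {proc.pid} ({' '.join(cmd_part for cmd_part in proc.args)}) "
                f"exited with unexpected code {code}"
            )
    return errors


if __name__ == "__main__":
    ok = [sys.executable, "-c", "pass"]
    bad = [sys.executable, "-c", "import sys; sys.exit(3)"]
    for line in wait_and_report([ok, bad]):
        print(line)
```

Because the processes are started before any of them is waited on, their output can interleave, which is exactly the situation where an identifier in the error message helps.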
LiliDeng requested a review from anirudhrb January 4, 2026 07:41
anirudhrb previously approved these changes Jan 4, 2026
LiliDeng marked this pull request as ready for review January 5, 2026 01:44
LiliDeng self-requested a review as a code owner January 5, 2026 01:44
LiliDeng commented Jan 5, 2026

@kamalca do you plan to make further changes to this PR?

Collect all failures from the environment instead of returning only the first error. This shows how many nodes in the environment failed and reports the original failure as well as the kernel panic error, if there is one.
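A minimal sketch of this aggregation, assuming each node reports zero or more failures tagged with one of the three categories named in the PR description. The types `FailureKind` and `NodeFailure` and the function `summarize_failures` are hypothetical names for illustration, not the code in this PR.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class FailureKind(Enum):
    INITIALIZATION = "initialization"
    EXECUTION = "execution"
    KERNEL_PANIC = "kernel panic"


@dataclass
class NodeFailure:
    node: str
    kind: FailureKind
    message: str


def summarize_failures(failures: List[NodeFailure], node_count: int) -> Optional[str]:
    """Return one message covering every failure, or None if there were none."""
    if not failures:
        return None
    failed_nodes = {failure.node for failure in failures}
    lines = [f"{len(failed_nodes)} of {node_count} nodes failed:"]
    for failure in failures:
        # Keep the original failure and any kernel panic as separate entries.
        lines.append(f"  [{failure.node}] {failure.kind.value}: {failure.message}")
    return "\n".join(lines)


report = summarize_failures(
    [
        NodeFailure("node-0", FailureKind.EXECUTION, "stress-ng exited with code 2"),
        NodeFailure("node-0", FailureKind.KERNEL_PANIC, "panic detected in serial log"),
    ],
    node_count=2,
)
print(report)
```

Counting distinct nodes rather than failures keeps the headline accurate when one node contributes both an execution failure and a kernel panic.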