Add job to cancel workflow on failure #139

glimchb · 2026-01-22T23:21:50Z

Since the HPE RDMA tests are the quickest (only 15-20 min), if they fail, fail the entire workflow run (all jobs) to save some free-tier runners.

mhae

Looks good to me.

mhae

Actually, I remembered that this is probably not going to work.

See https://github.com/orgs/community/discussions/26311

I tried implementing the cancel early on when I did the runners and didn't get it to work.

glimchb · 2026-01-26T21:57:23Z

> Actually, I remembered that this is probably not going to work.
>
> See https://github.com/orgs/community/discussions/26311
>
> I tried implementing the cancel early on when I did the runners and didn't get it to work.

We are already cancelling jobs like this today; see https://github.com/spdk/spdk-ci/blob/main/.github/scripts/parse_gerrit_webhook.sh
@mhae, can you specify why you think it is not going to work here?

karlatec · 2026-01-28T08:37:33Z

.github/workflows/nvmf-rdma.yml

      with:
        path: ./output
        name: hpe-job-nvmf-rdma
+    - name: Cancel all jobs enitre workflow run if this job failed


Shouldn't the same be added to the common tests workflow? In this one, the common tests part failed after a minute or so, but the HPE runner was still busy anyway.

glimchb · Jan 28, 2026

It rarely happens, since the HPE runner only executes a single test. It was true only when HPE was offline for a bit…

karlatec · Jan 28, 2026

Rarely, but still: why keep it busy in such a case? Honestly, it was the first failed build I clicked on, so maybe it does not happen so rarely :)

tomzawadzki · 2026-02-09T12:25:22Z

.github/workflows/nvmf-rdma.yml

+    - name: Cancel all jobs enitre workflow run if this job failed
+      if: failure()
+      env:
+        GH_TOKEN: ${{ github.token }}


Passing the token to a job that 1) executes scripts from the SPDK repo and 2) runs on a self-hosted runner allows either of them to use that token.

Please correct me here, but my understanding is that passing the token or secrets to such a job is always risky and should be avoided.

The correct way to add this cancellation would be to add a job on a GH-hosted runner, instead of a step, and give that job the token. Considering the purpose of this patch is to decrease the time to cancel, I'll leave it up to you whether you think it is worth queuing up another job.
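
For illustration, the job-based alternative described above might look roughly like the sketch below. This is not taken from the PR: the job ids `hpe-nvmf-rdma` and `cancel-on-failure`, the `runs-on` labels, and the placeholder test step are all assumptions, and the `gh run cancel` invocation is one possible way to request the cancellation.

```yaml
# Hypothetical sketch only: job ids and the test step are illustrative, not from the PR.
jobs:
  hpe-nvmf-rdma:               # assumed id of the job on the self-hosted runner
    runs-on: self-hosted
    steps:
      - run: echo "real test steps go here"   # placeholder

  cancel-on-failure:           # assumed id of the separate cancel job
    needs: hpe-nvmf-rdma
    if: failure()              # runs only when a job listed in `needs` failed
    runs-on: ubuntu-latest     # GitHub-hosted, so the token never reaches the self-hosted runner
    permissions:
      actions: write           # required to cancel a workflow run
    steps:
      - name: Cancel the entire workflow run
        env:
          GH_TOKEN: ${{ github.token }}
        run: gh run cancel "${{ github.run_id }}" --repo "${{ github.repository }}"
```

The trade-off is the one noted above: the cancel job must first be picked up by a GitHub-hosted runner, so the cancellation may arrive later than it would from a step inside the failing job.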

tomzawadzki · 2026-02-09T12:27:22Z

> > Actually, I remembered that this is probably not going to work.
> > See https://github.com/orgs/community/discussions/26311
> > I tried implementing the cancel early on when I did the runners and didn't get it to work.
>
> We are already cancelling jobs like this today; see https://github.com/spdk/spdk-ci/blob/main/.github/scripts/parse_gerrit_webhook.sh @mhae, can you specify why you think it is not going to work here?

@mhae, I'm still unsure which part of the linked discussion was concerning; could you clarify?

mhae · 2026-02-09T15:16:36Z

> > > Actually, I remembered that this is probably not going to work.
> > > See https://github.com/orgs/community/discussions/26311
> > > I tried implementing the cancel early on when I did the runners and didn't get it to work.
> >
> > We are already cancelling jobs like this today; see https://github.com/spdk/spdk-ci/blob/main/.github/scripts/parse_gerrit_webhook.sh @mhae, can you specify why you think it is not going to work here?
>
> @mhae, I'm still unsure which part of the linked discussion was concerning; could you clarify?

My main concern is whether forwarding a cancel to a currently running job works correctly. I had some issues early on where not all parts of a job running on the self-hosted runner were canceled.

Reading through the description again, it seems we're not cancelling the job on the self-hosted runner; is this correct?
If yes, then I'm less concerned.

tomzawadzki · 2026-02-10T08:28:56Z

> My main concern is whether forwarding a cancel to a currently running job works correctly. I had some issues early on where not all parts of a job running on the self-hosted runner were canceled.

That's correct. This PR is intended to cancel the other jobs in the workflow, which at this time run only on GitHub-hosted runners. Unfortunately, the cancels we already have in parse_gerrit_webhook.sh won't help alleviate any of those concerns, since that script cancels the only running job in the workflow from within that same job.

Besides the concern here, I think it might be worth giving it a shot, if it handles cancellation of all the running and pending jobs from common_tests.

mhae

Removing my -1

Add job to cancel workflow on failure

22d57d3

Since HPE rdma tests are quickest (only 15-20 min) if they fail, then fail the entire workflow run, all jobs to save some free tier runners

glimchb requested review from karlatec, mhae and tomzawadzki January 22, 2026 23:21

mhae approved these changes Jan 23, 2026

mhae requested changes Jan 23, 2026

karlatec suggested changes Jan 28, 2026

glimchb added the enhancement label (New feature or request) Feb 7, 2026

tomzawadzki requested changes Feb 9, 2026

mhae reviewed Feb 10, 2026

mhae approved these changes Feb 10, 2026
