glimchb commented Jan 22, 2026

Since the HPE RDMA tests are the quickest (only 15-20 min), if they fail, fail the entire workflow run (all jobs)
to save some free-tier runners.

mhae left a comment

Looks good to me.

mhae left a comment

Actually, I remembered that this is probably not going to work.

See https://github.com/orgs/community/discussions/26311

I tried implementing the cancel early on when I did the runners and didn't get it to work.

glimchb commented Jan 26, 2026

> Actually, I remembered that this is probably not going to work.
>
> See https://github.com/orgs/community/discussions/26311
>
> I tried implementing the cancel early on when I did the runners and didn't get it to work.

We are already cancelling jobs like this today, see https://github.com/spdk/spdk-ci/blob/main/.github/scripts/parse_gerrit_webhook.sh
@mhae can you specify why you think it is not going to work here?

        with:
          path: ./output
          name: hpe-job-nvmf-rdma
      - name: Cancel all jobs entire workflow run if this job failed

Contributor

Shouldn't the same be added to the common tests workflow? In this one, the common tests part failed after a minute or so, but the HPE runner was still busy anyway.

Contributor Author

It rarely happens, since the HPE runner only executes a single test. It was true only when HPE was offline for a bit…

Contributor

Rarely, but still - why keep it busy in such a case? Honestly, it was the first failed build I clicked on, so maybe it does not happen that rarely :)

glimchb added the enhancement label Feb 7, 2026

      - name: Cancel all jobs entire workflow run if this job failed
        if: failure()
        env:
          GH_TOKEN: ${{ github.token }}

tomzawadzki, Feb 9, 2026

Passing the token to a job that is 1) executing scripts from the SPDK repo and 2) running on a self-hosted runner allows either of them to use that token.

Please correct me here, but my understanding is that passing the token or secrets to such a job is always risky and should be avoided.

The correct way to add this cancellation would be to add a job on a GitHub-hosted runner, instead of a step, and give that job the token. Considering the purpose of this patch is to decrease the time to cancel, I'll leave it up to you whether it is worth queuing up another job.
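
A rough sketch of what that separate-job approach might look like, not the PR's actual change. Assumptions: the job id `hpe-job-nvmf-rdma` is borrowed from the artifact name in the diff and may not match the real job id, `cancel-on-hpe-failure` is a hypothetical name, and `ubuntu-latest` stands in for whichever GitHub-hosted runner the workflow uses. The point is that the token is exposed only to a small GitHub-hosted job, never to the self-hosted one.

```yaml
jobs:
  # ... existing jobs, including the self-hosted HPE job (id assumed below) ...

  cancel-on-hpe-failure:            # hypothetical job name
    needs: hpe-job-nvmf-rdma        # assumed id of the self-hosted HPE job
    if: failure()                   # run only when the needed job failed
    runs-on: ubuntu-latest          # GitHub-hosted, so the token stays off the self-hosted runner
    permissions:
      actions: write                # needed to cancel a workflow run
    steps:
      - name: Cancel the entire workflow run
        env:
          GH_TOKEN: ${{ github.token }}
        run: gh run cancel "${{ github.run_id }}" --repo "${{ github.repository }}"
```

The trade-off is the one noted above: the cancel only fires once a hosted runner picks up this extra job, so it may add some queue time compared with a step inside the failing job.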

tomzawadzki commented

> > Actually, I remembered that this is probably not going to work.
> > See https://github.com/orgs/community/discussions/26311
> > I tried implementing the cancel early on when I did the runners and didn't get it to work.
>
> We are already cancelling jobs like this today, see https://github.com/spdk/spdk-ci/blob/main/.github/scripts/parse_gerrit_webhook.sh @mhae can you specify why you think it is not going to work here?

@mhae I'm still unsure which part of the linked discussion was concerning; could you clarify?

mhae commented Feb 9, 2026

> > > Actually, I remembered that this is probably not going to work.
> > > See https://github.com/orgs/community/discussions/26311
> > > I tried implementing the cancel early on when I did the runners and didn't get it to work.
> >
> > We are already cancelling jobs like this today, see https://github.com/spdk/spdk-ci/blob/main/.github/scripts/parse_gerrit_webhook.sh @mhae can you specify why you think it is not going to work here?
>
> @mhae I'm still unsure which part of the linked discussion was concerning; could you clarify?

My main concern is whether forwarding a cancel to a currently running job works correctly. I had some issues early on where not all parts of a job running on the self-hosted runner were canceled.

Reading through the description again, it seems we're not cancelling the job on the self-hosted runner; is this correct?
If yes, then I'm less concerned.

tomzawadzki commented

> My main concern is whether forwarding a cancel to a currently running job works correctly. I had some issues early on where not all parts of a job running on the self-hosted runner were canceled.

That's correct. This PR is intended to cancel the other jobs in the workflow, which at this time run only on GitHub-hosted runners. Unfortunately, the cancels we already have in parse_gerrit_webhook.sh won't help alleviate those concerns, since that script cancels the only running job in the workflow from within that same job.

Apart from the concern here, I think it might be worth giving it a shot if it handles cancellation of all the running and pending jobs from common_tests.

mhae left a comment

Removing my -1
