Call ResetTimer synchronously from eol and skip the Tick when channel is null #226
|
@jglick Could you please review? (I tried to apply a fix with minimal impact so as not to break anything.) |
jglick
left a comment
OK I guess? I have no memory of writing that comment. Does it seem to work?
@Override
public OutputStream decorateLogger(@SuppressWarnings("rawtypes") Run build, final OutputStream logger)
        throws IOException, InterruptedException {
    // TODO if channel == null, we can safely ResetTimer.call synchronously from eol and skip the Tick
The flakiness is gone with this change, and all other tests pass. Just in case, I also manually tested the case described in https://issues.jenkins.io/browse/JENKINS-54078?focusedCommentId=351432&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-351432. So yes, it seems to work. |
|
After upgrading to v991 (which includes this PR) we are now facing CPU pressure, and eventually the Jenkins pod is terminated.
JavaMelody__jenkins-0_8_29_22.pdf Any suggestions? Many thanks! |
|
It may be worth adding that the jobs run with a timeout of 2h, so if I understand this correctly, the reset call previously happened only every 1h and now happens on each log line. |
via diff --git src/test/java/org/jenkinsci/plugins/workflow/steps/TimeoutStepTest.java src/test/java/org/jenkinsci/plugins/workflow/steps/TimeoutStepTest.java
index f18f476..faea2f4 100644
--- src/test/java/org/jenkinsci/plugins/workflow/steps/TimeoutStepTest.java
+++ src/test/java/org/jenkinsci/plugins/workflow/steps/TimeoutStepTest.java
@@ -219,6 +219,7 @@ public class TimeoutStepTest {
WorkflowRun b = p.scheduleBuild2(0).getStartCondition().get();
SemaphoreStep.waitForStart("restarted/1", b);
});
+ Thread.sleep(10_000);
sessions.then(j -> {
WorkflowJob p = j.jenkins.getItemByFullName("restarted", WorkflowJob.class);
WorkflowRun b = p.getBuildByNumber(1); |
|
@chwehrli @tarioch thanks for the report and sorry for any inconvenience. I believe #234 should fix the performance issue as well as address the original problem motivating this PR. If you want to test prior to release, that would be great; just download a |
|
Thank you very much for that ultra-fast response @jglick ! |
|
Thanks. Please use #234 for further comments. |
CloudBees CI reported flakiness in TimeoutStepExecutionTest#activityRestart.
Logs:
activityRestart_stacktrace.txt
activityRestart_stderr.txt
The issue can be reproduced locally (the test fails quite often): when, after the restart, the timeout is set to expire in less than 7.5 sec, the test fails.
From what I see, the flake is caused by a race condition between the Tick (which gets scheduled at timeout / 2, in our particular case 7.5 sec) and the Killer's delay (which after the restart was 3.3 sec in our particular case). When delay < timeout / 2, no reset happens and TimeoutStepExecution cancels the rest of the pipeline after delay.
There is a comment in the code: TODO if channel == null, we can safely ResetTimer.call synchronously from eol and skip the Tick. If we follow this recommendation, we can reset the timer synchronously instead of scheduling the reset at timeout / 2, which in itself fixes the flakiness.
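To make the race easier to see, here is a minimal, simplified model of the timing. It is only a sketch and not the plugin code: the class TimeoutRaceSketch and both method names are hypothetical, the 15 sec timeout is inferred from the 7.5 sec Tick interval above, and the 100 ms arrival time of the first log line is an assumed value chosen for illustration.

```java
public class TimeoutRaceSketch {

    /**
     * Asynchronous model: activity is only turned into a timer reset when the
     * Tick fires at timeout / 2. If the Killer's remaining delay is shorter
     * than that, the Killer wins the race and the step is cancelled.
     */
    static boolean killedBeforeFirstTick(long timeoutMillis, long killerDelayMillis) {
        long tickMillis = timeoutMillis / 2;
        return killerDelayMillis < tickMillis;
    }

    /**
     * Synchronous model: the timer is reset as soon as a log line ends (eol).
     * The step is cancelled only if no line arrives before the Killer fires.
     */
    static boolean killedWithSynchronousReset(long killerDelayMillis, long firstLineAtMillis) {
        return firstLineAtMillis >= killerDelayMillis;
    }

    public static void main(String[] args) {
        long timeoutMillis = 15_000;    // inferred: Tick at timeout / 2 = 7.5 sec
        long killerDelayMillis = 3_300; // Killer's delay after the restart
        long firstLineAtMillis = 100;   // assumed arrival time of the first log line

        System.out.println("async, killed before first Tick: "
                + killedBeforeFirstTick(timeoutMillis, killerDelayMillis));
        System.out.println("sync, killed despite log activity: "
                + killedWithSynchronousReset(killerDelayMillis, firstLineAtMillis));
    }
}
```

In this model the asynchronous variant loses the race whenever the Killer's delay is below timeout / 2, regardless of log activity, while the synchronous variant depends only on whether a line arrives before the Killer fires.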