Better fix to [JENKINS-25519] -- avoid errors when trying to read exitStatus from durable task before finished writing#66
Conversation
|
@reviewbybees |
| FilePath logFile = c.getLogFile(ws); | ||
| FilePath resultFile = c.getResultFile(ws); | ||
| if (resultFile.exists()) { | ||
| resultFile.delete(); // Maybe overly cautious, but better safe than sorry, similarly we should make sure no duplicate logfile? |
There was a problem hiding this comment.
Well, the whole control directory should not exist before we start, or should be empty.
| return controlDir(ws).child("pid"); | ||
| } | ||
|
|
||
| // TODO run as one big MasterToSlaveCallable<Integer> to avoid extra network roundtrips |
| listener.getLogger().println("still have " + pidFile + " so heartbeat checks unreliable; process may or may not be alive"); | ||
| } else { | ||
| listener.getLogger().println("wrapper script does not seem to be touching the log file in " + controlDir); | ||
| listener.getLogger().println("s " + controlDir); |
There was a problem hiding this comment.
Erm, I have no idea where that came from. Best guess It was a hotkeying error somewhere, but fixing now.
| } catch (NumberFormatException x) { | ||
| throw new IOException("corrupted content in " + status + ": " + x, x); | ||
| /** Avoids excess round-tripping when reading status file. */ | ||
| static class StatusCheck extends MasterToSlaveFileCallable<Integer> { |
There was a problem hiding this comment.
This is just gratuitous merge conflict creation against #60. Better to keep the patch short and to the point.
There was a problem hiding this comment.
We have one particular user who has longer latencies and this change likely helps keep them under the timeouts - thus it is hardly "gratuitous."
| public Integer invoke(File f, VirtualChannel channel) throws IOException, InterruptedException { | ||
| if (f.exists() && f.length() > 0) { | ||
| try { | ||
| String fileString = Files.readFirstLine(f, Charset.defaultCharset()); |
There was a problem hiding this comment.
Anyway you could just get rid of the exists check and use a single readToString call, catching FileNotFoundException and ignoring.
There was a problem hiding this comment.
If it's all agent-local then it doesn't really matter, does it? Also: as a rule of thumb in Java one should not be throwing Exceptions to signal routine and expected conditions. They come with a higher than normal level of baggage.
As opposed to Python where that's the idiomatic way to signal things like an Iterator being done.
There was a problem hiding this comment.
Well, packaging all of this up in a Remoting call is a pretty high level of baggage, which would swamp any tiny overhead of Throwable.fillInStackTrace in this case.
| } | ||
| } | ||
|
|
||
| static final StatusCheck STATUS_CHECK_INSTANCE = new StatusCheck(); |
There was a problem hiding this comment.
You know act is doing object allocation anyway, right?
| String quotedResultFile = quote(c.getResultFile(ws)); | ||
| if (capturingOutput) { | ||
| cmd = String.format("@echo off \r\ncmd /c \"\"%s\"\" > \"%s\" 2> \"%s\"\r\necho %%ERRORLEVEL%% > \"%s\"\r\n", | ||
| cmd = String.format("@echo off \r\ncmd /c \"\"%s\"\" > \"%s\" 2> \"%s\"\r\necho %%ERRORLEVEL%% > \"%s.tmp\"\r\nmove \"%s.tmp\" \"%s\"\r\n", |
| public Integer invoke(File f, VirtualChannel channel) throws IOException, InterruptedException { | ||
| if (f.exists() && f.length() > 0) { | ||
| try { | ||
| String fileString = Files.readFirstLine(f, Charset.defaultCharset()); |
There was a problem hiding this comment.
Minor note: use of defaultCharset could theoretically cause issues on z/OS. Since we are expecting this file to be ASCII (it should in fact just contain [0-9]+), it is safer and clearer to use StandardCharsets.ASCII.
There was a problem hiding this comment.
A really critical point - it looks like this is already broken, per #28 -- I think we'd need to address it in more comprehensive work to get it to play nicely in any case.
…able hang bugs on windows
Superior version of #65
Proposed solution to error like this, encountered when statusCode file exists but is empty, potentially when created but not written to yet, or not written fully. Was confirmed that previous version removed the errors with the "corrupt" status file content and greatly reduced failure rates.
Variant of JENKINS-25519