Fixed issue with exception in server run method not being logged#5796
Fixed issue with exception in server run method not being logged#5796dlmarion merged 5 commits intoapache:mainfrom
Conversation
The use of try-with-resources in the server process' main method causes the server's close method to always be called. When an exception occurs in the runServer method it may not be logged because the invocation of the close method consumes enough time that the lock watcher is fired which ends up halting the JVM. Closes apache#5794
|
I found this issue because I started 4.0 software on a 2.1.4 instance without running the The above was never logged, instead the only thing that I saw in the log was the fact that the server's lock was gone and that the JVM was being halted. AbstractServer.close was an empty method until #5726, which was included as part of the 2.1.4 release. |
|
If an exception is thrown in the
Is this due to the ZK lock being lost by closing the client context? I'm curious that one logger is able to print a message showing the lock is gone, but the exception message wasn't flushed. I thought try-with-resources will append any exception in the In general, if we don't care about any cleanup actions being called in the Could the exception be logged in |
I don't think so. ServerContext.close does call close on the shared conditional writers for the user and metadata tables, but I think if an exception is thrown from server's run method, then we want to exit as quickly as possible.
It's due to close being called and the JVM halting due to the lock watcher firing before the code in
Yes, we could do that. I did do that and I saw the logging show up. The place where the logging is done currently is in |
ctubbsii
left a comment
There was a problem hiding this comment.
It seems to me that this fix is working only because we are bypassing the cleanup in the close method, so it never triggers the Halt in the background thread, and instead exits abnormally through the main method with an exception and non-zero exit code (the usual way Java processes abnormally terminate).
But the trade off is that we entirely skip any attempt to clean up any resources in use by ZooKeeper, ServerContext, AbstractServer, etc., in their close methods.
If we are resigned to do that, then why bother implementing any close methods at all? And not even bother to make them AutoCloseable. If the only way to exit abnormally is from the background thread that ends up calling Halt, we could just do that every time, without bothering to write or maintain any cleanup close methods.
The quick fix is to log the message first and then try to clean up in the close method... which will trigger the halt.
A more complicated fix would be to track that the main method is about to exit through some atomic state variable, and then to skip any attempt to halt in the background thread... but I'm not sure if that's a good idea.
This also makes me question whether we can even get graceful shutdowns at all with a zero exit code. If the close method is what is triggering the eventual Halt, then it's possible that the JVM will halt during cleanup, even when runServer exits without an exception (such as due to a graceful shutdown), and then the process exits with a non-zero exit code even though it was intended to be a graceful shutdown.
Yeah, that's fair. I can modify this PR to do just that. |
ctubbsii
left a comment
There was a problem hiding this comment.
It seems like there it would be easy to write a small final method on the abstract server base class to call that would then call runServer and do this logging, so there's less copy pasted boilerplate code.
I'm not sure how best to handle the non zero exit code. Normally, it's bad form to call System.exit, and we'd normally want to just let the exception propagate, but in this case, that's not happening always because of the background thread doing the halting. Still, if the background thread doesn't halt first, we don't want to exit with a zero exit code here. Re throwing the logged exception would be fine, but may result in redundant messages to STDOUT. That's probably fine though.
I'm still not sure how the try with resources affects a graceful shutdown... it seems likely to also cause a halt, but this PR doesn't have to address that right now.
| System.err.println("Compactor died, exception thrown from runServer."); | ||
| e.printStackTrace(); | ||
| LOG.error("Compactor died, exception thrown from runServer.", e); | ||
| } |
There was a problem hiding this comment.
This is fine but causes it to exit with a normal exit code, 0. Normally an exception being thrown through the main class would cause a non-zero exit code.
ctubbsii
left a comment
There was a problem hiding this comment.
Good: The consolidation is good, and it rethrows in case of an exception so the exit code is always correct.
Neutral: This doesn't use try-with-resources, but it could instead of the finally block. I think the only difference is the log message that says that the exception happened in the close, but I don't think that difference in log messages is all that critical.
Bad: In the case of a clean exit, for a graceful shutdown, I think the background thread could still halt the process and produce an exception and non-zero exit code, but as I've mentioned, it's okay if this PR doesn't try to address that specific case.
The lock watcher may halt the process, but would do so with a The last thing that happens is unlocking the lock, which may end up causing the ServiceLockWatcher to fire before the JVM ends. However, it uses a |
Follow up to apache#5796 that attempts to handle the following case. 1. server.runServer() throws an error 2. server.close() throws en exception When the above happens the error will never be logged and its also not attached to the exception thrown in the finally block by server.close(). Therefore the error is lost and there is no indication it happeneded. With these changes the exception thrown by server.close() will be attached to the error thrown by server.runServer().
Follow up to #5796 that attempts to handle the following case. 1. server.runServer() throws an error 2. server.close() throws en exception or error When the above happens the error will never be logged and its also not attached to the exception thrown in the finally block by server.close(). Therefore the error is lost and there is no indication it happened. Changed the code to catch throwable to avoid loss of information in this case. Co-authored-by: Dave Marion <dlmarion@apache.org>
The use of try-with-resources in the server process' main method causes the server's close method to always be called. When an exception occurs in the runServer method it may not be logged because the invocation of the close method consumes enough time that the lock watcher is fired which ends up halting the JVM.
Closes #5794