Added ExitCodesIT to test process exit codes under various conditions#5811
Added ExitCodesIT to test process exit codes under various conditions#5811dlmarion merged 11 commits intoapache:mainfrom
Conversation
| zooCache.get().close(); | ||
| } | ||
| if (zooKeeperOpened.get()) { | ||
| zooSession.get().close(); |
There was a problem hiding this comment.
This needed to be moved to the end because it closes ZooKeeper, which deletes all ephemeral nodes and fires any watchers.
| } // end while | ||
| } catch (Exception e) { | ||
| LOG.error("Unhandled error occurred in Compactor", e); | ||
| } finally { |
There was a problem hiding this comment.
The change here was to remove the finally block. It causes the normal close code to occur even in Exception or Error case.
| assertEquals(1, exitValue); | ||
| } | ||
| } else { | ||
| // TODO: |
There was a problem hiding this comment.
This should be resolved, wasn't quite sure how to fix it.
|
Best viewed using the "Hide Whitespace" option. |
ctubbsii
left a comment
There was a problem hiding this comment.
Neat strategy for testing exit codes... I'm not sure how easy it's going to be to maintain, as it involves a bit of a learning curve to understand how it works. I'm also wondering if it might be fragile, when we make general improvements. Even so, it might be worth it.
core/src/main/java/org/apache/accumulo/core/clientImpl/ClientContext.java
Outdated
Show resolved
Hide resolved
|
|
||
| if (acquiredLock) { | ||
| Halt.halt(-1, | ||
| Halt.halt(1, |
server/base/src/main/java/org/apache/accumulo/server/ServerContext.java
Outdated
Show resolved
Hide resolved
| updateIdleStatus(true); | ||
| final AtomicReference<Throwable> err = new AtomicReference<>(); | ||
| final LogSorter logSorter = new LogSorter(this); | ||
| long nextSortLogsCheckTime = System.currentTimeMillis(); |
There was a problem hiding this comment.
Not for this PR, but I wonder if this needs to use the system clock or can be made to use relative time (nanoTime).
| // Must set shutdown as completed before calling super.close(). | ||
| // super.close() calls ServerContext.close() -> | ||
| // ClientContext.close() -> ZooSession.close() which removes | ||
| // all of the ephemeral nodes and forces the watches to fire. | ||
| getShutdownComplete().set(true); | ||
| log.info("stop requested. exiting ... "); | ||
| try { | ||
| gcLock.unlock(); | ||
| } catch (Exception e) { | ||
| log.warn("Failed to release GarbageCollector lock", e); | ||
| } | ||
|
|
||
| super.close(); |
There was a problem hiding this comment.
Is that something that we always do together? If so, I wonder if it can be done inside the close, or if the getShutdownComplete() stuff is needed at all.
| serverClass = TabletServer.class; | ||
| methodName = "updateIdleStatus"; | ||
| ctorParams = new Class<?>[] {ConfigOpts.class, Function.class, String[].class}; | ||
| ctorArgs = new Object[] {new ConfigOpts(), new ServerContextFunction(), |
There was a problem hiding this comment.
Can't you inline the ServerContextFunction like the server processes normally do, as in:
| ctorArgs = new Object[] {new ConfigOpts(), new ServerContextFunction(), | |
| ctorArgs = new Object[] {new ConfigOpts(), ServerContext::new, |
There was a problem hiding this comment.
I tried this initially, and I get a compilation error that says "The target type of this expression must be a functional interface"
| // Determine the constructor arguments and parameters for each server class. | ||
| // Find a method with no-args that does not return anything that is | ||
| // called during the servers run method that we can intercept to signal | ||
| // shutdown, exception, or error. |
There was a problem hiding this comment.
Good comment... but this is wild 😅
test/src/main/java/org/apache/accumulo/test/functional/ExitCodesIT_SimpleSuite.java
Outdated
Show resolved
Hide resolved
test/src/main/java/org/apache/accumulo/test/functional/ExitCodesIT_SimpleSuite.java
Outdated
Show resolved
Hide resolved
test/src/main/java/org/apache/accumulo/test/functional/ExitCodesIT_SimpleSuite.java
Outdated
Show resolved
Hide resolved
|
I'm seeing an inconsistency in how we handle Exceptions and Errors in the server threads (AbstractServer.run). In all other threads the AccumuloUncaughtExceptionHandler will:
For critical tasks put into a ThreadPool, we have a method that watches them and throws an ExecutionError if the task has failed. This Error is uncaught and would bubble up to the AccumuloUncaugthExceptionHandler, terminating the VM. We made some changes recently to AbstractServer in #5796 and #5808 such that a new method WIth this new method if Looking at one of the logs from ExitCodesIT we see the following: In any other thread, the VM would be halted without closing. I'm thinking that the |
Are you thinking of something like the following? This would only call close when there is no exception in runServer. public static void startServer(AbstractServer server, Logger LOG) throws Exception {
try {
server.runServer();
server.close();
} catch (Exception e) {
System.err
.println(server.getClass().getSimpleName() + " died, exception thrown from runServer.");
e.printStackTrace();
LOG.error("{} died, exception thrown from runServer.", server.getClass().getSimpleName(), e);
throw e;
}
}Or is the call to close() not needed at all in this code because of the comment about |
|
Yeah, I don't think |
| e.printStackTrace(); | ||
| LOG.error("{} died, exception thrown from runServer.", server.getClass().getSimpleName(), e); | ||
| throw e; | ||
| } finally { |
There was a problem hiding this comment.
A bit futher up could change the catch to catch(Exception e) instead of catch(Throwable e). The only reason it was catching Throwable was in case the finally block threw an exception.
There was a problem hiding this comment.
I thought it was for the logging of Error, not just Exception.
There was a problem hiding this comment.
It was for logging an error in the case where finally threw an exception in which case the error would be lost and would not propagate. W/o the finally the error will propagate now. So it depends on what the outer code does with it.
There was a problem hiding this comment.
There was a problem hiding this comment.
I don't think we need to. I checked the logs and the Exception and Error are logged correctly. The issue in #5796 was an exception in the try-with-resources was being suppressed when an exception happened in the close. try-with-resources is no longer being used.
|
|
||
| context.getLowMemoryDetector().logGCInfo(getConfiguration()); | ||
|
|
||
| // Must set shutdown as completed before calling super.close(). |
There was a problem hiding this comment.
This comment mentions calling super.close(), but do not see that.
| } | ||
| continue; | ||
| } finally { | ||
| currentCompactionId.set(null); |
There was a problem hiding this comment.
Does not seem like this should be cleared here because its used for dead compaction detection and it should be set for the entire time that a compactor is running a compaction.
| LOG.warn("Failed to close filesystem : {}", e.getMessage(), e); | ||
| } | ||
|
|
||
| super.close(); |
There was a problem hiding this comment.
Seems like the the new pattern in this PR is that super.close() is always called at the end of each run() method. Wondering if would make sense to push this into AbstractServer like the following. This would centralize the pattern to one place in the code.
public void runServer() throws Exception {
final AtomicReference<Throwable> err = new AtomicReference<>();
serverThread = new Thread(TraceUtil.wrap(()->{
this.run();
close();
}), applicationName);| compactorArgs.add("-o"); | ||
| compactorArgs.add(Property.COMPACTOR_GROUP_NAME.getKey() + "=TEST"); | ||
| serverClass = Compactor.class; | ||
| methodName = "updateIdleStatus"; |
There was a problem hiding this comment.
What happens w/ the test if this method does not exists? Does the test fail?
Seems like maybe it would fail because the SHUTDOWN case would expect an exit code of zero but would see some other exit code. But not sure.
Wondering if instead of this run time overriding if it would be possible to do it at compile time instead w/ something like the following. Then if someone changes a method name they will get a compile time error.
class ExitCompactor extends Compactor {
private final TerminalBehavior terminalBehavior;
protected ExitCompactor(ConfigOpts opts, String[] args, TerminalBehavior terminalBehavior) {
super(opts, args);
this.terminalBehavior = terminalBehavior;
}
@Override
public void updateIdleStatus(boolean idle){
switch (terminalBehavior){
case SHUTDOWN:
super.requestShutdownForTests();
break;
case ERROR:
throw new StackOverflowError();
case EXCEPTION:
throw new RuntimeException();
}
}
}There was a problem hiding this comment.
What happens w/ the test if this method does not exists? Does the test fail?
Seems like maybe it would fail because the SHUTDOWN case would expect an exit code of zero but would see some other exit code. But not sure.
I can make it fail explicity by adding the following to ProcessProxy.main:
diff --git a/test/src/main/java/org/apache/accumulo/test/functional/ExitCodesIT.java b/test/src/main/java/org/apache/accumulo/test/functional/ExitCodesIT.java
index 4f8f948d26..c17a30ab6e 100644
--- a/test/src/main/java/org/apache/accumulo/test/functional/ExitCodesIT.java
+++ b/test/src/main/java/org/apache/accumulo/test/functional/ExitCodesIT.java
@@ -155,6 +155,15 @@ public class ExitCodesIT extends SharedMiniClusterBase {
throw new UnsupportedOperationException(st + " is not currently supported");
}
+ // Check that methodName exists on serverClass
+ try {
+ @SuppressWarnings("unused")
+ var ignored = serverClass.getDeclaredMethod(methodName);
+ } catch (NoSuchMethodException nsme) {
+ nsme.printStackTrace();
+ System.exit(42);
+ }
+
Wondering if instead of this run time overriding if it would be possible to do it at compile time instead w/ something like the following. Then if someone changes a method name they will get a compile time error.
I think I could make something that is functionally equivalent without using ByteBuddy, but it would be more code. I started with ByteBuddy initially because I wasn't sure of the design when I started, but I knew that I basically needed a proxy for each server class.
There was a problem hiding this comment.
I think I could make something that is functionally equivalent without using ByteBuddy, but it would be more code.
If its not much more code then there are benefits. Can fully navigate and manipulate the code in an IDE and easily see what the test references. Also if a method is renamed will get a quick compile time failure instead of this test failing and then trying to figure out why.
There was a problem hiding this comment.
Removed ByteBuddy and subclassed server processes in ec5cd1d
| // We need to let this time out and then | ||
| // terminate the process. | ||
| IllegalStateException ise = assertThrows(IllegalStateException.class, | ||
| () -> Wait.waitFor(() -> !pi.getProcess().isAlive(), 120_000)); |
There was a problem hiding this comment.
Does this always wait for 2 mins? If so, could we lower time to wait to 30s or 60s?
There was a problem hiding this comment.
I can change to 1m. The other two tests for the GC take 30-35s
No description provided.