fix(consumers): prevent await deadlock on ContextCallable failure#2941
fix(consumers): prevent await deadlock on ContextCallable failure#2941imbajin merged 5 commits intoapache:masterfrom
Conversation
|
Hi, thanks for your PR, maybe we need at least 2 tests to validate the modification |
There was a problem hiding this comment.
Pull request overview
This PR fixes a potential deadlock issue in the Consumers class where await() could hang indefinitely if a ContextCallable fails before entering the runAndDone() method. The fix ensures that the countdown latch is always decremented, even when failures occur during worker thread initialization.
Changes:
- Introduced
safeRun()wrapper method to guaranteelatch.countDown()is called in all failure scenarios - Modified worker submission to use
safeRun()instead of directly submittingContextCallable - Moved latch countdown from
runAndDone()tosafeRun()to prevent double-counting while ensuring it always executes
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Pushed an update to this PR on branch Main changes
Verification
Thanks! |
|
Side note: there is another |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
hugegraph-server/hugegraph-core/src/main/java/org/apache/hugegraph/util/Consumers.java
Outdated
Show resolved
Hide resolved
hugegraph-server/hugegraph-core/src/main/java/org/apache/hugegraph/util/Consumers.java
Outdated
Show resolved
Hide resolved
hugegraph-server/hugegraph-core/src/main/java/org/apache/hugegraph/util/Consumers.java
Outdated
Show resolved
Hide resolved
hugegraph-server/hugegraph-core/src/main/java/org/apache/hugegraph/util/Consumers.java
Outdated
Show resolved
Hide resolved
hugegraph-server/hugegraph-core/src/main/java/org/apache/hugegraph/util/Consumers.java
Outdated
Show resolved
Hide resolved
hugegraph-server/hugegraph-test/src/main/java/org/apache/hugegraph/unit/util/ConsumersTest.java
Show resolved
Hide resolved
| } | ||
| } | ||
|
|
||
| @Test(timeout = 5000) |
There was a problem hiding this comment.
🧹 Minor: The test doesn't verify the specific deadlock scenario mentioned in the PR description - where ContextCallable fails before entering runAndDone().
Consider adding a test case that explicitly simulates the failure scenario, for example:
@Test(timeout = 1000)
public void testAwaitDoesNotHangWhenContextCallableFails() throws Throwable {
// Test that simulates ContextCallable constructor/call() failure
// before runAndDone() is entered
}
No need to update the another one (but we could merge them into one class/method reference) |
9e52f92 to
260e4ea
Compare
I’ll work on the test for where ContextCallable fails before entering runAndDone(). |
…ilure Add a unit test that explicitly covers the failure scenario described in the PR, where ContextCallable fails before entering runAndDone(). The test verifies that Consumers.await() does not hang when the worker task fails during ContextCallable execution, relying on safeRun() to always decrement the latch in its finally block. This test would deadlock on the previous implementation and passes with the current fix, ensuring the issue cannot regress.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #2941 +/- ##
============================================
- Coverage 39.38% 1.57% -37.82%
+ Complexity 456 43 -413
============================================
Files 812 779 -33
Lines 68660 65018 -3642
Branches 8968 8332 -636
============================================
- Hits 27044 1026 -26018
- Misses 38824 63906 +25082
+ Partials 2792 86 -2706 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
hugegraph-server/hugegraph-test/src/main/java/org/apache/hugegraph/unit/util/ConsumersTest.java
Show resolved
Hide resolved
|
Thanks for the review. I’ve updated the implementation and All related unit tests pass locally. |
When you said “no need to update the another one (but we could merge them into one class/method reference)”, did you mean that this PR should stay limited to the current Consumers, and that any fix for the other implementation should be handled separately possibly as a later refactor to share the common logic? |
Exactly! You got it. We want to avoid code duplication across the project—ideally, these utilities should only exist in one place. If they need to be shared(across multi-modules), we can move them to |
Purpose of the PR
Main Changes
Submit safeRun() instead of directly submitting ContextCallable
Add safeRun() wrapper to always call latch.countDown() in finally
Remove latch.countDown() from runAndDone() to avoid double-counting
Verifying these changes
Does this PR potentially affect the following parts?
Documentation Status
Doc - TODODoc - DoneDoc - No Need