fn: Remove ctx from GoroutineManager constructor#9341
fn: Remove ctx from GoroutineManager constructor#9341guggero merged 2 commits intolightningnetwork:masterfrom
Conversation
|
Important Review skippedAuto reviews are limited to specific labels. 🏷️ Labels to auto review (1)
Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
|
cc @starius |
starius
left a comment
There was a problem hiding this comment.
I like the change in API 🎉
IIUC this change would double the number of goroutines. I proposed one approach how to solve this.
fn/goroutine_manager.go
Outdated
|
|
||
| // Calling wg.Add(1) and wg.Wait() when wg's counter is 0 is a race | ||
| // condition, since it is not clear should Wait() block or not. This | ||
| // condition, since it is not clear if should Wait() block or not. This |
There was a problem hiding this comment.
nit: s/if should Wait() block/if Wait() should block/
| // Go call starts running here after acquiring the mutex, it | ||
| // would see that the context has expired and return false | ||
| // instead of calling wg.Add(1). | ||
| g.wg.Wait() |
There was a problem hiding this comment.
This update changes the behavior of Stop(). Previously, if one goroutine called Stop() and another goroutine invoked Stop() concurrently, the second call would block, waiting for the first call to complete. Now, the second Stop() call returns immediately.
There was a problem hiding this comment.
is that a problem? is there a usecase for needing to support the blocking behaviour? afaiu, this will mostly be used within other sub-systems and Stop will be called by their Stop methods which typically will also only be called once
There was a problem hiding this comment.
In the previous version Stop() actually did StopAndWait(). If we support multiple Stop() calls, I think they should behave the same way. If someone calls Stop() from multiple places, they are likely to expect that both calls block until all goroutines finish, not only the first call.
There was a problem hiding this comment.
I just dont think more than 1 system will ever own the GoroutineManager right?
ie, what makes this Stop different from other Stop methods of other subsystems in LND which use a similar pattern to this?
There was a problem hiding this comment.
I just dont think more than 1 system will ever own the GoroutineManager right?
I think, the intuition of Stop() method is that after Stop() call (successfully) returning, all the workers are down. If the second Stop() just returns immediately, this intuition is broken.
Can we panic or return error if a second Stop is detected?
There was a problem hiding this comment.
i see there is a Done() method - we could always update this to return a dedicated shutdown channel that is only closed after Wait() in Stop
There was a problem hiding this comment.
@bhandras - perhaps a 3rd opinion here just to break the tie would help? 🙏
There was a problem hiding this comment.
Option 3 sgtm! (Done with shutdown chan)
There was a problem hiding this comment.
cool - pushed a diff with option 3 included 🫡
There was a problem hiding this comment.
ah - just realised we cannot do this since callers may wait on Done() inside a goroutine that was started within Go().
Undoing this change
0f1388a to
8c19ec0
Compare
ellemouton
left a comment
There was a problem hiding this comment.
Thanks @starius - updated as per your suggestion.
I've also removed the ContextWithQuit commit as i've realised that I can just
adapt the existing fn.ContextGuard instead
| for _, cancel := range g.cancelFns { | ||
| cancel() | ||
| } |
There was a problem hiding this comment.
I propose to clear the map here, to free memory.
There was a problem hiding this comment.
can do, although i dont think it is necessary since this is not a restartable system. This will only be called when the system is shutting down right? if we set the map to nil here, then you could argue that in every Stop method we have in LND, we should go and set each object to nil.
Interested to hear what a second reviewer thinks too. But yeah, can defs add a g.cancelFns = nil line but cant do delete(g.cancelFns, id) since deleting from a map while iterating over it is not safe i think
| // Go call starts running here after acquiring the mutex, it | ||
| // would see that the context has expired and return false | ||
| // instead of calling wg.Add(1). | ||
| g.wg.Wait() |
There was a problem hiding this comment.
In the previous version Stop() actually did StopAndWait(). If we support multiple Stop() calls, I think they should behave the same way. If someone calls Stop() from multiple places, they are likely to expect that both calls block until all goroutines finish, not only the first call.
| // task exiting. It attempts to catch a race condition between wg.Done() and | ||
| // wg.Wait() calls. According to documentation of wg.Wait() this is acceptable, | ||
| // therefore this test passes even with -race. | ||
| func TestGoroutineManagerStopsStress(t *testing.T) { |
There was a problem hiding this comment.
We should decide what to do with multiple Stop() calls. If we allow them, we should keep this test to make sure gm doesn't crash in this situation. If we don't allow them, we should panic or return error from Stop is called more than once.
There was a problem hiding this comment.
see the other comment re supporting multiple stop calls
8c19ec0 to
6825bde
Compare
ellemouton
left a comment
There was a problem hiding this comment.
Thanks for the review @starius - addressed and replied! Will help to get a second opinion regarding your points for the Stop method
| for _, cancel := range g.cancelFns { | ||
| cancel() | ||
| } |
There was a problem hiding this comment.
can do, although i dont think it is necessary since this is not a restartable system. This will only be called when the system is shutting down right? if we set the map to nil here, then you could argue that in every Stop method we have in LND, we should go and set each object to nil.
Interested to hear what a second reviewer thinks too. But yeah, can defs add a g.cancelFns = nil line but cant do delete(g.cancelFns, id) since deleting from a map while iterating over it is not safe i think
| // Go call starts running here after acquiring the mutex, it | ||
| // would see that the context has expired and return false | ||
| // instead of calling wg.Add(1). | ||
| g.wg.Wait() |
There was a problem hiding this comment.
I just dont think more than 1 system will ever own the GoroutineManager right?
ie, what makes this Stop different from other Stop methods of other subsystems in LND which use a similar pattern to this?
| // task exiting. It attempts to catch a race condition between wg.Done() and | ||
| // wg.Wait() calls. According to documentation of wg.Wait() this is acceptable, | ||
| // therefore this test passes even with -race. | ||
| func TestGoroutineManagerStopsStress(t *testing.T) { |
There was a problem hiding this comment.
see the other comment re supporting multiple stop calls
6825bde to
bb3442f
Compare
ellemouton
left a comment
There was a problem hiding this comment.
Dankie @bhandras !
| // Go call starts running here after acquiring the mutex, it | ||
| // would see that the context has expired and return false | ||
| // instead of calling wg.Add(1). | ||
| g.wg.Wait() |
There was a problem hiding this comment.
but why will there ever be a second Stop call is what im getting at. Look at all the instances of ) Stop() { in the codebase
| // Go call starts running here after acquiring the mutex, it | ||
| // would see that the context has expired and return false | ||
| // instead of calling wg.Add(1). | ||
| g.wg.Wait() |
There was a problem hiding this comment.
I think that if we really want to have a method that explicitly blocks and does so for multiple routines, then we should have an explicit Done() <-chan struct method that does this
| // Go call starts running here after acquiring the mutex, it | ||
| // would see that the context has expired and return false | ||
| // instead of calling wg.Add(1). | ||
| g.wg.Wait() |
There was a problem hiding this comment.
i see there is a Done() method - we could always update this to return a dedicated shutdown channel that is only closed after Wait() in Stop
| // Go call starts running here after acquiring the mutex, it | ||
| // would see that the context has expired and return false | ||
| // instead of calling wg.Add(1). | ||
| g.wg.Wait() |
There was a problem hiding this comment.
@bhandras - perhaps a 3rd opinion here just to break the tie would help? 🙏
bb3442f to
186db3e
Compare
186db3e to
37bb082
Compare
|
Pointing to this change from this PR. So I think we can merge here once the CI is green on that side |
|
Pushed a new tag |
Updates the GoroutineManager to avoid having its constructor take
a context since this is an anti-pattern that I think we should avoid. The GoroutineManagers will
generally be created within LND subserver constructors and these will not really ever take contexts
and we want to avoid passing a fresh context to these
this PR was opened in response to reviewing this PR