refactor(iroh, iroh-relay): Add explicit close frame and better deal with disconnects due to same endpoint id #3921
+233
−45
Description
A couple of refactors to how iroh deals with errors from the relay:
- Adds a Close frame for the relay-to-iroh-endpoint protocol. It contains a CloseReason with two variants: SameEndpointIdConnected and Shutdown. The former is emitted when the relay server drops the connection because it received a new connection from the same endpoint id.
- Marks Restarting as deprecated. It is not currently used. It is left in the wire protocol, but any new usage of the frame will trigger a deprecation warning.

And the change that triggers this PR:
When the iroh relay receives a connection from an endpoint to which it already has a connection, it drops the old one. This has been the case for a long time. Until now, the old endpoint did not receive any indication as to why its connection was dropped, so it would reconnect right away. The connection would succeed: now it was its turn to be accepted, and the other endpoint would be dropped from the relay. This back-and-forth loop would continue indefinitely.
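The eviction step described above, combined with the close notification this PR introduces, can be sketched roughly as follows. This is a toy model, not the iroh-relay implementation: `Registry`, `Frame`, the `u32` endpoint id, and the use of `std::sync::mpsc` channels are all hypothetical stand-ins, and the meaning given to `Shutdown` (sent when the relay itself goes away) is an assumption.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Mirrors the two close reasons named in this PR (the type itself is a sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CloseReason {
    SameEndpointIdConnected,
    Shutdown,
}

/// Hypothetical relay-to-endpoint frame; the real protocol has more variants.
#[derive(Debug, PartialEq, Eq)]
enum Frame {
    Close(CloseReason),
}

/// Toy endpoint id; in iroh this is a public key.
type EndpointId = u32;

/// Hypothetical registry of connected clients, keyed by endpoint id.
#[derive(Default)]
struct Registry {
    clients: HashMap<EndpointId, Sender<Frame>>,
}

impl Registry {
    /// Registers a new connection and returns the receiver for its frames.
    /// If a connection with the same id already exists, it is replaced and
    /// told why it is being dropped.
    fn connect(&mut self, id: EndpointId) -> Receiver<Frame> {
        let (tx, rx) = channel();
        if let Some(old) = self.clients.insert(id, tx) {
            // Ignore send errors: the old client may already be gone.
            let _ = old.send(Frame::Close(CloseReason::SameEndpointIdConnected));
        }
        rx
    }

    /// Assumed meaning of `Shutdown`: notify every client, then drop them all.
    fn shutdown(&mut self) {
        for (_, tx) in self.clients.drain() {
            let _ = tx.send(Frame::Close(CloseReason::Shutdown));
        }
    }
}
```

In this model, calling `connect(7)` twice leaves a `Close(SameEndpointIdConnected)` frame waiting on the first receiver, while the second receiver stays empty until `shutdown` is called.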
This PR changes this: if an iroh endpoint receives the SameEndpointIdConnected close frame, it will not reconnect to the relay. Instead, it logs an error and keeps the relay connection in a permanently pending state. This means that no further connection attempt is made, and all messages sent to the relay will eventually be dropped once its channels become full.
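A minimal sketch of the client-side decision described above, under stated assumptions: `NextStep`, `next_step`, and `queue_or_drop` are hypothetical names, not iroh APIs. Treating `Shutdown` and a connection that dropped without a close frame as "reconnect" is an assumption, as is modelling the relay channel with a bounded `std::sync::mpsc` channel.

```rust
use std::sync::mpsc::SyncSender;

/// Mirrors the two close reasons named in this PR (the type itself is a sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CloseReason {
    SameEndpointIdConnected,
    Shutdown,
}

/// Hypothetical outcome of handling a closed relay connection.
#[derive(Debug, PartialEq, Eq)]
enum NextStep {
    /// Try to connect to the relay again.
    Reconnect,
    /// Log an error and never attempt another connection.
    StayPending,
}

/// Decides what to do after the relay connection ends.
/// `None` models a connection that dropped without a close frame.
fn next_step(reason: Option<CloseReason>) -> NextStep {
    match reason {
        // Reconnecting here would evict the other endpoint and restart the loop.
        Some(CloseReason::SameEndpointIdConnected) => NextStep::StayPending,
        // Assumption: every other case keeps the existing reconnect behavior.
        Some(CloseReason::Shutdown) | None => NextStep::Reconnect,
    }
}

/// Queues a message for the relay, returning `false` if it was dropped.
/// With the connection permanently pending, nothing drains the channel,
/// so sends start failing once the bounded buffer is full.
fn queue_or_drop(tx: &SyncSender<Vec<u8>>, msg: Vec<u8>) -> bool {
    tx.try_send(msg).is_ok()
}
```

With a channel of capacity 2 and no consumer, the first two calls to `queue_or_drop` succeed and the third returns `false`, which matches the "dropped once its channels become full" behavior in this simplified model.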
Breaking Changes
None
Notes & open questions
Not sure if this is the right call - let's discuss:
Endpoint methods have unexpected behaviours after calling Endpoint::close #3905 is addressed, because currently we'd put the endpoint into a weird state. However, so far we treated relays as mostly-optional, and having a relay actor terminate the endpoint might also not be expected behavior.
Change checklist
- quic-rpc
- iroh-gossip
- iroh-blobs
- dumbpipe
- sendme

Fixes #3813