@Frando Frando commented Feb 9, 2026

Description

A couple of refactors to how iroh deals with errors from the relay:

  • Add a new Close frame to the relay-to-iroh-endpoint protocol. It contains a CloseReason with two variants: SameEndpointIdConnected and Shutdown. The former is emitted when the relay server drops a connection because it received a new connection from the same endpoint id.
  • Mark the Restarting frame as deprecated. It is not currently used. It stays in the wire protocol, but any new usage of the frame will trigger a deprecation warning.
  • Log the close reason on the client side.
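As a rough illustration of the new close reason, here is a minimal sketch. The enum and variant names come from the description above; the one-byte encoding and the `to_byte`/`from_byte` helpers are assumptions made for this example and are not taken from the actual wire protocol.

```rust
/// Sketch of the close reason carried by the new Close frame.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CloseReason {
    /// The relay dropped this connection because a new connection
    /// with the same endpoint id arrived.
    SameEndpointIdConnected,
    /// Presumably sent when the relay shuts down; the description
    /// above does not spell this out.
    Shutdown,
}

impl CloseReason {
    /// Hypothetical one-byte encoding, for illustration only.
    pub fn to_byte(self) -> u8 {
        match self {
            CloseReason::SameEndpointIdConnected => 0,
            CloseReason::Shutdown => 1,
        }
    }

    /// Decodes the hypothetical encoding; unknown bytes yield `None`
    /// so that an older client can ignore reasons it does not know.
    pub fn from_byte(byte: u8) -> Option<Self> {
        match byte {
            0 => Some(CloseReason::SameEndpointIdConnected),
            1 => Some(CloseReason::Shutdown),
            _ => None,
        }
    }
}
```

Returning `None` for unknown bytes is one possible way to leave room for further variants later; the real implementation may handle this differently.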

And the change that triggers this PR:

When the iroh relay receives a connection from an endpoint to which it already has a connection, it drops the old one. This has been the case for a long time. Until now, the old endpoint did not receive any indication as to why its connection was dropped, so it would reconnect right away. That connection would succeed: it was now the old endpoint's turn to be accepted, and the other endpoint would be dropped from the relay. This back-and-forth loop would continue indefinitely.

This PR changes that: when an iroh endpoint receives the SameEndpointIdConnected close frame, it will not reconnect to the relay. Instead, it logs an error and keeps the relay connection in a permanently pending state. This means that no further connection attempt is made, and all messages sent to the relay will eventually be dropped because its channels become full.
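The client-side decision can be sketched roughly as follows. `NextStep` and `after_close` are hypothetical names invented for this example, and reconnecting after `Shutdown` or after a close without a reason is an assumption; only the handling of `SameEndpointIdConnected` is described above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CloseReason {
    SameEndpointIdConnected,
    Shutdown,
}

/// Hypothetical outcome of handling a closed relay connection.
#[derive(Debug, PartialEq, Eq)]
enum NextStep {
    /// Try to connect to the relay again.
    Reconnect,
    /// Make no further attempt; the connection stays pending forever.
    StayPending,
}

/// Decides what to do after the relay connection was closed.
/// `None` stands for a close without a Close frame.
fn after_close(reason: Option<CloseReason>) -> NextStep {
    match reason {
        Some(CloseReason::SameEndpointIdConnected) => {
            // Reconnecting would only evict the other endpoint and
            // restart the connect-disconnect loop, so give up instead.
            eprintln!("relay closed connection: same endpoint id connected elsewhere");
            NextStep::StayPending
        }
        // Assumption: every other case keeps the reconnect behavior.
        Some(CloseReason::Shutdown) | None => NextStep::Reconnect,
    }
}
```

With this shape, `after_close(Some(CloseReason::SameEndpointIdConnected))` is the only input that stops reconnection, which is what breaks the loop between the two contending endpoints.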

Breaking Changes

None

Notes & open questions

Not sure if this is the right call - let's discuss:

  • If we just stopped the relay actor, nothing would change: most likely the relay actor would be restarted right away, triggering the connect-disconnect-connect loop between the two contenders again.
  • We could add a hook so that the relay actor can fully shut down the endpoint. I think this should be done after "api: many Endpoint methods have unexpected behaviours after calling Endpoint::close" #3905 is addressed, because currently we'd put the endpoint into a weird state. However, so far we have treated relays as mostly optional, and having a relay actor terminate the endpoint might not be expected behavior either.
  • We could add something like a "blocklist" to the top-level relay actor, so that we can terminate the ActiveRelayActor in such a way that the RelayActor does not restart it.
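To make the last option more concrete, a blocklist could look something like the sketch below. This is purely illustrative: `RelayBlocklist`, its methods, and the use of relay URLs as string keys are all assumptions, not existing iroh APIs.

```rust
use std::collections::HashSet;

/// Hypothetical blocklist that the top-level relay actor would
/// consult before (re)starting an actor for a given relay.
#[derive(Debug, Default)]
struct RelayBlocklist {
    blocked: HashSet<String>,
}

impl RelayBlocklist {
    /// Records that the actor for this relay must not be restarted,
    /// for example after a SameEndpointIdConnected close.
    fn block(&mut self, relay_url: &str) {
        self.blocked.insert(relay_url.to_owned());
    }

    /// Returns true if an actor for this relay may be started.
    fn may_start(&self, relay_url: &str) -> bool {
        !self.blocked.contains(relay_url)
    }
}
```

Whether entries should ever expire, for instance after a timeout or a network change, is left open here and would be part of the discussion.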

Change checklist

  • Self-review.
  • Documentation updates following the style guide, if relevant.
  • Tests if relevant.
  • All breaking changes documented.
    • List all breaking changes in the above "Breaking Changes" section.
    • Open an issue or PR on any number0 repos that are affected by this breaking change. Give guidance on how the updates should be handled or do the actual updates themselves. The major ones are:

Fixes #3813

@Frando Frando force-pushed the Frando/refactor-relay-close branch from 4d33c2d to e15cd96 Compare February 9, 2026 13:57
@Frando Frando force-pushed the Frando/refactor-relay-close branch from 333e2ac to 6d80b55 Compare February 9, 2026 13:59
github-actions bot commented Feb 9, 2026

Documentation for this PR has been generated and is available at: https://n0-computer.github.io/iroh/pr/3921/docs/iroh/

Last updated: 2026-02-09T14:48:16Z

@n0bot n0bot bot added this to iroh Feb 9, 2026
@github-project-automation github-project-automation bot moved this to 🚑 Needs Triage in iroh Feb 9, 2026
@dignifiedquire dignifiedquire moved this from 🚑 Needs Triage to 👀 In review in iroh Feb 9, 2026

Development

Successfully merging this pull request may close these issues.

Detect shared secret key usage
