
fix: if there are peers with the data we seek, but all of them are in… #954

Open
JamesPiechota wants to merge 1 commit into master from jp/wait-for-read-peers-20260204

Conversation

@JamesPiechota (Collaborator) commented Feb 4, 2026

… cooldown - wait and try the range again

This avoids a death loop that can happen when there aren't many peers available for a range. If they all get throttled, the node will iterate through the collect_peer_intervals loop very fast and likely trip the rate limit again as soon as it is lifted - but each time it will be at a different, random point in the loop, which can mean it takes forever to enqueue all intervals for the range.

Now with the wait the node will continue to march methodically through the range, waiting as necessary but never skipping ahead.
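
As a rough illustration of that behavior, here is a minimal Erlang sketch. It assumes hypothetical helpers (get_peers/2, enqueue_intervals/4, next_start/1), an arbitrary retry delay, and a self-addressed retry message; none of this is the exact code in the PR.

```erlang
%% Sketch only: the wait-and-retry idea described above. The helper
%% names, the delay constant, and the retry message shape are assumptions
%% for illustration, not the code in this PR.
-define(ALL_PEERS_COOLING_DOWN_RETRY_MS, 2000).

collect_peer_intervals(Start, End, _StoreID) when Start >= End ->
    done;
collect_peer_intervals(Start, End, StoreID) ->
    case get_peers(Start, End) of
        {ok, Peers} ->
            %% Normal path: enqueue the intervals these peers can serve,
            %% then advance to the next sub-range.
            ok = enqueue_intervals(Peers, Start, End, StoreID),
            collect_peer_intervals(next_start(Start), End, StoreID);
        all_peers_cooling_down ->
            %% Peers exist but every one is rate limited or throttled.
            %% Do NOT advance: re-send the same sub-range to ourselves
            %% after a delay, so the range is walked methodically instead
            %% of being spun through and skipped.
            erlang:send_after(?ALL_PEERS_COOLING_DOWN_RETRY_MS, self(),
                {collect_peer_intervals, Start, End, StoreID}),
            waiting;
        no_peers ->
            %% No peers at all for this range: keep going rather than
            %% blocking indefinitely.
            collect_peer_intervals(next_start(Start), End, StoreID)
    end.
```

The important difference from the previous behavior is the all_peers_cooling_down branch: the same Start offset is retried after the delay instead of the loop racing ahead to a different point in the range.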


Note

Medium Risk
Touches core sync scheduling/peer selection logic; incorrect wait/backoff behavior could slow syncing or reduce progress under certain peer-availability/rate-limit conditions.

Overview
Prevents collect_peer_intervals from spinning through ranges when peers exist but are all on cooldown/throttled, by introducing an explicit wait outcome from peer selection and retrying the same offset after a delay.

Refactors peer selection in ar_peer_intervals into get_peers/2 + get_peers2/4, shared by both normal and footprint modes, and adds a small log improvement in ar_data_sync to include queue_size and avoid rereading store_id from the record.

Written by Cursor Bugbot for commit 535f343. This will update automatically on new commits.

not ar_rate_limiter:is_on_cooldown(Peer, RPMKey) andalso
not ar_rate_limiter:is_throttled(Peer, Path)
],
case length(AllPeers) > 0 andalso length(HotPeers) == 0 of
Member

Why do you condition on length(AllPeers)? Doesn't it make sense to always wait when the discovered peer list is empty?

Collaborator Author

I'm expecting there may be some ranges of chunks for which there are currently no peers at all (e.g. perhaps during some of the periods with high sacrifice mining, or perhaps when the node has configured sync_from_local_peers_only too restrictively) - in that case I didn't want the node to block indefinitely; instead it should continue syncing where possible.
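
To make the condition discussed here concrete, below is a self-contained sketch; classify_peers/2 and the outcome atoms are hypothetical names, and the cooldown/throttle check (ar_rate_limiter:is_on_cooldown/2 and is_throttled/2 in the quoted diff) is passed in as a predicate so the snippet stands alone.

```erlang
%% Sketch: classify a range's discovered peers so the caller can decide
%% whether to proceed, wait, or move on. Waiting is only signalled when
%% peers exist but none is currently usable; an empty peer list does not
%% block the node (per the discussion above).
-spec classify_peers([term()], fun((term()) -> boolean())) ->
        {ok, [term()]} | all_peers_cooling_down | no_peers.
classify_peers([], _IsCoolingDown) ->
    %% No peers at all for this range: continue syncing elsewhere.
    no_peers;
classify_peers(AllPeers, IsCoolingDown) ->
    case [Peer || Peer <- AllPeers, not IsCoolingDown(Peer)] of
        [] ->
            %% Peers exist but all are on cooldown/throttled: wait and
            %% retry the same range later.
            all_peers_cooling_down;
        HotPeers ->
            {ok, HotPeers}
    end.
```

An empty AllPeers yields no_peers, so the caller keeps syncing other ranges instead of blocking; only the peers-exist-but-all-throttled case produces the wait.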

@JamesPiechota (Collaborator Author)

@cursor review please

@cursor cursor bot left a comment

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR
