
ar_data_sync skip local peer verification #650

Open
shizzard wants to merge 16 commits into release/performance-2.8 from ar_data_sync-sjup-local-peer-verification

@shizzard
Contributor

No description provided.

bucket_based_offset => false }) of
{ok, Config} = application:get_env(arweave, config),
case get_chunk(Start + 1, #{
pack => lists:member(pack_served_chunks, Config#config.enable),
Collaborator

I couldn't trace through all code paths leading to get_tx_data. Some appear to start at an HTTP handler; it makes sense that we'd only repack when the option is set for those.

But it looks like a bunch come from ar_storage:read_tx, which I think is used by a lot of internal processes. I wonder whether those processes depend on the chunk being repacked. Were you able to confirm one way or the other?

Contributor Author

No, but I'll check what I can.

Collaborator

Worst case, you can revert this to pack => true. I don't think there's much risk there (and probably no new risk).

Peer = ar_http_util:arweave_peer(Req),
IsLocalPeerAddr = lists:member(Peer, Config#config.local_peers),

case IsPackServedChunks orelse IsLocalPeerAddr of
Collaborator

Should this be andalso? I forget where we landed.

I could see it working like this:

  • server sets pack_served_chunks, in which case it only repacks chunks for members of its local_peers
  • client sets request_packed_chunks and sync_from_local_peers_only, in which case it requests packed chunks from its local_peers and does not validate them before storing.
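The two readings of the condition in the diff above (orelse as written, andalso as proposed in this comment) can be contrasted in a minimal Erlang sketch. The module and function names (pack_policy, serve_packed_orelse/2, serve_packed_andalso/2) are hypothetical; only the two booleans correspond to IsPackServedChunks and IsLocalPeerAddr from the diff.

```erlang
-module(pack_policy).
-export([serve_packed_orelse/2, serve_packed_andalso/2]).

%% Hypothetical helpers illustrating the two server-side policies.
%% IsPackServedChunks: the server has pack_served_chunks in its enable list.
%% IsLocalPeerAddr: the requesting peer is listed in local_peers.

%% As written in the PR: repack if the option is set OR the peer is local.
serve_packed_orelse(IsPackServedChunks, IsLocalPeerAddr) ->
    IsPackServedChunks orelse IsLocalPeerAddr.

%% As proposed in review: repack only if the option is set AND the peer is local.
serve_packed_andalso(IsPackServedChunks, IsLocalPeerAddr) ->
    IsPackServedChunks andalso IsLocalPeerAddr.
```

The two policies agree when both flags are equal and differ on the mixed cases. The case behind the CM exit node concern is (false, true): a local peer requesting from a server that never enabled pack_served_chunks, which the orelse version repacks and the andalso version does not.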

Contributor Author

We decided that we pack served chunks for local peers even when the option is not set.

Collaborator

I think we should always require the option to be set. Unless you and Lev discussed otherwise?

For example, there are a lot of scenarios where local_peers is set for other reasons (e.g. to reduce rate limiting between CM cluster peers), and in those cases we probably don't want to also pack served chunks unless that's explicitly desired (e.g. we wouldn't want a CM exit node to get bogged down repacking accidentally).

Contributor Author

We discussed this a few times:

  • a CM exit node is unlikely to have any data at all;
  • local_peers are considered trusted, and therefore any requested packing is served.
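The trust argument on the client side (data from a local peer is stored without validation, per the PR title) can be sketched as follows. This is a hypothetical illustration, not the PR's code: local_peer_trust, maybe_verify/3, and the VerifyFun callback are assumed names, and the peer values only need to be comparable with lists:member/2, as with the local_peers check in the diff above.

```erlang
-module(local_peer_trust).
-export([maybe_verify/3]).

%% Hypothetical sketch: chunks fetched from a peer listed in LocalPeers are
%% accepted without running verification; chunks from any other peer are
%% passed through VerifyFun, whose result is returned unchanged.
maybe_verify(Peer, LocalPeers, VerifyFun) ->
    case lists:member(Peer, LocalPeers) of
        true ->
            %% Trusted local peer: skip the verification step.
            ok;
        false ->
            %% Untrusted peer: delegate to the caller-supplied verification.
            VerifyFun()
    end.
```

Passing verification in as a fun keeps the trust decision separate from whatever the actual chunk checks are, which this sketch does not attempt to model.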

Collaborator

@JamesPiechota, Nov 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I honestly don't remember what we discussed. But I think we should require enable pack_served_chunks to be set. My understanding is that we use local_peers as a filter on top of that, so that nodes don't end up packing for peers they don't want to. But we still need enable pack_served_chunks.

The client can trust all data it receives from one of its local_peers though. That makes sense.

Collaborator

Just chatted on Slack. The current approach as implemented works: clients will only request packed chunks if they have fetch_packed_chunks set, but if they do request a packed chunk, the local_peer will repack.
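The agreed flow splits the decision between the two sides: the client gates the request on fetch_packed_chunks, and the server repacks a requested chunk when pack_served_chunks is enabled or the requester is a local peer. A minimal sketch under those assumptions follows; the module, function names, and the plain-list representation of enabled flags are hypothetical, and what the server sends when it does not repack is outside this sketch.

```erlang
-module(packed_fetch_flow).
-export([client_requests_packed/1, server_repacks/3]).

%% Client side: ask for a packed chunk only when fetch_packed_chunks
%% appears in the node's enabled flags (modelled here as a plain list).
client_requests_packed(EnabledFlags) ->
    lists:member(fetch_packed_chunks, EnabledFlags).

%% Server side: repack only if the client asked for a packed chunk, and then
%% when either pack_served_chunks is enabled or the requester is a local peer.
server_repacks(RequestedPacked, ServerFlags, IsLocalPeer) ->
    IsPackServedChunks = lists:member(pack_served_chunks, ServerFlags),
    RequestedPacked andalso (IsPackServedChunks orelse IsLocalPeer).
```

Because the client-side gate comes first, a server that lists a peer in local_peers only for rate-limiting reasons does not repack unless that peer has opted in with fetch_packed_chunks.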
