
Optimized selector traversal over graphsync #253

@hannahhoward


What

What we'd ultimately like to do is prevent duplicate selector traversals of the same block for certain types of IPLD graph traversals. While the logic to detect "duplicates" is quite complicated and will evolve over time (our very first implementation is to special-case just the traverse-all selector), we need a way to communicate about it over the network.

Rather than try to communicate about how to perform optimizations between peers, we're going to allow each side to optimize according to their OWN best interest and let the other know what they did. Responders will most likely almost always turn this on, because it minimizes their processing time. For client requests, we'll probably make it configurable when you initialize graphsync, or perhaps at the request level as well. As we evolve our logic for detecting duplicate traversals, this will enable us to upgrade only one side and still gracefully finish a request.

How

Currently, the schema for metadata about links traversed is as follows, encoded in the extension `graphsync/response-metadata`:

type LinkMetadata struct {
  link Cid
  blockPresent Bool
}

type ResponseMetadata [LinkMetadata]

This ticket proposes a `graphsync/metadata/1.1` extension of this format:

type BlockStatus enum {
  | Present  ("0")
  | Absent   ("1")
  | DuplicateTraversal ("2")
} representation int

type LinkMetadata struct {
  link Cid
  blockStatus BlockStatus
}

type ResponseMetadata [LinkMetadata]
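To make the relationship between the two formats concrete, here is a small Go sketch of the proposed enum. The Go type names mirror the schema above, but the code itself is illustrative, not the actual go-graphsync API; `fromLegacy` is a hypothetical helper showing that 1.1 is a strict superset of the legacy `blockPresent` bool.

```go
package main

import "fmt"

// BlockStatus mirrors the proposed graphsync/metadata/1.1 enum,
// using the int representation values from the schema above.
type BlockStatus int64

const (
	Present            BlockStatus = 0
	Absent             BlockStatus = 1
	DuplicateTraversal BlockStatus = 2
)

// LinkMetadata pairs a link (a CID, shown here as a string for brevity)
// with its status in the response.
type LinkMetadata struct {
	Link        string
	BlockStatus BlockStatus
}

// fromLegacy converts a legacy blockPresent flag into the new enum:
// every legacy value maps cleanly, and DuplicateTraversal is the only
// genuinely new state.
func fromLegacy(blockPresent bool) BlockStatus {
	if blockPresent {
		return Present
	}
	return Absent
}

func main() {
	fmt.Println(fromLegacy(true), fromLegacy(false), DuplicateTraversal)
}
```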

When a responder sends DuplicateTraversal, it is telling the requestor that its own traversal skipped this link, even though it has this block, because it believes it to be a duplicate. The requestor can expect no blocks or links to be sent for this part of the tree, whether or not the responder is telling the truth that it's a duplicate.

When a requestor receives this message, how it proceeds depends on its local configuration around duplicate traversals, and on whether the traversal is indeed a duplicate according to its own logic for detecting duplicate traversals. If it does proceed down the path, it can be assured that until it returns from that part of the traversal, all block loads will be local, based on data previously received. If there is no local block, the requestor should not wait for a remote one.
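The requestor-side decision above can be sketched as a single dispatch on the status. This is a sketch under assumptions, not a proposed API: `followDuplicates` stands in for the hypothetical local configuration knob, and `haveLocal` for a check against the local block store.

```go
package main

import "fmt"

type BlockStatus int64

const (
	Present            BlockStatus = 0
	Absent             BlockStatus = 1
	DuplicateTraversal BlockStatus = 2
)

// handleLinkStatus returns whether the requestor should continue down
// this path, and whether it may wait for the block to arrive remotely.
func handleLinkStatus(status BlockStatus, followDuplicates, haveLocal bool) (proceed, waitRemote bool) {
	switch status {
	case Present:
		// Block is in (or coming in) the response.
		return true, true
	case Absent:
		// Responder doesn't have it; only local data can help.
		return haveLocal, false
	case DuplicateTraversal:
		// Responder skipped this subtree: no remote blocks will arrive,
		// so proceed only if configured to, loading purely from local data.
		return followDuplicates && haveLocal, false
	}
	return false, false
}

func main() {
	proceed, wait := handleLinkStatus(DuplicateTraversal, true, true)
	fmt.Println(proceed, wait)
}
```

The key property is that for both Absent and DuplicateTraversal, `waitRemote` is always false: the requestor never blocks on the network for that part of the tree.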

Backwards Compatibility

It's safe to assume many requestors will support only the legacy metadata format for some time, but responders may still want to limit duplicate traversals for requestors supporting the new format.

With this in mind, I think we should add a requestor-side extension `graphsync/response-metadata-version`:

type Version int

Requestors that send this extension with a value of 1.1 are telling the responder ahead of time that they support metadata 1.1.
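A responder's check against this extension might look like the sketch below. The extension name comes from this proposal; modeling extension payloads as a plain map, and encoding "1.1" as the integer 1, are both assumptions for illustration (the issue leaves the int encoding of the version open).

```go
package main

import "fmt"

// supportsMetadata11 reports whether a request advertised support for
// the proposed metadata 1.1 format. Extensions are modeled here as a
// simple name→int map; the real protocol carries encoded extension
// data on the request.
func supportsMetadata11(extensions map[string]int64) bool {
	v, ok := extensions["graphsync/response-metadata-version"]
	return ok && v >= 1 // assumption: 1 denotes metadata 1.1
}

func main() {
	legacy := map[string]int64{}
	modern := map[string]int64{"graphsync/response-metadata-version": 1}
	fmt.Println(supportsMetadata11(legacy), supportsMetadata11(modern))
}
```

Absence of the extension means the responder must fall back to the legacy metadata format and skip the duplicate-traversal optimization for that request.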

Alternatively, we know we don't want metadata to be an extension at all but rather part of the core protocol, so maybe we should fold this into graphsync 1.1, encoded as CBOR (#88, ipld/specs#354, ipld/specs#355).

This is a meta-issue that will lead to implementation tickets once I have sign-off from @warpfork and others
