Description
We have had 2 rounds of "how do I pick up my requests where I left off?" extensions:
DoNotSendCIDs - by providing a list of CIDs not to send, I can specify exactly which blocks I want the remote to provide. This gives us an extension with predictable behavior that can be used to resume requests. The problem is that it's extremely cumbersome for large DAGs, and we quickly hit situations where sending all the CIDs we don't want is just not feasible. It also requires the requestor to somehow assemble a list of the CIDs it already has.
DoNotSendFirstBlocks - this is our current extension. We simply pass a block index at which to start sending blocks, with the instruction not to send any blocks before that point in the traversal. This extension works in most cases and is very compact. However, it does not ultimately provide a predictable resume. The responder might skip over data it was missing in the initial transfer. If it acquired that data in the time before the request was resumed, DoNotSendFirstBlocks could suddenly behave quite unpredictably, as those first blocks now include blocks that weren't there before. (I almost forgot this was the reason I pursued DoNotSendCIDs initially.) In practice, this isn't much of a problem in Filecoin, where SPs tend to have the same data they've always had (even if they are missing some of it). But it seems like an issue for IPFS or a Retrieval Market, where people's data stores change all the time.
Both of these extensions have an additional problem: they require the responder to re-execute the selector traversal up to the point of resumption, even though it does not intend to send that data.
It seems to me we need an extension that:
- can represent a resume point in the traversal compactly
- can do it in a predictable way
- allows the provider to optimally skip as much traversal as is possible up to the point of resumption
I believe the simplest way to represent this is an IPLD path. Moreover, there's already a proposed addition to selector traversal to allow it to resume at a path -- ipld/go-ipld-prime#358
I'm not sure the above is the exact right solution, but I believe a ResumeAtPath extension is our best path forward.