Search before asking
Motivation
As far as I can tell, remote reads currently download all files before any of them are consumed. Is that right? One potential optimization is to stream the read instead, so that each file can be consumed as soon as its download finishes.
Our internal index-building scenario typically consumes files from a day ago, or even from only a few hours ago. I'm concerned that the current download-everything-first model will be inefficient for this workload.
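To make the idea concrete, here is a minimal Python sketch of the streaming behavior I have in mind. It is not based on the project's actual reader code: `stream_downloads` and `fake_fetch` are hypothetical names, and `fake_fetch` only stands in for a real remote read. The point is just that the consumer receives each file as soon as its download completes, rather than after the whole batch.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable, Iterator, Tuple


def stream_downloads(
    paths: Iterable[str],
    fetch: Callable[[str], bytes],
    max_workers: int = 4,
) -> Iterator[Tuple[str, bytes]]:
    """Yield (path, data) pairs as soon as each download finishes.

    Hypothetical helper: `fetch` is whatever function performs the
    remote read for a single file.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Start all downloads; the pool limits how many run concurrently.
        futures = {pool.submit(fetch, p): p for p in paths}
        # as_completed yields futures in completion order, not input order.
        for fut in as_completed(futures):
            yield futures[fut], fut.result()


def fake_fetch(path: str) -> bytes:
    # Assumption: placeholder for a real remote read.
    return f"contents of {path}".encode()


consumed = []
for path, data in stream_downloads(["a.parquet", "b.parquet", "c.parquet"], fake_fetch):
    # The consumer runs here while other downloads may still be in flight.
    consumed.append(path)
```

Note that files arrive in completion order, so a real implementation would need to decide whether ordering matters, and would probably also want to bound how many downloaded files are buffered ahead of the consumer.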
Solution
No response
Anything else?
No response
Willingness to contribute