
Batching in query execution #329

@cheb0

Description


In this task we will try to achieve batching in LID (postings) tree processing, as well as in histograms and aggregations. Modern analytical engines also process data in batches. Seq-db currently supports histograms and aggregations, not only full-text search, and these operations should also benefit from batching.

Batches will simply be slices (arrays) of LIDs. In the future, it is possible to support bitmaps or other data structures.

Batching will not make execution more efficient algorithmically, but it benefits from better CPU utilization (cache locality, tighter loops). It should also align well with block skipping (NextGEQ). It will also be interesting to measure simple scenarios, such as histograms and aggregations over plain queries like service:some-service that scroll through a large number of LIDs.

There are also some downsides. At first glance, it seems we can hit additional overhead on certain queries. For example, pod:gateway-* AND request_id:'123'. If fetching LIDs for the token request_id:'123' yields [30_000, 300_000, 700_000], then passing each of those LIDs to the pod:gateway-* tree will produce a large batch each time, basically just to merge it with a single LID. This might be partially addressed by hinting the Next method with how many LIDs we need, so that we use batching only where we have a lot of LIDs on both sides.

Pros

  • faster execution
  • ability to make GetMID, GetRID, histogram and aggregation support batching - increased CPU cache efficiency
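The cache-efficiency point for histograms can be sketched as follows: resolve timestamps for a whole batch in one tight loop (a GetMID analogue), then bucket them in a second pass. The `mids` array, bucket math, and function name are illustrative assumptions, not the real seq-db API:

```go
package main

import "fmt"

// histogramBatched consumes LIDs batch by batch. For each batch it first
// resolves timestamps in a tight lookup loop (hypothetical GetMID analogue
// backed by a plain mids array), then buckets them in a separate pass,
// reusing one temp slice across batches.
func histogramBatched(batches [][]uint32, mids []uint64, bucket uint64) map[uint64]int {
	hist := make(map[uint64]int)
	ts := make([]uint64, 0, 64) // temp slice reused across batches
	for _, b := range batches {
		ts = ts[:0]
		for _, lid := range b { // tight, cache-friendly lookup loop
			ts = append(ts, mids[lid])
		}
		for _, t := range ts { // separate bucketing pass
			hist[t/bucket*bucket]++
		}
	}
	return hist
}

func main() {
	mids := []uint64{0, 10, 25, 30, 55, 90}
	batches := [][]uint32{{1, 2, 3}, {4, 5}}
	fmt.Println(histogramBatched(batches, mids, 30))
}
```

The same split (batched lookup, then batched consumption) would apply to aggregations over GetRID.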

Cons

  • additional memory management (temp slices) in iterators
  • need to make sure we do not do more work on some queries (disk reads, CPU work)
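One way to contain the temp-slice overhead mentioned above is pooling the batch buffers, e.g. with sync.Pool; a minimal sketch (the pool helpers and the 1024 capacity are assumptions, and a real implementation may prefer pooling a pointer to the slice to avoid header copies):

```go
package main

import (
	"fmt"
	"sync"
)

// batchPool reuses temp LID batch slices across iterators, so batching does
// not turn into a fresh allocation per batch. Sizes are illustrative.
var batchPool = sync.Pool{
	New: func() any { return make([]uint32, 0, 1024) },
}

// getBatch returns an empty slice with pooled capacity.
func getBatch() []uint32 { return batchPool.Get().([]uint32)[:0] }

// putBatch returns a slice to the pool for reuse.
func putBatch(b []uint32) { batchPool.Put(b) }

func main() {
	b := getBatch()
	b = append(b, 1, 2, 3)
	fmt.Println(len(b), cap(b))
	putBatch(b)
}
```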

Labels

performance: Features or improvements that positively affect seq-db performance
