Skip to content

feat/nested aggregations#102

Open
davidkelley wants to merge 2 commits intomainfrom
feat/nested-aggregations
Open

feat/nested aggregations#102
davidkelley wants to merge 2 commits intomainfrom
feat/nested-aggregations

Conversation

@davidkelley
Copy link
Owner

Copilot AI review requested due to automatic review settings February 10, 2026 16:39
@davidkelley davidkelley linked an issue Feb 10, 2026 that may be closed by this pull request
@davidkelley
Copy link
Owner Author

Update based on the current feat/nested-aggregations branch state:

Implemented Option A (true nested aggregations) for issue #90.

What was added

  • New aggregation type: nested with request shape { "type": "nested", "path": "...", "aggs": { ... } }.
  • New response variant: nested bucket response with doc_count and nested aggregations.
  • Nested collector pipeline in core aggregation engine:
    • iterates nested objects per matched parent doc,
    • preserves parent/child binding for deeper nested paths,
    • supports nested child aggs currently via terms and further nested blocks.
  • Validation updates:
    • validates nested paths,
    • resolves child fields relative to nested scope,
    • rejects unsupported child aggregation types under nested scope.
  • Fast-field nested object access helper for object-scoped term collection.
  • Segment nested metadata fix to correctly accumulate parent/object lineage for repeated nested objects.

Performance notes (Option A)

  • Impact is query-time only for requests that include nested aggregations.
  • Normal indexing/search paths are unchanged when nested aggs are not requested.
  • Cost scales with:
    • number of matched docs,
    • nested object count per matched doc,
    • terms cardinality in nested fields.
  • Added docs guidance to keep nested arrays bounded, apply outer filters, and use sampling/flattened facet fields for hotspot cases.

Tests, docs, and benchmark updates

  • Added core integration tests for:
    • nested object-level counts,
    • outer-filter interaction,
    • invalid nested path handling,
    • unsupported nested-child aggregation handling,
    • parent-binding for nested-in-nested,
    • metadata key/value facet binding regression.
  • Added HTTP integration coverage for nested aggs.
  • Updated request schema docs and README nested aggregation examples/perf caveats.
  • Added nested aggregation benchmark scenario in searchlite-core/benches/aggs.rs (aggs_nested_terms_metadata).

Validation run on this branch

  • cargo fmt --all
  • cargo clippy --all --all-features --all-targets -- -D warnings
  • cargo build --all --all-features
  • cargo test --all --all-features
  • cargo bench -p searchlite-core

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds support for nested aggregations (nested buckets with scoped sub-aggregations) across the core engine and HTTP layer, including request/response types, validation, execution logic, schema updates, docs, and benchmarks.

Changes:

  • Add nested aggregation request/response types and JSON schema support.
  • Implement nested aggregation execution (including nested-within-nested scoping) and update nested indexing/fastfield access to support object-level binding.
  • Add core + HTTP tests, docs, and a benchmark for nested term aggregations.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
searchlite-http/src/lib.rs Adds an HTTP integration test covering nested aggregations end-to-end.
searchlite-core/tests/aggregations.rs Adds core tests for nested terms, filtering interaction, validation failures, and nested-within-nested binding.
searchlite-core/src/query/aggs/mod.rs Implements nested aggregation collector + scoped collection and fastfield reads per nested object.
searchlite-core/src/index/segment.rs Fixes nested collection bookkeeping to support accumulating nested objects across parent objects (enables deeper nesting).
searchlite-core/src/index/fastfields.rs Adds nested_str_values_at to read values for a specific nested object index.
searchlite-core/src/api/types.rs Introduces NestedAggregation and AggregationResponse::Nested.
searchlite-core/src/api/reader.rs Adds nested aggregation validation + scoped field resolution for nested contexts.
searchlite-core/benches/aggs.rs Adds a benchmark for nested terms aggregation over key/value metadata pairs.
search-request.schema.json Adds JSON schema definition for nested aggregation blocks.
README.md Documents nested aggregation semantics and provides an example.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 40541f1890

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

- stabilize nested sampling hash with fixed-width object index

- validate nested agg paths against declared nested schema tree

- share scoped path helpers between validation and aggregation execution

- guard nested fast-field object lookups to the current doc range

- remove nested finalize clone and add regression coverage
@davidkelley
Copy link
Owner Author

@codex review

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 208ba0d6c5

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +952 to +956
let scoped_field = resolve_scoped_path(scope.path, &self.field);
self
.ctx
.fast_fields
.nested_str_values_at(&scoped_field, doc_id, scope.object_idx)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Restrict scoped terms fields to current nested level

This path resolution is used with scope.object_idx to read nested_str_values_at, which assumes the requested field is indexed at the same nested level as the current scope. If a nested terms agg points at a deeper descendant field (for example comment.reply.author inside nested(path: "comment")), validation currently accepts it, but scope.object_idx (comment index) is then applied to reply offsets, so buckets can include values from the wrong parent object and silently return incorrect counts.

Useful? React with 👍 / 👎.

Comment on lines +1794 to +1796
if let (Some(scope), Some(parents)) = (parent_scope, parents.as_ref()) {
if parents.get(object_idx).and_then(|p| *p) != Some(scope.object_idx) {
continue;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Enforce direct-child paths for nested-in-nested scopes

The parent filter compares each candidate object's immediate parent id to scope.object_idx, which is only valid when the nested path is a direct child of the current scope. Because scoped nested paths are resolved broadly (any descendant under the scope), a query like nested(path:"comment") containing nested(path:"reply.vote") can pass validation, but here parents stores reply indices while scope.object_idx is a comment index, causing misattributed or dropped nested objects and incorrect doc_count/sub-aggregation outputs.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

#4: Nested aggregation facets

1 participant