Skip to content

Conversation

@deshsidd
Copy link
Contributor

@deshsidd deshsidd commented Jan 5, 2026

Draft PR to support for Histogram field and aggregations.

Description

[Describe what this change achieves]

Related Issues

Addresses: #15284

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Summary by CodeRabbit

Release Notes

  • New Features
    • Added a new Histogram field type to support indexing and querying histogram-structured data with values and counts
    • Extended aggregations and queries to work with histogram fields, including min, max, sum, average, stats, percentiles, range, histogram aggregations, and exists queries

✏️ Tip: You can customize this high-level summary in your review settings.

Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com>
Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com>
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 5, 2026

📝 Walkthrough

Walkthrough

This pull request introduces comprehensive histogram field support to OpenSearch, enabling indexing and aggregation of histogram-structured data (values with associated counts). It adds a new field mapper, field data implementations, and extends the aggregation framework to handle histogram sources across multiple aggregator types.

Changes

Cohort / File(s) Summary
Core Histogram Field Type and Data Model
server/src/main/java/org/opensearch/index/mapper/HistogramFieldMapper.java, server/src/main/java/org/opensearch/index/fielddata/HistogramIndexFieldData.java, server/src/main/java/org/opensearch/index/fielddata/HistogramLeafFieldData.java, server/src/main/java/org/opensearch/indices/fielddata/Histogram.java
New field mapper for histogram types with binary doc value encoding; validates and stores value-count pairs. Index and leaf field data classes provide field data access. Core Histogram class encapsulates values and counts arrays.
Histogram Values Access
server/src/main/java/org/opensearch/index/fielddata/HistogramValues.java, server/src/main/java/org/opensearch/indices/fielddata/HistogramValues.java
Interfaces defining histogram data access pattern with advanceExact and histogram retrieval methods across index and indices packages.
ValuesSource and Type Registration
server/src/main/java/org/opensearch/index/fielddata/HistogramValuesSource.java, server/src/main/java/org/opensearch/search/aggregations/support/CoreValuesSourceType.java, server/src/main/java/org/opensearch/search/aggregations/support/HistogramValuesSourceType.java, server/src/main/java/org/opensearch/search/aggregations/support/ValuesSourceConfig.java
HistogramValuesSource extends ValuesSource.Numeric to expose histogram values and counts. CoreValuesSourceType.HISTOGRAM and registration logic enable histogram sources in aggregation framework. ValuesSourceConfig detects and routes histogram field types.
Aggregator Framework Integration
server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/AbstractHistogramAggregator.java, server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/HistogramAggregatorFactory.java, server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/NumericHistogramAggregator.java, server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/RangeHistogramAggregator.java, server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/InternalHistogram.java
Register HISTOGRAM as valid CoreValuesSourceType; update histogram and range aggregators to use RAW format for histogram sources.
Range Aggregation Histogram Support
server/src/main/java/org/opensearch/search/aggregations/bucket/range/AbstractRangeAggregatorFactory.java, server/src/main/java/org/opensearch/search/aggregations/bucket/range/RangeAggregator.java, server/src/main/java/org/opensearch/search/aggregations/bucket/range/RangeAggregatorFactory.java
Register HISTOGRAM type; implement histogram-aware collector that iterates per-doc histogram values and counts, matching ranges and propagating to sub-buckets.
Metric Aggregators: Cardinality & Percentile Updates
server/src/main/java/org/opensearch/search/aggregations/metrics/AbstractHDRPercentilesAggregator.java, server/src/main/java/org/opensearch/search/aggregations/metrics/AbstractTDigestPercentilesAggregator.java, server/src/main/java/org/opensearch/search/aggregations/metrics/AbstractInternalTDigestPercentiles.java, server/src/main/java/org/opensearch/search/aggregations/metrics/PercentileRanksAggregatorFactory.java, server/src/main/java/org/opensearch/search/aggregations/metrics/PercentilesAggregatorFactory.java
HDR and TDigest percentile aggregators conditionally use RAW format for histogram sources and implement histogram-aware collectors. Factory registrations add HISTOGRAM to supported types.
Metric Aggregators: Min/Max/Value-Count Updates
server/src/main/java/org/opensearch/search/aggregations/metrics/MinAggregator.java, server/src/main/java/org/opensearch/search/aggregations/metrics/MinAggregatorFactory.java, server/src/main/java/org/opensearch/search/aggregations/metrics/MaxAggregator.java, server/src/main/java/org/opensearch/search/aggregations/metrics/MaxAggregatorFactory.java, server/src/main/java/org/opensearch/search/aggregations/metrics/ValueCountAggregator.java
Min/Max aggregators iterate histogram values to find first/last per-doc minima/maxima; ValueCount sums histogram counts. Factories register HISTOGRAM type.
Metric Aggregators: Sum/Avg/Stats Updates
server/src/main/java/org/opensearch/search/aggregations/metrics/SumAggregator.java, server/src/main/java/org/opensearch/search/aggregations/metrics/SumAggregatorFactory.java, server/src/main/java/org/opensearch/search/aggregations/metrics/AvgAggregator.java, server/src/main/java/org/opensearch/search/aggregations/metrics/AvgAggregatorFactory.java
Implement weighted sum/average using histogram values and counts (value \* count). Use RAW format for histogram sources. Factories register HISTOGRAM type.
Metric Aggregators: Stats/ExtendedStats Updates
server/src/main/java/org/opensearch/search/aggregations/metrics/StatsAggregator.java, server/src/main/java/org/opensearch/search/aggregations/metrics/StatsAggregatorFactory.java, server/src/main/java/org/opensearch/search/aggregations/metrics/ExtendedStatsAggregator.java, server/src/main/java/org/opensearch/search/aggregations/metrics/ExtendedStatsAggregatorFactory.java
Stats and ExtendedStats aggregators implement weighted aggregation with histogram counts for sum, count, and variance calculations. Factories register HISTOGRAM type.
Module Registration
server/src/main/java/org/opensearch/indices/IndicesModule.java
Register HistogramFieldMapper.PARSER in built-in mapper registry.
Test Suites
server/src/test/java/org/opensearch/index/mapper/HistogramFieldMapperTests.java, server/src/test/java/org/opensearch/index/query/HistogramFieldQueryTests.java
HistogramFieldMapperTests validates field parsing, encoding, validation of values/counts, serialization. HistogramFieldQueryTests exercises exists query, min/max/sum/avg/stats/percentiles/range/histogram aggregations on histogram data.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant Mapper as HistogramFieldMapper
    participant FieldData as HistogramIndexFieldData<br/>(LeafReaderContext)
    participant LeafFieldData as HistogramLeafFieldData
    participant DocValues as DocValues<br/>(BinaryDocValues)
    
    User->>Mapper: Index document with<br/>histogram (values, counts)
    activate Mapper
    Mapper->>Mapper: validateHistogramData()
    Mapper->>Mapper: encodeHistogram() →<br/>ByteBuffer
    Mapper-->>User: Store BinaryDocValuesField
    deactivate Mapper
    
    User->>FieldData: load(leafContext)
    activate FieldData
    FieldData->>LeafFieldData: new HistogramLeafFieldData()
    activate LeafFieldData
    LeafFieldData->>DocValues: getBinary(reader,<br/>fieldName)
    LeafFieldData-->>FieldData: Ready for queries
    deactivate LeafFieldData
    deactivate FieldData
    
    User->>LeafFieldData: getHistogramValues()
    activate LeafFieldData
    LeafFieldData->>DocValues: advanceExact(docId)
    activate DocValues
    DocValues-->>LeafFieldData: BytesRef
    deactivate DocValues
    LeafFieldData->>LeafFieldData: decodeHistogram(BytesRef)<br/>→ Histogram
    LeafFieldData-->>User: HistogramValues
    deactivate LeafFieldData
Loading
sequenceDiagram
    actor User
    participant Factory as AggregatorFactory<br/>(e.g., Sum)
    participant ValuesSourceConfig as ValuesSourceConfig
    participant Aggregator as Aggregator<br/>(e.g., SumAggregator)
    participant ValuesSource as HistogramValuesSource
    participant FieldData as HistogramLeafFieldData
    
    User->>Factory: Build aggregation on<br/>histogram field
    activate Factory
    Factory->>ValuesSourceConfig: resolve(histogramField)
    activate ValuesSourceConfig
    ValuesSourceConfig->>ValuesSourceConfig: Detect HistogramFieldType
    ValuesSourceConfig-->>Factory: ValuesSourceConfig<br/>(HISTOGRAM type)
    deactivate ValuesSourceConfig
    Factory->>Aggregator: getLeafCollector(<br/>ValuesSource)
    activate Aggregator
    Aggregator->>ValuesSource: instanceof check
    Note over Aggregator: Branch: Histogram path
    Aggregator->>ValuesSource: doubleValues(context)
    activate ValuesSource
    ValuesSource->>FieldData: getHistogramValues()
    activate FieldData
    FieldData-->>ValuesSource: HistogramValues
    deactivate FieldData
    ValuesSource-->>Aggregator: SortedNumericDoubleValues
    deactivate ValuesSource
    Aggregator->>Aggregator: Loop: per-doc histogram<br/>Accumulate(value × count)
    Aggregator-->>User: Weighted aggregation result
    deactivate Aggregator
    deactivate Factory
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

Search:Aggregations, Indexing, enhancement

Suggested reviewers

  • andrross
  • sachinpkale
  • shwetathareja
  • saratvemulapalli
  • mch2
  • msfroh
  • dbwiddis
  • jed326
  • owaiskazi19
  • Rishikesh1159

Poem

🐰 A histogram hops through the fields so bright,
Values and counts dance in perfect sight,
Aggregations bloom with weighted cheer,
Min, max, and sum—the magic is here!
With tests in place and mappers aligned,
Histogram support is perfectly designed! 🎉

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 1.23% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
Description check ❓ Inconclusive The PR description is incomplete. It lacks a substantive description of what the change achieves beyond the title, missing details about the implementation approach and design decisions. Provide a detailed description in the 'Description' section explaining what histogram field support entails, how aggregations work, and the implementation approach.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Draft: Add support for histogram field and aggregations' clearly and concisely describes the main change: adding histogram field support and aggregations. It directly matches the PR's primary objective.
✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 16

Fix all issues with AI Agents 🤖
In @server/src/main/java/org/opensearch/index/fielddata/HistogramValues.java:
- Around line 53-55: The Javadoc for the method in class HistogramValues still
refers to HistogramValue but the method now returns Histogram; update the
documentation to reference the correct return type (Histogram) and any related
wording (e.g., "Get the Histogram associated with the current document" and note
reuse of the returned Histogram) so the comment matches the actual return type
and behavior of the method.

In @server/src/main/java/org/opensearch/index/mapper/HistogramFieldMapper.java:
- Line 1: Add the standard Apache‑2.0 SPDX license header to this source; update
the top of HistogramFieldMapper.java (package org.opensearch.index.mapper; /
class HistogramFieldMapper) to include the same SPDX/Apache‑2.0 license header
block used across the project (copy from other mapper files) so the file
complies with licensing requirements.
- Around line 119-126: The values array parsing accepts NaN and Infinity; update
the block that handles VALUES_FIELD in HistogramFieldMapper (the loop using
parser.nextToken(), token, and values.add(parser.doubleValue())) to explicitly
validate each parsed double (use Double.isFinite(value) or check
!Double.isNaN(value) && !Double.isInfinite(value)) and throw the same
IllegalArgumentException("values must be numbers") when a non-finite value is
encountered so NaN/Infinity are rejected.

In @server/src/main/java/org/opensearch/indices/fielddata/HistogramValues.java:
- Around line 13-16: Delete the unused interface
org.opensearch.indices.fielddata.HistogramValues (remove the file and its
declaration), and update any imports/usages to reference the established
abstract class org.opensearch.index.fielddata.HistogramValues instead; ensure no
code still imports org.opensearch.indices.fielddata.HistogramValues and run a
compile to verify all aggregator classes that rely on HistogramValues are using
the abstract class symbol.

In
@server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/HistogramAggregatorFactory.java:
- Line 71: NumericHistogramAggregator currently reads histogram values via
doubleValues() but ignores per-value counts, causing each histogram entry to be
treated as count=1; update the aggregation logic in NumericHistogramAggregator
(the code that handles values iteration and key computation using
Math.floor((value - offset) / interval) and DocValueFormat.RAW) to detect when
the values source is a HistogramValuesSource, call histogram.getCounts()
alongside histogram.getValues(), and multiply/add bucket doc counts by the
corresponding count for each value (iterate counts in sync with values),
mirroring the pattern used in RangeAggregator where instanceof
HistogramValuesSource triggers extraction of both getValues() and getCounts()
and iterates counts when updating buckets.

In
@server/src/main/java/org/opensearch/search/aggregations/bucket/range/RangeAggregator.java:
- Around line 379-389: The code in RangeAggregator increments the bucket once
via collectBucket and then count times via collectExistingBucket, producing
count+1 increments; change the logic so total increments equal count by either
calling collectExistingBucket (count - 1) times when count > 0 after the initial
collectBucket, or (if intent is to only use the histogram count) remove the
initial collectBucket and call collectExistingBucket count times; update the
loop that currently does "for (long k = 0; k < count; k++) {
collectExistingBucket(...); }" accordingly and ensure you handle count == 0
safely; verify the behavior for matchedRange/ranges and
subBucketOrdinal/collectBucket/collectExistingBucket remains consistent.

In
@server/src/main/java/org/opensearch/search/aggregations/bucket/range/RangeAggregatorFactory.java:
- Around line 35-46: Remove the unused imports from RangeAggregatorFactory.java:
delete HistogramValuesSource, DocValueFormat, ValuesSource, Aggregator,
AggregatorFactories, AggregatorFactory, CardinalityUpperBound,
InternalRange.Factory, and SearchContext from the import list; ensure only the
actually referenced types (e.g., QueryShardContext, RangeAggregator.Range,
ValuesSourceConfig, and any others used in this file) remain and then recompile
to verify no missing references.

In
@server/src/main/java/org/opensearch/search/aggregations/metrics/AbstractInternalTDigestPercentiles.java:
- Line 38: The file AbstractInternalTDigestPercentiles contains an unused import
of org.opensearch.index.fielddata.HistogramValuesSource; remove that import line
so the class no longer references HistogramValuesSource (ensure no other code in
AbstractInternalTDigestPercentiles uses HistogramValuesSource before removing).
This will clean up the unused dependency and eliminate the unused-import
warning.

In
@server/src/main/java/org/opensearch/search/aggregations/metrics/AvgAggregator.java:
- Around line 204-269: The histogram-handling block is unreachable because the
method returns before branching on valuesSource; move the histogram vs
non-histogram branching so you check "if (valuesSource instanceof
HistogramValuesSource)" before any return, and then return the appropriate
LeafBucketCollectorBase for each branch (the histogram branch building the
collector that uses HistogramValues, and the existing non-histogram branch using
values.nextValue()). Update the method in AvgAggregator where the current early
return occurs so each branch returns its collector directly and no code after
the return is intended to execute.

In
@server/src/main/java/org/opensearch/search/aggregations/metrics/MinAggregatorFactory.java:
- Line 60: MinAggregatorFactory's histogram collection uses only
histogram.getValues()[0] and ignores counts, so update the histogram handling in
the collector to iterate over histogram.getValues() and histogram.getCounts(),
skipping entries where counts[i] == 0 and computing the minimum across only
entries with non-zero counts; use histogram.getValues() and
histogram.getCounts(), compare each value to the current min and set when lower
(ensure to handle empty/zero-count-only histograms appropriately, e.g., leave as
NaN/unset as current code expects).

In
@server/src/main/java/org/opensearch/search/aggregations/metrics/StatsAggregator.java:
- Around line 150-152: The histogram handling in StatsAggregator currently does
O(count) work by repeating kahanSummation.add(value) inside the loop; change
this to a single weighted addition (e.g., kahanSummation.add(value * count)) to
avoid the per-count loop and drastically improve performance for large counts,
or alternatively implement and call a weighted Kahan add method if you want to
preserve per-addition Kahan compensation semantics; update the code that uses
the long count and value inside the for-loop to perform one multiply-and-add via
kahanSummation instead of iterating.

In
@server/src/main/java/org/opensearch/search/aggregations/support/HistogramValuesSourceType.java:
- Around line 33-36: The getField method currently returns null; implement it to
resolve and return a HistogramValuesSource using the provided FieldContext and
AggregationScript.LeafFactory so histogram aggregations can obtain a proper
ValuesSource. Inside getField(FieldContext fieldContext,
AggregationScript.LeafFactory script) construct and return a
HistogramValuesSource (or use the existing factory/utility that wraps a
FieldContext and script into a ValuesSource) ensuring you handle missing/mapped
fields similarly to other ValuesSourceType implementations and propagate any
exceptions as before.

In
@server/src/test/java/org/opensearch/index/mapper/HistogramFieldMapperTests.java:
- Around line 319-345: The parser currently accepts Double.POSITIVE_INFINITY and
Double.NaN because XContentParser treats them as numeric tokens and
parseCreateField() calls parser.doubleValue() without rejecting special values;
update validation so validateHistogramData() (or immediately after values are
parsed in parseCreateField()) checks each value with Double.isFinite(value) (or
equivalently !Double.isInfinite && !Double.isNaN) and throws a
MapperParsingException with a message containing "values must be numbers" when
any value is not finite; ensure the thrown exception becomes the cause so
existing tests asserting containsString("values must be numbers") continue to
pass.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
server/src/main/java/org/opensearch/search/aggregations/metrics/PercentilesAggregatorFactory.java (1)

62-70: Update test class declarations to include histogram support, or exclude HISTOGRAM from factory registration.

The AbstractHDRPercentilesAggregator and AbstractTDigestPercentilesAggregator both contain explicit histogram handling logic (extracting histogram values and counts, recording them with recordValueWithCount), but the corresponding test classes—TDigestPercentilesAggregatorTests and HDRPercentilesAggregatorTests—do not declare CoreValuesSourceType.HISTOGRAM in their getSupportedValuesSourceTypes() method. This creates an inconsistency: the factory now registers histogram support, the implementations handle histograms, but the tests don't verify it. Either add HISTOGRAM to the test declarations to enable proper test coverage, or remove it from the factory registration if histogram support is intentionally incomplete.

server/src/main/java/org/opensearch/search/aggregations/metrics/MinAggregator.java (1)

214-265: Dead code: getRegularLeafCollector method is never called.

This helper method is defined but never invoked. The getLeafCollector method at lines 166-211 handles both histogram and non-histogram paths directly and returns before reaching this code.

Additionally, line 219 calls growMins(bucket) but then lines 220-224 immediately duplicate the same grow logic, which would be redundant even if this code were reachable.

🔎 Proposed fix

Remove the dead code method entirely:

-    private LeafBucketCollector getRegularLeafCollector(BigArrays bigArrays, SortedNumericDoubleValues allValues, LeafBucketCollector sub) {
-        final NumericDoubleValues values = MultiValueMode.MIN.select(allValues);
-        return new LeafBucketCollectorBase(sub, allValues) {
-            @Override
-            public void collect(int doc, long bucket) throws IOException {
-                growMins(bucket);
-                if (bucket >= mins.size()) {
-                    long from = mins.size();
-                    mins = bigArrays.grow(mins, bucket + 1);
-                    mins.fill(from, mins.size(), Double.POSITIVE_INFINITY);
-                }
-
-                if (values.advanceExact(doc)) {
-                    final double value = values.doubleValue();
-                    double min = mins.get(bucket);
-                    mins.set(bucket, Math.min(min, value));
-                }
-            }
-
-            @Override
-            public void collect(DocIdStream stream, long bucket) throws IOException {
-                // ... rest of method ...
-            }
-            // ... rest of method ...
-        };
-    }
server/src/main/java/org/opensearch/search/aggregations/metrics/StatsAggregator.java (1)

165-188: Incomplete histogram support: collect(DocIdStream, bucket) ignores histogram counts.

This method still uses counts.increment(bucket, valuesCount) which treats each histogram bucket as a single value, ignoring the actual counts. When processing histogram fields via this code path, statistics will be incorrect.

Either add histogram-aware handling here or document that histogram fields don't support this collection mode.

🧹 Nitpick comments (19)
server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/InternalHistogram.java (1)

39-39: Consider removing unused import.

The HistogramValuesSource import doesn't appear to be used in this file. If it's not needed, consider removing it to keep the imports clean.

server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/AbstractHistogramAggregator.java (1)

37-37: Consider removing unused import.

The HistogramValuesSource import doesn't appear to be used in this file. If it's not needed, consider removing it to keep the imports clean.

server/src/main/java/org/opensearch/search/aggregations/bucket/range/AbstractRangeAggregatorFactory.java (1)

111-116: LGTM! Format selection logic is correct.

The conditional format selection properly handles histogram sources by using DocValueFormat.RAW, which is appropriate for preserving raw histogram values. The cast to Numeric on line 123 is type-safe since HistogramValuesSource extends ValuesSource.Numeric.

The same conditional format selection pattern appears in multiple files:

  • NumericHistogramAggregator.java (lines 91-93)
  • RangeHistogramAggregator.java (lines 91-93)
  • AbstractRangeAggregatorFactory.java (lines 111-116)

Consider extracting this logic into a helper method to reduce duplication:

🔎 Suggested refactor to reduce duplication

Create a utility method in a common location:

public static DocValueFormat selectFormatForHistogram(ValuesSourceConfig config) {
    return config.getValuesSource() instanceof HistogramValuesSource
        ? DocValueFormat.RAW
        : config.format();
}

Then use it consistently:

selectFormatForHistogram(valuesSourceConfig)

Also applies to: 124-124

server/src/main/java/org/opensearch/indices/fielddata/Histogram.java (1)

11-30: Add constructor validation for array length consistency.

The constructor accepts values and counts arrays but does not validate that they have the same length. This could lead to subtle bugs if mismatched arrays are passed, since getSize() only returns values.length.

🔎 Proposed validation
 public Histogram(double[] values, long[] counts) {
+    if (values == null || counts == null) {
+        throw new IllegalArgumentException("values and counts must not be null");
+    }
+    if (values.length != counts.length) {
+        throw new IllegalArgumentException("values and counts must have the same length");
+    }
     this.values = values;
     this.counts = counts;
 }
server/src/main/java/org/opensearch/search/aggregations/support/HistogramValuesSourceType.java (1)

43-46: Return a meaningful type name.

Returning an empty string for typeName() will make debugging and error messages unclear.

🔎 Proposed fix
 @Override
 public String typeName() {
-    return "";
+    return "histogram";
 }
server/src/main/java/org/opensearch/index/fielddata/HistogramValuesSource.java (1)

121-128: Dead code: decodeMin method is never called.

This private method is defined but never invoked anywhere in this class or accessible externally.

🔎 Proposed fix

Remove the unused method:

-    private double decodeMin(BytesRef bytesRef) {
-        ByteBuffer buffer = ByteBuffer.wrap(bytesRef.bytes, bytesRef.offset, bytesRef.length);
-        int size = buffer.getInt();
-        if (size > 0) {
-            return buffer.getDouble();
-        }
-        return Double.NaN;
-    }
server/src/main/java/org/opensearch/index/fielddata/HistogramLeafFieldData.java (3)

45-57: Consider adding defensive checks for malformed binary data.

The decodeHistogram method assumes the binary data is well-formed. If the stored size value is corrupted or larger than the actual data, buffer.getDouble() or buffer.getLong() will throw BufferUnderflowException. Consider adding validation.

🔎 Proposed defensive check
 private Histogram decodeHistogram(BytesRef bytesRef) {
     ByteBuffer buffer = ByteBuffer.wrap(bytesRef.bytes, bytesRef.offset, bytesRef.length);
     int size = buffer.getInt();
+    int expectedBytes = 4 + (size * (Double.BYTES + Long.BYTES));
+    if (bytesRef.length < expectedBytes || size < 0) {
+        throw new IllegalStateException("Corrupted histogram data: invalid size or insufficient bytes");
+    }
     double[] values = new double[size];
     long[] counts = new long[size];

62-70: Returning null from getScriptValues() may cause NPEs in script contexts.

If a user attempts to access a histogram field from a Painless script, returning null here will likely cause a NullPointerException. Consider throwing UnsupportedOperationException with a clear message, or implementing a basic ScriptDocValues wrapper.

🔎 Proposed fix
 @Override
 public ScriptDocValues<?> getScriptValues() {
-    return null;
+    throw new UnsupportedOperationException("Histogram fields do not support script access");
 }

 @Override
 public SortedBinaryDocValues getBytesValues() {
-    return null;
+    throw new UnsupportedOperationException("Histogram fields do not support bytes values");
 }

72-75: ramBytesUsed() returning 0 is inaccurate.

This method is used for memory accounting in circuit breakers. Returning 0 means histogram field data won't be properly tracked, which could lead to memory issues under heavy load. Consider estimating memory based on the reader's field info or document count.

server/src/main/java/org/opensearch/index/fielddata/HistogramIndexFieldData.java (1)

52-55: newBucketedSort() returning null is inconsistent with sortField().

While sortField() throws UnsupportedOperationException, newBucketedSort() silently returns null. This inconsistency could lead to confusing NPEs for callers. Consider throwing the same exception for consistency.

🔎 Proposed fix
 @Override
 public BucketedSort newBucketedSort(BigArrays bigArrays, Object missingValue, MultiValueMode sortMode, XFieldComparatorSource.Nested nested, SortOrder sortOrder, DocValueFormat format, int bucketSize, BucketedSort.ExtraData extra) {
-    return null;
+    throw new UnsupportedOperationException("Histogram fields do not support sorting");
 }
server/src/main/java/org/opensearch/index/mapper/HistogramFieldMapper.java (3)

14-14: Unused import: BytesBinaryIndexFieldData.

This import is not used in the file and should be removed.

-import org.opensearch.index.fielddata.plain.BytesBinaryIndexFieldData;

33-39: FIELD_TYPE is defined but never used.

The FIELD_TYPE static field is configured but not referenced anywhere in the class. The parseCreateField method creates a BinaryDocValuesField directly without using this field type. Either remove FIELD_TYPE or use it in field creation.


42-44: The indexed parameter is misleading for histogram fields.

The indexed parameter is exposed in the builder, but HistogramFieldType.termQuery() throws UnsupportedOperationException. This could confuse users who set index: true expecting search capability. Consider either removing the parameter or documenting this limitation.

server/src/test/java/org/opensearch/index/mapper/HistogramFieldMapperTests.java (2)

30-30: Unused import: stringEncode from Geohash.

This static import is not used anywhere in the test file.

-import static org.opensearch.geometry.utils.Geohash.stringEncode;

57-63: Consider using assertThat with instanceOf matcher for type check.

The type check can be more idiomatically written using the existing matcher.

🔎 Suggested improvement
-assertTrue("Type parser should be an instance of HistogramFieldMapper.TypeParser",
-    parser instanceof HistogramFieldMapper.TypeParser);
+assertThat("Type parser should be an instance of HistogramFieldMapper.TypeParser",
+    parser, instanceOf(HistogramFieldMapper.TypeParser.class));
server/src/test/java/org/opensearch/index/query/HistogramFieldQueryTests.java (4)

29-29: Unused import: List.

The List import is not used in this file.

-import java.util.List;

53-55: Consider using @Before or unique index names for test isolation.

All tests use the same defaultIndexName = "test" and create the index at the start of each test. While OpenSearchSingleNodeTestCase typically handles cleanup, using unique index names (e.g., with test method name) or explicit cleanup would make tests more robust against parallel execution or cleanup failures.


366-399: Range aggregation test assertions could be more specific.

The test only verifies that bucket doc counts are >= 0, which would pass even if the aggregation returned empty buckets. Consider asserting that at least one bucket has a positive count, or verify specific bucket counts based on the histogram data.

🔎 Suggested stronger assertions
 Range range = response.getAggregations().get("range_agg");
 assertThat(range, notNullValue());
+assertThat("Should have 3 range buckets", range.getBuckets().size(), equalTo(3));
+long totalDocCount = 0;
 for (Range.Bucket bucket : range.getBuckets()) {
     assertThat(bucket.getDocCount(), greaterThanOrEqualTo(0L));
+    totalDocCount += bucket.getDocCount();
 }
+assertThat("At least one bucket should have docs", totalDocCount, greaterThan(0L));

401-432: Histogram aggregation test could verify bucket keys and counts more precisely.

Similar to the range test, this only checks that doc counts are non-negative. Given the known input data (values 0.1, 0.2, 0.3 with interval 0.1), you could verify expected bucket keys and counts.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 66d2482 and 2079fb1.

📒 Files selected for processing (39)
  • server/src/main/java/org/opensearch/index/fielddata/HistogramIndexFieldData.java
  • server/src/main/java/org/opensearch/index/fielddata/HistogramLeafFieldData.java
  • server/src/main/java/org/opensearch/index/fielddata/HistogramValues.java
  • server/src/main/java/org/opensearch/index/fielddata/HistogramValuesSource.java
  • server/src/main/java/org/opensearch/index/mapper/HistogramFieldMapper.java
  • server/src/main/java/org/opensearch/indices/IndicesModule.java
  • server/src/main/java/org/opensearch/indices/fielddata/Histogram.java
  • server/src/main/java/org/opensearch/indices/fielddata/HistogramValues.java
  • server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/AbstractHistogramAggregator.java
  • server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/HistogramAggregatorFactory.java
  • server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/InternalHistogram.java
  • server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/NumericHistogramAggregator.java
  • server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/RangeHistogramAggregator.java
  • server/src/main/java/org/opensearch/search/aggregations/bucket/range/AbstractRangeAggregatorFactory.java
  • server/src/main/java/org/opensearch/search/aggregations/bucket/range/RangeAggregator.java
  • server/src/main/java/org/opensearch/search/aggregations/bucket/range/RangeAggregatorFactory.java
  • server/src/main/java/org/opensearch/search/aggregations/metrics/AbstractHDRPercentilesAggregator.java
  • server/src/main/java/org/opensearch/search/aggregations/metrics/AbstractInternalTDigestPercentiles.java
  • server/src/main/java/org/opensearch/search/aggregations/metrics/AbstractTDigestPercentilesAggregator.java
  • server/src/main/java/org/opensearch/search/aggregations/metrics/AvgAggregator.java
  • server/src/main/java/org/opensearch/search/aggregations/metrics/AvgAggregatorFactory.java
  • server/src/main/java/org/opensearch/search/aggregations/metrics/ExtendedStatsAggregator.java
  • server/src/main/java/org/opensearch/search/aggregations/metrics/ExtendedStatsAggregatorFactory.java
  • server/src/main/java/org/opensearch/search/aggregations/metrics/MaxAggregator.java
  • server/src/main/java/org/opensearch/search/aggregations/metrics/MaxAggregatorFactory.java
  • server/src/main/java/org/opensearch/search/aggregations/metrics/MinAggregator.java
  • server/src/main/java/org/opensearch/search/aggregations/metrics/MinAggregatorFactory.java
  • server/src/main/java/org/opensearch/search/aggregations/metrics/PercentileRanksAggregatorFactory.java
  • server/src/main/java/org/opensearch/search/aggregations/metrics/PercentilesAggregatorFactory.java
  • server/src/main/java/org/opensearch/search/aggregations/metrics/StatsAggregator.java
  • server/src/main/java/org/opensearch/search/aggregations/metrics/StatsAggregatorFactory.java
  • server/src/main/java/org/opensearch/search/aggregations/metrics/SumAggregator.java
  • server/src/main/java/org/opensearch/search/aggregations/metrics/SumAggregatorFactory.java
  • server/src/main/java/org/opensearch/search/aggregations/metrics/ValueCountAggregator.java
  • server/src/main/java/org/opensearch/search/aggregations/support/CoreValuesSourceType.java
  • server/src/main/java/org/opensearch/search/aggregations/support/HistogramValuesSourceType.java
  • server/src/main/java/org/opensearch/search/aggregations/support/ValuesSourceConfig.java
  • server/src/test/java/org/opensearch/index/mapper/HistogramFieldMapperTests.java
  • server/src/test/java/org/opensearch/index/query/HistogramFieldQueryTests.java
🧰 Additional context used
🧬 Code graph analysis (23)
server/src/main/java/org/opensearch/index/fielddata/HistogramValues.java (1)
server/src/main/java/org/opensearch/indices/fielddata/Histogram.java (1)
  • Histogram (11-31)
server/src/main/java/org/opensearch/search/aggregations/support/ValuesSourceConfig.java (1)
server/src/main/java/org/opensearch/index/mapper/HistogramFieldMapper.java (1)
  • HistogramFieldMapper (28-246)
server/src/main/java/org/opensearch/index/fielddata/HistogramLeafFieldData.java (1)
server/src/main/java/org/opensearch/indices/fielddata/Histogram.java (1)
  • Histogram (11-31)
server/src/main/java/org/opensearch/search/aggregations/support/CoreValuesSourceType.java (2)
server/src/main/java/org/opensearch/index/fielddata/HistogramIndexFieldData.java (1)
  • HistogramIndexFieldData (20-61)
server/src/main/java/org/opensearch/index/fielddata/HistogramValuesSource.java (1)
  • HistogramValuesSource (20-134)
server/src/main/java/org/opensearch/search/aggregations/bucket/range/RangeAggregator.java (6)
server/src/main/java/org/opensearch/index/fielddata/HistogramIndexFieldData.java (1)
  • HistogramIndexFieldData (20-61)
server/src/main/java/org/opensearch/index/fielddata/HistogramLeafFieldData.java (1)
  • HistogramLeafFieldData (20-76)
server/src/main/java/org/opensearch/index/fielddata/HistogramValues.java (1)
  • HistogramValues (44-58)
server/src/main/java/org/opensearch/index/fielddata/HistogramValuesSource.java (1)
  • HistogramValuesSource (20-134)
server/src/main/java/org/opensearch/indices/fielddata/Histogram.java (1)
  • Histogram (11-31)
server/src/main/java/org/opensearch/search/aggregations/LeafBucketCollector.java (1)
  • LeafBucketCollector (51-179)
server/src/main/java/org/opensearch/indices/IndicesModule.java (1)
server/src/main/java/org/opensearch/index/mapper/HistogramFieldMapper.java (1)
  • HistogramFieldMapper (28-246)
server/src/main/java/org/opensearch/search/aggregations/metrics/AvgAggregator.java (4)
server/src/main/java/org/opensearch/index/fielddata/HistogramIndexFieldData.java (1)
  • HistogramIndexFieldData (20-61)
server/src/main/java/org/opensearch/index/fielddata/HistogramLeafFieldData.java (1)
  • HistogramLeafFieldData (20-76)
server/src/main/java/org/opensearch/index/fielddata/HistogramValues.java (1)
  • HistogramValues (44-58)
server/src/main/java/org/opensearch/index/fielddata/HistogramValuesSource.java (1)
  • HistogramValuesSource (20-134)
server/src/main/java/org/opensearch/search/aggregations/metrics/MaxAggregator.java (5)
server/src/main/java/org/opensearch/index/fielddata/HistogramIndexFieldData.java (1)
  • HistogramIndexFieldData (20-61)
server/src/main/java/org/opensearch/index/fielddata/HistogramLeafFieldData.java (1)
  • HistogramLeafFieldData (20-76)
server/src/main/java/org/opensearch/index/fielddata/HistogramValues.java (1)
  • HistogramValues (44-58)
server/src/main/java/org/opensearch/index/fielddata/HistogramValuesSource.java (1)
  • HistogramValuesSource (20-134)
server/src/main/java/org/opensearch/indices/fielddata/Histogram.java (1)
  • Histogram (11-31)
server/src/main/java/org/opensearch/index/fielddata/HistogramValuesSource.java (1)
server/src/main/java/org/opensearch/indices/fielddata/Histogram.java (1)
  • Histogram (11-31)
server/src/main/java/org/opensearch/search/aggregations/metrics/AbstractInternalTDigestPercentiles.java (1)
server/src/main/java/org/opensearch/index/fielddata/HistogramValuesSource.java (1)
  • HistogramValuesSource (20-134)
server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/RangeHistogramAggregator.java (2)
server/src/main/java/org/opensearch/index/fielddata/HistogramValuesSource.java (1)
  • HistogramValuesSource (20-134)
server/src/main/java/org/opensearch/index/mapper/RangeFieldMapper.java (1)
  • RangeFieldMapper (87-687)
server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/NumericHistogramAggregator.java (1)
server/src/main/java/org/opensearch/index/fielddata/HistogramValuesSource.java (1)
  • HistogramValuesSource (20-134)
server/src/main/java/org/opensearch/search/aggregations/metrics/MinAggregator.java (5)
server/src/main/java/org/opensearch/index/fielddata/HistogramIndexFieldData.java (1)
  • HistogramIndexFieldData (20-61)
server/src/main/java/org/opensearch/index/fielddata/HistogramLeafFieldData.java (1)
  • HistogramLeafFieldData (20-76)
server/src/main/java/org/opensearch/index/fielddata/HistogramValues.java (1)
  • HistogramValues (44-58)
server/src/main/java/org/opensearch/index/fielddata/HistogramValuesSource.java (1)
  • HistogramValuesSource (20-134)
server/src/main/java/org/opensearch/indices/fielddata/Histogram.java (1)
  • Histogram (11-31)
server/src/test/java/org/opensearch/index/mapper/HistogramFieldMapperTests.java (1)
libs/x-content/src/main/java/org/opensearch/common/xcontent/json/JsonXContent.java (1)
  • JsonXContent (64-146)
server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/AbstractHistogramAggregator.java (1)
server/src/main/java/org/opensearch/index/fielddata/HistogramValuesSource.java (1)
  • HistogramValuesSource (20-134)
server/src/main/java/org/opensearch/search/aggregations/metrics/AbstractHDRPercentilesAggregator.java (5)
server/src/main/java/org/opensearch/index/fielddata/HistogramIndexFieldData.java (1)
  • HistogramIndexFieldData (20-61)
server/src/main/java/org/opensearch/index/fielddata/HistogramLeafFieldData.java (1)
  • HistogramLeafFieldData (20-76)
server/src/main/java/org/opensearch/index/fielddata/HistogramValues.java (1)
  • HistogramValues (44-58)
server/src/main/java/org/opensearch/index/fielddata/HistogramValuesSource.java (1)
  • HistogramValuesSource (20-134)
server/src/main/java/org/opensearch/indices/fielddata/Histogram.java (1)
  • Histogram (11-31)
server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/InternalHistogram.java (1)
server/src/main/java/org/opensearch/index/fielddata/HistogramValuesSource.java (1)
  • HistogramValuesSource (20-134)
server/src/main/java/org/opensearch/indices/fielddata/HistogramValues.java (1)
server/src/main/java/org/opensearch/index/fielddata/HistogramValues.java (1)
  • HistogramValues (44-58)
server/src/main/java/org/opensearch/search/aggregations/metrics/StatsAggregator.java (1)
server/src/main/java/org/opensearch/index/fielddata/HistogramValuesSource.java (1)
  • HistogramValuesSource (20-134)
server/src/main/java/org/opensearch/search/aggregations/bucket/range/AbstractRangeAggregatorFactory.java (1)
server/src/main/java/org/opensearch/index/fielddata/HistogramValuesSource.java (1)
  • HistogramValuesSource (20-134)
server/src/main/java/org/opensearch/search/aggregations/metrics/AbstractTDigestPercentilesAggregator.java (1)
server/src/main/java/org/opensearch/index/fielddata/HistogramValuesSource.java (1)
  • HistogramValuesSource (20-134)
server/src/test/java/org/opensearch/index/query/HistogramFieldQueryTests.java (3)
server/src/main/java/org/opensearch/indices/fielddata/Histogram.java (1)
  • Histogram (11-31)
server/src/main/java/org/opensearch/search/aggregations/bucket/range/RangeAggregator.java (1)
  • Range (110-265)
server/src/main/java/org/opensearch/index/query/QueryBuilders.java (1)
  • QueryBuilders (60-804)
server/src/main/java/org/opensearch/search/aggregations/metrics/ExtendedStatsAggregator.java (1)
server/src/main/java/org/opensearch/index/fielddata/HistogramValuesSource.java (1)
  • HistogramValuesSource (20-134)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (21)
  • GitHub Check: gradle-check
  • GitHub Check: precommit (21, ubuntu-24.04-arm)
  • GitHub Check: precommit (21, windows-2025, true)
  • GitHub Check: precommit (25, macos-15-intel)
  • GitHub Check: precommit (25, ubuntu-24.04-arm)
  • GitHub Check: precommit (25, macos-15)
  • GitHub Check: precommit (21, macos-15)
  • GitHub Check: precommit (25, windows-latest)
  • GitHub Check: precommit (25, ubuntu-latest)
  • GitHub Check: precommit (21, ubuntu-latest)
  • GitHub Check: precommit (21, windows-latest)
  • GitHub Check: precommit (21, macos-15-intel)
  • GitHub Check: detect-breaking-change
  • GitHub Check: assemble (21, windows-latest)
  • GitHub Check: assemble (25, windows-latest)
  • GitHub Check: assemble (25, ubuntu-24.04-arm)
  • GitHub Check: assemble (25, ubuntu-latest)
  • GitHub Check: assemble (21, ubuntu-latest)
  • GitHub Check: assemble (21, ubuntu-24.04-arm)
  • GitHub Check: Analyze (java)
  • GitHub Check: Mend Security Check
🔇 Additional comments (27)
server/src/main/java/org/opensearch/search/aggregations/metrics/AbstractTDigestPercentilesAggregator.java (1)

85-87: Formatter handling for histogram sources is consistent across both percentiles implementations.

The conditional logic correctly uses DocValueFormat.RAW for histogram value sources since histogram data is pre-aggregated. This pattern is consistently applied in both AbstractTDigestPercentilesAggregator (lines 85-87) and AbstractHDRPercentilesAggregator (lines 91-93), confirming the implementation is sound.

server/src/main/java/org/opensearch/index/fielddata/HistogramValues.java (1)

56-56: Verify all implementations and callers are updated for the new return type.

The return type changed from HistogramValue to Histogram, which is a breaking API change. Ensure that all implementations of this abstract method and all call sites have been updated to use the new Histogram type.

server/src/main/java/org/opensearch/indices/IndicesModule.java (1)

59-59: LGTM! Histogram field mapper registration follows the standard pattern.

The import and registration of HistogramFieldMapper correctly follows the established pattern for built-in field mappers. This enables the "histogram" field type to be used in index mappings alongside other standard field types.

Also applies to: 184-184

server/src/main/java/org/opensearch/search/aggregations/metrics/ExtendedStatsAggregatorFactory.java (1)

75-75: Histogram weighted statistics computation is correctly implemented.

The code properly handles histogram value sources for extended stats aggregation:

  • Histogram counts are correctly retrieved via histogramValuesSource.getCounts(ctx) and applied as weights when computing statistics (lines 160-164: value * count for weighted sum and value * value * count for weighted sum of squares)
  • Values and counts are iterated in parallel with proper advancement checks (lines 143-144)
  • The sigma parameter is correctly applied to variance-based calculations for standard deviation bounds (lines 262-265)
  • RAW format is explicitly set for histogram sources (lines 88-90)
server/src/main/java/org/opensearch/search/aggregations/metrics/SumAggregatorFactory.java (1)

76-76: Verify histogram support implementation in SumAggregator.

The registry change correctly adds HISTOGRAM as a supported value source type. Ensure that SumAggregator properly handles histogram-based value sources, including correct accumulation logic for histogram values and counts.

#!/bin/bash
# Verify SumAggregator handles HistogramValuesSource

ast-grep --pattern $'class SumAggregator {
  $$$
}'

rg -n "HistogramValuesSource" server/src/main/java/org/opensearch/search/aggregations/metrics/SumAggregator.java -A5 -B5
server/src/main/java/org/opensearch/search/aggregations/metrics/MaxAggregatorFactory.java (1)

60-60: Histogram support is properly implemented in MaxAggregator.

The registry change to add HISTOGRAM as a supported value source type is backed by comprehensive histogram-specific logic in MaxAggregator. The implementation correctly extracts histogram field data, retrieves histogram values, and finds the maximum by selecting the last value from the sorted histogram values array, properly updating the max aggregation for each bucket.

server/src/main/java/org/opensearch/search/aggregations/metrics/PercentileRanksAggregatorFactory.java (1)

64-64: Histogram support in percentile ranks is properly implemented.

The registry correctly includes HISTOGRAM as a supported value source type. Both TDigestPercentileRanksAggregator and HDRPercentileRanksAggregator inherit from their respective abstract base classes (AbstractTDigestPercentilesAggregator and AbstractHDRPercentilesAggregator), which explicitly handle histogram-based value sources with appropriate instanceof checks and dedicated histogram value extraction logic.

server/src/main/java/org/opensearch/search/aggregations/support/ValuesSourceConfig.java (1)

194-206: Early return is necessary for histogram script handling and is not redundant.

The HistogramIndexFieldData.getValuesSourceType() correctly returns CoreValuesSourceType.HISTOGRAM, so that part of the early return aligns with what the normal path would produce. However, the critical difference is the explicit scriptValueType = ValueType.DOUBLE setting (line 200). The normal field resolution path would instead use userValueTypeHint for scriptValueType (initialized at line 153), which is incorrect for histogram scripts. The existing inline comment already clarifies this requirement: "use DOUBLE for histogram". This early return is necessary to ensure histogram script aggregations have the correct value type.

Likely an incorrect or invalid review comment.

server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/NumericHistogramAggregator.java (1)

38-40: LGTM! Histogram-aware format selection implemented correctly.

The conditional format selection appropriately uses DocValueFormat.RAW for histogram sources to preserve raw double values, aligning with the histogram support pattern across the codebase.

Also applies to: 91-93

server/src/main/java/org/opensearch/search/aggregations/metrics/AvgAggregatorFactory.java (1)

76-76: LGTM! Correctly extends AVG aggregation support to histogram sources.

The addition of CoreValuesSourceType.HISTOGRAM enables histogram data to participate in average aggregations, consistent with the broader histogram support implementation.

server/src/main/java/org/opensearch/search/aggregations/metrics/StatsAggregatorFactory.java (1)

71-71: LGTM! Correctly extends Stats aggregation support to histogram sources.

The addition enables histogram data to participate in statistics aggregations, aligning with the histogram support pattern across multiple aggregation types.

server/src/main/java/org/opensearch/search/aggregations/bucket/range/AbstractRangeAggregatorFactory.java (1)

35-37: LGTM! Import additions and registry expansion are correct.

The additions enable histogram sources to participate in range aggregations with proper type support.

Also applies to: 73-73

server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/RangeHistogramAggregator.java (1)

91-93: The instanceof check for HistogramValuesSource is dead code due to registry-based type safety.

The concern about type compatibility is mitigated by the ValuesSourceRegistry pattern. RangeHistogramAggregator is registered only for CoreValuesSourceType.RANGE (HistogramAggregatorFactory.java:67), which ensures only ValuesSource.Range instances are passed to this aggregator. HistogramValuesSource is produced exclusively by a separate CoreValuesSourceType.HISTOGRAM variant and will never reach this aggregator through normal execution.

However, the isinstance check on line 91 is misleading dead code that should either be removed or replaced with an assertion. The check suggests defensive programming against a scenario the registry prevents, which reduces code clarity.

Likely an incorrect or invalid review comment.

server/src/main/java/org/opensearch/search/aggregations/support/CoreValuesSourceType.java (1)

163-189: LGTM - HISTOGRAM values source type implementation follows established patterns.

The implementation correctly:

  • Throws appropriate exceptions for unsupported operations (scripts, missing values)
  • Validates the field data type before casting
  • Returns HistogramValuesSource for field-based access

One consideration: you may want to override getFormatter() to return DocValueFormat.RAW explicitly for consistency with how histogram values are formatted elsewhere in this PR (e.g., ExtendedStatsAggregator and SumAggregator force RAW format for histogram sources).

server/src/main/java/org/opensearch/search/aggregations/bucket/range/RangeAggregator.java (1)

394-417: LGTM - Non-histogram path correctly preserved.

The original collection logic for numeric values is correctly maintained in the else branch.

server/src/main/java/org/opensearch/search/aggregations/metrics/ValueCountAggregator.java (1)

119-142: LGTM - Histogram value count implementation is correct.

The implementation properly:

  • Loads histogram field data and values
  • Sums all histogram bin counts to get the total count
  • Increments the bucket by the total count

This correctly interprets value_count for histogram fields as the total number of observations across all bins.

server/src/main/java/org/opensearch/search/aggregations/metrics/AbstractHDRPercentilesAggregator.java (1)

111-149: LGTM!

The histogram-aware handling is correctly structured with proper branching before returning collectors. The histogram path appropriately uses recordValueWithCount to record each value with its associated count from the histogram buckets.

server/src/main/java/org/opensearch/index/fielddata/HistogramValuesSource.java (1)

46-80: LGTM - doubleValues implementation is correct.

The SortedNumericDoubleValues implementation properly manages iteration state with advanceExact, nextValue, and docValueCount. The bounds checking in nextValue() prevents invalid access.

server/src/main/java/org/opensearch/search/aggregations/metrics/MinAggregator.java (1)

183-188: The assumption about sorted histogram values is guaranteed and does not require changes.

Histogram values are validated to be in strictly ascending order at index time in HistogramFieldMapper.validateHistogramData(), which throws IllegalArgumentException if values violate this ordering. The code comment in MinAggregator accurately reflects this guarantee, making values[0] correct as the minimum value.

Likely an incorrect or invalid review comment.

server/src/main/java/org/opensearch/index/fielddata/HistogramIndexFieldData.java (2)

20-61: LGTM on overall structure.

The class correctly implements IndexFieldData for histogram fields with appropriate value source type registration and load semantics.


47-50: Remove redundant wideSortField override.

The default wideSortField implementation delegates to sortField(), which HistogramIndexFieldData already overrides to throw UnsupportedOperationException. The wideSortField override is unnecessary—removing it produces identical behavior since the exception is thrown through the sortField() method regardless.

server/src/main/java/org/opensearch/index/mapper/HistogramFieldMapper.java (2)

158-178: LGTM on validation logic.

The validateHistogramData method correctly validates that:

  • Arrays have the same length
  • Values are in strictly ascending order
  • Counts are non-negative

181-191: LGTM on encoding logic.

The encoding format [int size][size × (double value, long count)] is correct and matches the decoding logic in HistogramLeafFieldData.decodeHistogram().

server/src/test/java/org/opensearch/index/mapper/HistogramFieldMapperTests.java (2)

65-93: LGTM on histogram value encoding test.

This test thoroughly validates the binary encoding format by parsing the encoded bytes and verifying each value/count pair.


95-163: LGTM on empty/missing array validation tests.

Good coverage of edge cases for empty and missing arrays with proper exception message validation.

server/src/test/java/org/opensearch/index/query/HistogramFieldQueryTests.java (2)

67-102: LGTM on exists query test.

Good test verifying that exists query works correctly with histogram fields, properly distinguishing documents with and without the histogram field.


156-181: LGTM on sum aggregation test with clear expected value calculation.

The comment clearly documents the expected calculation: (0.1 * 3) + (0.2 * 7) + (0.3 * 23) = 8.6. This is the correct weighted sum for histogram aggregations.

Comment on lines 53 to 55
* Get the {@link HistogramValue} associated with the current document.
* The returned {@link HistogramValue} might be reused across calls.
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Update javadoc to reflect the new return type.

The javadoc still references HistogramValue, but the method now returns Histogram. Update the comment to match the actual return type.

🔎 Proposed fix
     /**
-     * Get the {@link HistogramValue} associated with the current document.
-     * The returned {@link HistogramValue} might be reused across calls.
+     * Get the {@link Histogram} associated with the current document.
+     * The returned {@link Histogram} might be reused across calls.
      */
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
* Get the {@link HistogramValue} associated with the current document.
* The returned {@link HistogramValue} might be reused across calls.
*/
* Get the {@link Histogram} associated with the current document.
* The returned {@link Histogram} might be reused across calls.
*/
🤖 Prompt for AI Agents
In @server/src/main/java/org/opensearch/index/fielddata/HistogramValues.java
around lines 53-55, The Javadoc for the method in class HistogramValues still
refers to HistogramValue but the method now returns Histogram; update the
documentation to reference the correct return type (Histogram) and any related
wording (e.g., "Get the Histogram associated with the current document" and note
reuse of the returned Histogram) so the comment matches the actual return type
and behavior of the method.

@@ -0,0 +1,246 @@
package org.opensearch.index.mapper;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Missing SPDX license header.

This file is missing the standard Apache 2.0 license header that's present in other files in this PR. Add the license header for compliance.

🔎 Add license header
+/*
+ * SPDX-License-Identifier: Apache-2.0
+ *
+ * The OpenSearch Contributors require contributions made to
+ * this file be licensed under the Apache-2.0 license or a
+ * compatible open source license.
+ */
+
 package org.opensearch.index.mapper;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
package org.opensearch.index.mapper;
/*
* SPDX-License-Identifier: Apache-2.0
*
* The OpenSearch Contributors require contributions made to
* this file be licensed under the Apache-2.0 license or a
* compatible open source license.
*/
package org.opensearch.index.mapper;
🤖 Prompt for AI Agents
In @server/src/main/java/org/opensearch/index/mapper/HistogramFieldMapper.java
around line 1, Add the standard Apache‑2.0 SPDX license header to this source;
update the top of HistogramFieldMapper.java (package
org.opensearch.index.mapper; / class HistogramFieldMapper) to include the same
SPDX/Apache‑2.0 license header block used across the project (copy from other
mapper files) so the file complies with licensing requirements.

Comment on lines +119 to +126
if (VALUES_FIELD.equals(currentFieldName)) {
while ((token = parser.nextToken()) != XContentParser.Token.END_ARRAY) {
if (token == XContentParser.Token.VALUE_NUMBER) {
values.add(parser.doubleValue());
} else {
throw new IllegalArgumentException("values must be numbers");
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

No validation for NaN or Infinity in values array.

The parser accepts Double.NaN and Double.POSITIVE_INFINITY/NEGATIVE_INFINITY as valid double values. While the test file expects these to fail with "values must be numbers", the current implementation doesn't explicitly reject them. Add explicit validation.

🔎 Proposed fix
 if (VALUES_FIELD.equals(currentFieldName)) {
     while ((token = parser.nextToken()) != XContentParser.Token.END_ARRAY) {
         if (token == XContentParser.Token.VALUE_NUMBER) {
-            values.add(parser.doubleValue());
+            double val = parser.doubleValue();
+            if (Double.isNaN(val) || Double.isInfinite(val)) {
+                throw new IllegalArgumentException("values must be finite numbers, got: " + val);
+            }
+            values.add(val);
         } else {
             throw new IllegalArgumentException("values must be numbers");
         }
     }
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (VALUES_FIELD.equals(currentFieldName)) {
while ((token = parser.nextToken()) != XContentParser.Token.END_ARRAY) {
if (token == XContentParser.Token.VALUE_NUMBER) {
values.add(parser.doubleValue());
} else {
throw new IllegalArgumentException("values must be numbers");
}
}
if (VALUES_FIELD.equals(currentFieldName)) {
while ((token = parser.nextToken()) != XContentParser.Token.END_ARRAY) {
if (token == XContentParser.Token.VALUE_NUMBER) {
double val = parser.doubleValue();
if (Double.isNaN(val) || Double.isInfinite(val)) {
throw new IllegalArgumentException("values must be finite numbers, got: " + val);
}
values.add(val);
} else {
throw new IllegalArgumentException("values must be numbers");
}
}
🤖 Prompt for AI Agents
In @server/src/main/java/org/opensearch/index/mapper/HistogramFieldMapper.java
around lines 119-126, The values array parsing accepts NaN and Infinity; update
the block that handles VALUES_FIELD in HistogramFieldMapper (the loop using
parser.nextToken(), token, and values.add(parser.doubleValue())) to explicitly
validate each parsed double (use Double.isFinite(value) or check
!Double.isNaN(value) && !Double.isInfinite(value)) and throw the same
IllegalArgumentException("values must be numbers") when a non-finite value is
encountered so NaN/Infinity are rejected.

Comment on lines +13 to +16
public interface HistogramValues {
boolean advanceExact(int doc) throws IOException;
Histogram histogram() throws IOException;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find both HistogramValues files
echo "=== Searching for HistogramValues files ==="
fd "HistogramValues.java" -type f

echo ""
echo "=== Checking org.opensearch.indices.fielddata.HistogramValues ==="
find . -path "*/opensearch/indices/fielddata/HistogramValues.java" -type f -exec cat -n {} \;

echo ""
echo "=== Checking org.opensearch.index.fielddata.HistogramValues ==="
find . -path "*/opensearch/index/fielddata/HistogramValues.java" -type f -exec cat -n {} \;

Repository: opensearch-project/OpenSearch

Length of output: 3126


🏁 Script executed:

#!/bin/bash
# Search for imports and usage of both types
echo "=== Imports of org.opensearch.indices.fielddata.HistogramValues ==="
rg "import org\.opensearch\.indices\.fielddata\.HistogramValues" --type java -A 2

echo ""
echo "=== Imports of org.opensearch.index.fielddata.HistogramValues ==="
rg "import org\.opensearch\.index\.fielddata\.HistogramValues" --type java -A 2

echo ""
echo "=== References to indices.fielddata.HistogramValues (without import) ==="
rg "indices\.fielddata\.HistogramValues" --type java -B 2 -A 2

echo ""
echo "=== References to index.fielddata.HistogramValues (without import) ==="
rg "index\.fielddata\.HistogramValues" --type java -B 2 -A 2

Repository: opensearch-project/OpenSearch

Length of output: 24162


Remove unused interface org.opensearch.indices.fielddata.HistogramValues.

A duplicate interface exists in org.opensearch.indices.fielddata with identical method signatures to the established abstract class in org.opensearch.index.fielddata.HistogramValues. The new interface is unused throughout the codebase while the abstract class is actively imported and used in 9+ aggregator implementations. This creates unnecessary confusion with identically named types in different packages and should be removed unless intended for plugin/SPI use.

🤖 Prompt for AI Agents
In @server/src/main/java/org/opensearch/indices/fielddata/HistogramValues.java
around lines 13-16, Delete the unused interface
org.opensearch.indices.fielddata.HistogramValues (remove the file and its
declaration), and update any imports/usages to reference the established
abstract class org.opensearch.index.fielddata.HistogramValues instead; ensure no
code still imports org.opensearch.indices.fielddata.HistogramValues and run a
compile to verify all aggregator classes that rely on HistogramValues are using
the abstract class symbol.

builder.register(
HistogramAggregationBuilder.REGISTRY_KEY,
List.of(CoreValuesSourceType.NUMERIC, CoreValuesSourceType.DATE, CoreValuesSourceType.BOOLEAN),
List.of(CoreValuesSourceType.NUMERIC, CoreValuesSourceType.DATE, CoreValuesSourceType.BOOLEAN, CoreValuesSourceType.HISTOGRAM),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

find . -name "NumericHistogramAggregator.java" -type f

Repository: opensearch-project/OpenSearch

Length of output: 179


🏁 Script executed:

find . -name "HistogramValuesSource.java" -type f

Repository: opensearch-project/OpenSearch

Length of output: 255


🏁 Script executed:

find . -type f -name "*HistogramAggregator*.java" | head -20

Repository: opensearch-project/OpenSearch

Length of output: 2298


🏁 Script executed:

wc -l ./server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/NumericHistogramAggregator.java

Repository: opensearch-project/OpenSearch

Length of output: 183


🏁 Script executed:

cat -n ./server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/NumericHistogramAggregator.java

Repository: opensearch-project/OpenSearch

Length of output: 7079


🏁 Script executed:

wc -l ./server/src/main/java/org/opensearch/index/fielddata/HistogramValuesSource.java

Repository: opensearch-project/OpenSearch

Length of output: 157


🏁 Script executed:

cat -n ./server/src/main/java/org/opensearch/index/fielddata/HistogramValuesSource.java

Repository: opensearch-project/OpenSearch

Length of output: 5706


🏁 Script executed:

rg -n "getCounts" --type=java server/src/main/java/org/opensearch/search/aggregations/

Repository: opensearch-project/OpenSearch

Length of output: 1490


🏁 Script executed:

wc -l ./server/src/test/java/org/opensearch/search/aggregations/bucket/histogram/NumericHistogramAggregatorTests.java

Repository: opensearch-project/OpenSearch

Length of output: 188


🏁 Script executed:

sed -n '1,100p' ./server/src/test/java/org/opensearch/search/aggregations/bucket/histogram/NumericHistogramAggregatorTests.java

Repository: opensearch-project/OpenSearch

Length of output: 4763


🏁 Script executed:

sed -n '360,380p' server/src/main/java/org/opensearch/search/aggregations/bucket/range/RangeAggregator.java

Repository: opensearch-project/OpenSearch

Length of output: 1188


🏁 Script executed:

cat -n server/src/main/java/org/opensearch/search/aggregations/bucket/range/RangeAggregator.java | sed -n '350,390p'

Repository: opensearch-project/OpenSearch

Length of output: 2385


🏁 Script executed:

grep -A 30 "histogram field" ./server/src/test/java/org/opensearch/search/aggregations/bucket/histogram/NumericHistogramAggregatorTests.java

Repository: opensearch-project/OpenSearch

Length of output: 55


NumericHistogramAggregator must handle histogram value counts to produce correct aggregation results.

The implementation correctly applies DocValueFormat.RAW and calculates bucket keys with interval/offset (line 127: Math.floor((value - offset) / interval)). However, it critically overlooks histogram-specific handling: it only extracts histogram values via doubleValues() but ignores the associated counts.

HistogramValuesSource provides a getCounts() method (similar to metrics aggregators like StatsAggregator), but NumericHistogramAggregator doesn't use it. Comparison with RangeAggregator (lines 359-390) shows the correct pattern: check instanceof HistogramValuesSource, extract both histogram.getValues() and histogram.getCounts(), and iterate through counts when aggregating. Without this, each histogram value is implicitly treated as having count=1, producing incorrect bucket aggregations.

🤖 Prompt for AI Agents
In
@server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/HistogramAggregatorFactory.java
around line 71, NumericHistogramAggregator currently reads histogram values via
doubleValues() but ignores per-value counts, causing each histogram entry to be
treated as count=1; update the aggregation logic in NumericHistogramAggregator
(the code that handles values iteration and key computation using
Math.floor((value - offset) / interval) and DocValueFormat.RAW) to detect when
the values source is a HistogramValuesSource, call histogram.getCounts()
alongside histogram.getValues(), and multiply/add bucket doc counts by the
corresponding count for each value (iterate counts in sync with values),
mirroring the pattern used in RangeAggregator where instanceof
HistogramValuesSource triggers extraction of both getValues() and getCounts()
and iterates counts when updating buckets.

builder.register(
MinAggregationBuilder.REGISTRY_KEY,
List.of(CoreValuesSourceType.NUMERIC, CoreValuesSourceType.DATE, CoreValuesSourceType.BOOLEAN),
List.of(CoreValuesSourceType.NUMERIC, CoreValuesSourceType.DATE, CoreValuesSourceType.BOOLEAN, CoreValuesSourceType.HISTOGRAM),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

find . -type f -name "MinAggregatorFactory.java" | head -5

Repository: opensearch-project/OpenSearch

Length of output: 164


🏁 Script executed:

cat -n ./server/src/main/java/org/opensearch/search/aggregations/metrics/MinAggregatorFactory.java

Repository: opensearch-project/OpenSearch

Length of output: 4375


🏁 Script executed:

find . -type f -name "MinAggregator.java" | head -5

Repository: opensearch-project/OpenSearch

Length of output: 157


🏁 Script executed:

cat -n ./server/src/main/java/org/opensearch/search/aggregations/metrics/MinAggregator.java

Repository: opensearch-project/OpenSearch

Length of output: 18224


🏁 Script executed:

find . -type f -name "Histogram.java" -path "*/indices/fielddata/*" | head -3

Repository: opensearch-project/OpenSearch

Length of output: 143


🏁 Script executed:

cat -n ./server/src/main/java/org/opensearch/indices/fielddata/Histogram.java

Repository: opensearch-project/OpenSearch

Length of output: 944


🏁 Script executed:

find . -type f -name "SumAggregator.java" -path "*/aggregations/metrics/*" | head -3

Repository: opensearch-project/OpenSearch

Length of output: 157


🏁 Script executed:

cat -n ./server/src/main/java/org/opensearch/search/aggregations/metrics/SumAggregator.java | head -250

Repository: opensearch-project/OpenSearch

Length of output: 12776


🏁 Script executed:

cat -n ./server/src/main/java/org/opensearch/search/aggregations/metrics/SumAggregator.java | sed -n '183,216p'

Repository: opensearch-project/OpenSearch

Length of output: 2127


🏁 Script executed:

find . -type f -name "*MinAggregator*Test*.java" -o -name "*Test*MinAggregator*.java" | head -5

Repository: opensearch-project/OpenSearch

Length of output: 162


🏁 Script executed:

rg -n "histogram|Histogram" ./server/src/test/java/org/opensearch/search/aggregations/metrics/MinAggregatorTests.java -i

Repository: opensearch-project/OpenSearch

Length of output: 474


🏁 Script executed:

rg -n "CoreValuesSourceType.HISTOGRAM|HistogramValuesSource" ./server/src/test/java/org/opensearch/search/aggregations/metrics/MinAggregatorTests.java -i

Repository: opensearch-project/OpenSearch

Length of output: 55


🏁 Script executed:

find . -type f -name "*AggregatorTests.java" -path "*/metrics/*" -exec grep -l "HistogramValuesSource\|CoreValuesSourceType.HISTOGRAM" {} \;

Repository: opensearch-project/OpenSearch

Length of output: 55


🏁 Script executed:

git log --oneline -20 ./server/src/main/java/org/opensearch/search/aggregations/metrics/MinAggregator.java

Repository: opensearch-project/OpenSearch

Length of output: 110


🏁 Script executed:

git show 2079fb11 --stat | head -30

Repository: opensearch-project/OpenSearch

Length of output: 1667


🏁 Script executed:

git show 2079fb11 -- ./server/src/main/java/org/opensearch/search/aggregations/metrics/MinAggregator.java | head -200

Repository: opensearch-project/OpenSearch

Length of output: 8847


Histogram weight/count handling missing in MinAggregator histogram collection.

The histogram support implementation in MinAggregator is incomplete. While RAW format is correctly applied and histogram values are accessed, the implementation ignores histogram counts entirely.

The collector (lines 180-189) only uses values[0] without checking histogram counts:

double[] values = histogram.getValues();
if (values.length > 0) {
    double value = values[0];
    // ...
}

This differs from SumAggregator's histogram handling (lines 206-207), which correctly applies counts:

for (int i = 0; i < histValues.length; i++) {
    kahanSummation.add(histValues[i] * counts[i]);
}

For min aggregation to be semantically correct with histogram data, values with zero counts should not contribute to the minimum calculation. Add logic to iterate through all histogram values and their corresponding counts to find the minimum value among entries with non-zero counts, similar to how SumAggregator processes histogram weight information.

🤖 Prompt for AI Agents
In
@server/src/main/java/org/opensearch/search/aggregations/metrics/MinAggregatorFactory.java
around line 60, MinAggregatorFactory's histogram collection uses only
histogram.getValues()[0] and ignores counts, so update the histogram handling in
the collector to iterate over histogram.getValues() and histogram.getCounts(),
skipping entries where counts[i] == 0 and computing the minimum across only
entries with non-zero counts; use histogram.getValues() and
histogram.getCounts(), compare each value to the current min and set when lower
(ensure to handle empty/zero-count-only histograms appropriately, e.g., leave as
NaN/unset as current code expects).

Comment on lines +150 to +152
for (long j = 0; j < count; j++) {
kahanSummation.add(value);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Performance: Avoid repeated additions for histogram counts.

The inner loop for (long j = 0; j < count; j++) adds the same value repeatedly, which is O(count) per bucket. For histograms with large counts (e.g., millions), this will be extremely slow.

Use weighted addition value * count instead.

🔎 Proposed fix
                     // Process both values and counts
                     for (int i = 0; i < valuesCount; i++) {
                         double value = values.nextValue();
                         long count = countArray[i];
                         totalDocCount += count;
-                        for (long j = 0; j < count; j++) {
-                            kahanSummation.add(value);
-                        }
+                        kahanSummation.add(value * count);
                         min = Math.min(min, value);
                         max = Math.max(max, value);
                     }

Note: This changes the numerical precision characteristics slightly since Kahan summation compensates per-addition. If exact parity with repeated additions is required, consider implementing a weighted Kahan add method.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
for (long j = 0; j < count; j++) {
kahanSummation.add(value);
}
kahanSummation.add(value * count);
🤖 Prompt for AI Agents
In
@server/src/main/java/org/opensearch/search/aggregations/metrics/StatsAggregator.java
around lines 150-152, The histogram handling in StatsAggregator currently does
O(count) work by repeating kahanSummation.add(value) inside the loop; change
this to a single weighted addition (e.g., kahanSummation.add(value * count)) to
avoid the per-count loop and drastically improve performance for large counts,
or alternatively implement and call a weighted Kahan add method if you want to
preserve per-addition Kahan compensation semantics; update the code that uses
the long count and value inside the for-loop to perform one multiply-and-add via
kahanSummation instead of iterating.

Comment on lines +182 to +240

// Check if we're dealing with histogram values
if (valuesSource instanceof HistogramValuesSource) {
HistogramIndexFieldData indexFieldData = ((HistogramValuesSource) valuesSource).getHistogramFieldData();
HistogramLeafFieldData leafFieldData = indexFieldData.load(ctx);
HistogramValues histogramValues = leafFieldData.getHistogramValues();

return new LeafBucketCollectorBase(sub, values) {
@Override
public void collect(int doc, long bucket) throws IOException {
sums = bigArrays.grow(sums, bucket + 1);
compensations = bigArrays.grow(compensations, bucket + 1);

if (histogramValues.advanceExact(doc)) {
Histogram histogram = histogramValues.histogram();
double[] histValues = histogram.getValues();
long[] counts = histogram.getCounts();

// Initialize Kahan summation with current values
double sum = sums.get(bucket);
double compensation = compensations.get(bucket);
kahanSummation.reset(sum, compensation);

// Calculate sum directly from histogram values and counts
for (int i = 0; i < histValues.length; i++) {
kahanSummation.add(histValues[i] * counts[i]);
}

// We don't need to process the values from doubleValues() as they're already counted
compensations.set(bucket, kahanSummation.delta());
sums.set(bucket, kahanSummation.value());
}
}
};
} else {
// Original logic for non-histogram fields
return new LeafBucketCollectorBase(sub, values) {
@Override
public void collect(int doc, long bucket) throws IOException {
sums = bigArrays.grow(sums, bucket + 1);
compensations = bigArrays.grow(compensations, bucket + 1);

if (values.advanceExact(doc)) {
final int valuesCount = values.docValueCount();
double sum = sums.get(bucket);
double compensation = compensations.get(bucket);
kahanSummation.reset(sum, compensation);

for (int i = 0; i < valuesCount; i++) {
double value = values.nextValue();
kahanSummation.add(value);
}

compensations.set(bucket, kahanSummation.delta());
sums.set(bucket, kahanSummation.value());
}
}
};
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Critical: Unreachable code after return statement.

The code at lines 183-240 is unreachable because the method returns at line 181. The histogram-specific collection logic and the else branch for non-histogram fields will never execute.

The return statement at line 181 completes the LeafBucketCollectorBase block and returns it, making all subsequent code dead.

🔎 Proposed fix - restructure to proper if-else branching
     @Override
     public LeafBucketCollector getLeafCollector(LeafReaderContext ctx, final LeafBucketCollector sub) throws IOException {
         if (valuesSource == null) {
             return LeafBucketCollector.NO_OP_COLLECTOR;
         }

         final BigArrays bigArrays = context.bigArrays();
-        final SortedNumericDoubleValues values = valuesSource.doubleValues(ctx);
         final CompensatedSum kahanSummation = new CompensatedSum(0, 0);
-        return new LeafBucketCollectorBase(sub, values) {
-            @Override
-            public void collect(int doc, long bucket) throws IOException {
-                // ... existing logic ...
-            }
-            // ... other methods ...
-        };
-
-        // Check if we're dealing with histogram values
+
         if (valuesSource instanceof HistogramValuesSource) {
-            // histogram logic (currently unreachable)
+            HistogramIndexFieldData indexFieldData = ((HistogramValuesSource) valuesSource).getHistogramFieldData();
+            HistogramLeafFieldData leafFieldData = indexFieldData.load(ctx);
+            HistogramValues histogramValues = leafFieldData.getHistogramValues();
+
+            return new LeafBucketCollectorBase(sub, null) {
+                // ... histogram collection logic ...
+            };
         } else {
-            // non-histogram logic (currently unreachable)
+            final SortedNumericDoubleValues values = valuesSource.doubleValues(ctx);
+            return new LeafBucketCollectorBase(sub, values) {
+                // ... existing collection logic with stream/range support ...
+            };
         }
     }

Note: The histogram path currently lacks collect(DocIdStream, long) and collectRange(int, int) implementations that exist in the original collector.

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +33 to +36
@Override
public ValuesSource getField(FieldContext fieldContext, AggregationScript.LeafFactory script) {
return null;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Critical: getField() returns null - histogram field resolution will fail.

This stub implementation returns null for getField(), which means histogram fields cannot be properly resolved through the standard ValuesSourceConfig infrastructure. Aggregations configured with a histogram field will likely fail with a NullPointerException or silently produce no results.

This method needs to return a proper ValuesSource (i.e., HistogramValuesSource) for the given field context.

🔎 Proposed implementation sketch
 @Override
 public ValuesSource getField(FieldContext fieldContext, AggregationScript.LeafFactory script) {
-    return null;
+    if (fieldContext.indexFieldData() instanceof HistogramIndexFieldData) {
+        return new HistogramValuesSource((HistogramIndexFieldData) fieldContext.indexFieldData());
+    }
+    throw new IllegalArgumentException("Expected histogram field data but got: " + fieldContext.indexFieldData().getClass());
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
@Override
public ValuesSource getField(FieldContext fieldContext, AggregationScript.LeafFactory script) {
return null;
}
@Override
public ValuesSource getField(FieldContext fieldContext, AggregationScript.LeafFactory script) {
if (fieldContext.indexFieldData() instanceof HistogramIndexFieldData) {
return new HistogramValuesSource((HistogramIndexFieldData) fieldContext.indexFieldData());
}
throw new IllegalArgumentException("Expected histogram field data but got: " + fieldContext.indexFieldData().getClass());
}
🤖 Prompt for AI Agents
In
@server/src/main/java/org/opensearch/search/aggregations/support/HistogramValuesSourceType.java
around lines 33-36, The getField method currently returns null; implement it to
resolve and return a HistogramValuesSource using the provided FieldContext and
AggregationScript.LeafFactory so histogram aggregations can obtain a proper
ValuesSource. Inside getField(FieldContext fieldContext,
AggregationScript.LeafFactory script) construct and return a
HistogramValuesSource (or use the existing factory/utility that wraps a
FieldContext and script into a ValuesSource) ensuring you handle missing/mapped
fields similarly to other ValuesSourceType implementations and propagate any
exceptions as before.

Comment on lines +319 to +345
// Test infinite values
e = expectThrows(MapperParsingException.class, () -> {
mapper.parse(source(b -> b.startObject("field")
.startArray("values")
.value(Double.POSITIVE_INFINITY)
.value(0.2)
.value(0.3)
.endArray()
.array("counts", 1, 2, 3)
.endObject()));
});
assertThat(e.getMessage(), containsString("failed to parse field [field] of type [histogram]"));
assertThat(e.getCause().getMessage(), containsString("values must be numbers"));

// Test NaN values
e = expectThrows(MapperParsingException.class, () -> {
mapper.parse(source(b -> b.startObject("field")
.startArray("values")
.value(Double.NaN)
.value(0.2)
.value(0.3)
.endArray()
.array("counts", 1, 2, 3)
.endObject()));
});
assertThat(e.getMessage(), containsString("failed to parse field [field] of type [histogram]"));
assertThat(e.getCause().getMessage(), containsString("values must be numbers"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

find . -name "HistogramFieldMapperTests.java" -type f

Repository: opensearch-project/OpenSearch

Length of output: 154


🏁 Script executed:

sed -n '319,345p' ./server/src/test/java/org/opensearch/index/mapper/HistogramFieldMapperTests.java

Repository: opensearch-project/OpenSearch

Length of output: 1266


🏁 Script executed:

find . -name "HistogramFieldMapper.java" -type f | grep -v Test

Repository: opensearch-project/OpenSearch

Length of output: 149


🏁 Script executed:

rg -n "parseCreateField" --type java | grep -i histogram

Repository: opensearch-project/OpenSearch

Length of output: 229


🏁 Script executed:

rg -n "values must be numbers" --type java

Repository: opensearch-project/OpenSearch

Length of output: 760


🏁 Script executed:

rg -l "HistogramFieldMapper" --type java | grep -v Test

Repository: opensearch-project/OpenSearch

Length of output: 298


🏁 Script executed:

rg -l "histogram" --type java -i | head -20

Repository: opensearch-project/OpenSearch

Length of output: 1909


🏁 Script executed:

ls -la server/src/main/java/org/opensearch/index/mapper/ | grep -i histogram

Repository: opensearch-project/OpenSearch

Length of output: 149


🏁 Script executed:

ls server/src/main/java/org/opensearch/index/mapper/ | head -30

Repository: opensearch-project/OpenSearch

Length of output: 908


🏁 Script executed:

rg "class.*Histogram" --type java | head -20

Repository: opensearch-project/OpenSearch

Length of output: 3879


🏁 Script executed:

head -50 ./server/src/test/java/org/opensearch/index/mapper/HistogramFieldMapperTests.java

Repository: opensearch-project/OpenSearch

Length of output: 1948


🏁 Script executed:

sed -n '99,160p' ./server/src/main/java/org/opensearch/index/mapper/HistogramFieldMapper.java

Repository: opensearch-project/OpenSearch

Length of output: 2845


🏁 Script executed:

wc -l ./server/src/main/java/org/opensearch/index/mapper/HistogramFieldMapper.java

Repository: opensearch-project/OpenSearch

Length of output: 153


🏁 Script executed:

sed -n '99,200p' ./server/src/main/java/org/opensearch/index/mapper/HistogramFieldMapper.java

Repository: opensearch-project/OpenSearch

Length of output: 4016


🏁 Script executed:

rg -n "isFinite|isInfinite|isNaN" ./server/src/main/java/org/opensearch/index/mapper/HistogramFieldMapper.java

Repository: opensearch-project/OpenSearch

Length of output: 55


🏁 Script executed:

cat ./server/src/main/java/org/opensearch/index/mapper/HistogramFieldMapper.java

Repository: opensearch-project/OpenSearch

Length of output: 9573


Tests for Infinity and NaN will fail—mapper does not validate these special double values.

The test assertions expect Infinity and NaN to be rejected with message "values must be numbers", but the parseCreateField() method accepts them silently. The error "values must be numbers" is only thrown when a token is not a VALUE_NUMBER, and XContentParser correctly identifies Infinity and NaN as valid numeric values. These will be parsed via parser.doubleValue() and added to the values list. The validateHistogramData() method has no checks for Double.isFinite() or Double.isNaN(). Add validation to reject these special values.

🤖 Prompt for AI Agents
In
@server/src/test/java/org/opensearch/index/mapper/HistogramFieldMapperTests.java
around lines 319-345, The parser currently accepts Double.POSITIVE_INFINITY and
Double.NaN because XContentParser treats them as numeric tokens and
parseCreateField() calls parser.doubleValue() without rejecting special values;
update validation so validateHistogramData() (or immediately after values are
parsed in parseCreateField()) checks each value with Double.isFinite(value) (or
equivalently !Double.isInfinite && !Double.isNaN) and throws a
MapperParsingException with a message containing "values must be numbers" when
any value is not finite; ensure the thrown exception becomes the cause so
existing tests asserting containsString("values must be numbers") continue to
pass.

@github-actions
Copy link
Contributor

github-actions bot commented Jan 5, 2026

❌ Gradle check result for 2079fb1: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant