Draft: Add support for histogram field and aggregations #20373
base: main
Conversation
Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com>
📝 Walkthrough
This pull request introduces comprehensive histogram field support to OpenSearch, enabling indexing and aggregation of histogram-structured data (values with associated counts). It adds a new field mapper, field data implementations, and extends the aggregation framework to handle histogram sources across multiple aggregator types.
Sequence Diagram(s)
sequenceDiagram
actor User
participant Mapper as HistogramFieldMapper
participant FieldData as HistogramIndexFieldData<br/>(LeafReaderContext)
participant LeafFieldData as HistogramLeafFieldData
participant DocValues as DocValues<br/>(BinaryDocValues)
User->>Mapper: Index document with<br/>histogram (values, counts)
activate Mapper
Mapper->>Mapper: validateHistogramData()
Mapper->>Mapper: encodeHistogram() →<br/>ByteBuffer
Mapper-->>User: Store BinaryDocValuesField
deactivate Mapper
User->>FieldData: load(leafContext)
activate FieldData
FieldData->>LeafFieldData: new HistogramLeafFieldData()
activate LeafFieldData
LeafFieldData->>DocValues: getBinary(reader,<br/>fieldName)
LeafFieldData-->>FieldData: Ready for queries
deactivate LeafFieldData
deactivate FieldData
User->>LeafFieldData: getHistogramValues()
activate LeafFieldData
LeafFieldData->>DocValues: advanceExact(docId)
activate DocValues
DocValues-->>LeafFieldData: BytesRef
deactivate DocValues
LeafFieldData->>LeafFieldData: decodeHistogram(BytesRef)<br/>→ Histogram
LeafFieldData-->>User: HistogramValues
deactivate LeafFieldData
sequenceDiagram
actor User
participant Factory as AggregatorFactory<br/>(e.g., Sum)
participant ValuesSourceConfig as ValuesSourceConfig
participant Aggregator as Aggregator<br/>(e.g., SumAggregator)
participant ValuesSource as HistogramValuesSource
participant FieldData as HistogramLeafFieldData
User->>Factory: Build aggregation on<br/>histogram field
activate Factory
Factory->>ValuesSourceConfig: resolve(histogramField)
activate ValuesSourceConfig
ValuesSourceConfig->>ValuesSourceConfig: Detect HistogramFieldType
ValuesSourceConfig-->>Factory: ValuesSourceConfig<br/>(HISTOGRAM type)
deactivate ValuesSourceConfig
Factory->>Aggregator: getLeafCollector(<br/>ValuesSource)
activate Aggregator
Aggregator->>ValuesSource: instanceof check
Note over Aggregator: Branch: Histogram path
Aggregator->>ValuesSource: doubleValues(context)
activate ValuesSource
ValuesSource->>FieldData: getHistogramValues()
activate FieldData
FieldData-->>ValuesSource: HistogramValues
deactivate FieldData
ValuesSource-->>Aggregator: SortedNumericDoubleValues
deactivate ValuesSource
Aggregator->>Aggregator: Loop: per-doc histogram<br/>Accumulate(value × count)
Aggregator-->>User: Weighted aggregation result
deactivate Aggregator
deactivate Factory
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 16
Fix all issues with AI Agents 🤖
In @server/src/main/java/org/opensearch/index/fielddata/HistogramValues.java:
- Around line 53-55: The Javadoc for the method in class HistogramValues still
refers to HistogramValue but the method now returns Histogram; update the
documentation to reference the correct return type (Histogram) and any related
wording (e.g., "Get the Histogram associated with the current document" and note
reuse of the returned Histogram) so the comment matches the actual return type
and behavior of the method.
In @server/src/main/java/org/opensearch/index/mapper/HistogramFieldMapper.java:
- Line 1: Add the standard Apache‑2.0 SPDX license header to this source; update
the top of HistogramFieldMapper.java (package org.opensearch.index.mapper; /
class HistogramFieldMapper) to include the same SPDX/Apache‑2.0 license header
block used across the project (copy from other mapper files) so the file
complies with licensing requirements.
- Around line 119-126: The values array parsing accepts NaN and Infinity; update
the block that handles VALUES_FIELD in HistogramFieldMapper (the loop using
parser.nextToken(), token, and values.add(parser.doubleValue())) to explicitly
validate each parsed double (use Double.isFinite(value) or check
!Double.isNaN(value) && !Double.isInfinite(value)) and throw the same
IllegalArgumentException("values must be numbers") when a non-finite value is
encountered so NaN/Infinity are rejected.
In @server/src/main/java/org/opensearch/indices/fielddata/HistogramValues.java:
- Around line 13-16: Delete the unused interface
org.opensearch.indices.fielddata.HistogramValues (remove the file and its
declaration), and update any imports/usages to reference the established
abstract class org.opensearch.index.fielddata.HistogramValues instead; ensure no
code still imports org.opensearch.indices.fielddata.HistogramValues and run a
compile to verify all aggregator classes that rely on HistogramValues are using
the abstract class symbol.
In
@server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/HistogramAggregatorFactory.java:
- Line 71: NumericHistogramAggregator currently reads histogram values via
doubleValues() but ignores per-value counts, causing each histogram entry to be
treated as count=1; update the aggregation logic in NumericHistogramAggregator
(the code that handles values iteration and key computation using
Math.floor((value - offset) / interval) and DocValueFormat.RAW) to detect when
the values source is a HistogramValuesSource, call histogram.getCounts()
alongside histogram.getValues(), and multiply/add bucket doc counts by the
corresponding count for each value (iterate counts in sync with values),
mirroring the pattern used in RangeAggregator where instanceof
HistogramValuesSource triggers extraction of both getValues() and getCounts()
and iterates counts when updating buckets.
In
@server/src/main/java/org/opensearch/search/aggregations/bucket/range/RangeAggregator.java:
- Around line 379-389: The code in RangeAggregator increments the bucket once
via collectBucket and then count times via collectExistingBucket, producing
count+1 increments; change the logic so total increments equal count by either
calling collectExistingBucket (count - 1) times when count > 0 after the initial
collectBucket, or (if intent is to only use the histogram count) remove the
initial collectBucket and call collectExistingBucket count times; update the
loop that currently does "for (long k = 0; k < count; k++) {
collectExistingBucket(...); }" accordingly and ensure you handle count == 0
safely; verify the behavior for matchedRange/ranges and
subBucketOrdinal/collectBucket/collectExistingBucket remains consistent.
In
@server/src/main/java/org/opensearch/search/aggregations/bucket/range/RangeAggregatorFactory.java:
- Around line 35-46: Remove the unused imports from RangeAggregatorFactory.java:
delete HistogramValuesSource, DocValueFormat, ValuesSource, Aggregator,
AggregatorFactories, AggregatorFactory, CardinalityUpperBound,
InternalRange.Factory, and SearchContext from the import list; ensure only the
actually referenced types (e.g., QueryShardContext, RangeAggregator.Range,
ValuesSourceConfig, and any others used in this file) remain and then recompile
to verify no missing references.
In
@server/src/main/java/org/opensearch/search/aggregations/metrics/AbstractInternalTDigestPercentiles.java:
- Line 38: The file AbstractInternalTDigestPercentiles contains an unused import
of org.opensearch.index.fielddata.HistogramValuesSource; remove that import line
so the class no longer references HistogramValuesSource (ensure no other code in
AbstractInternalTDigestPercentiles uses HistogramValuesSource before removing).
This will clean up the unused dependency and eliminate the unused-import
warning.
In
@server/src/main/java/org/opensearch/search/aggregations/metrics/AvgAggregator.java:
- Around line 204-269: The histogram-handling block is unreachable because the
method returns before branching on valuesSource; move the histogram vs
non-histogram branching so you check "if (valuesSource instanceof
HistogramValuesSource)" before any return, and then return the appropriate
LeafBucketCollectorBase for each branch (the histogram branch building the
collector that uses HistogramValues, and the existing non-histogram branch using
values.nextValue()). Update the method in AvgAggregator where the current early
return occurs so each branch returns its collector directly and no code after
the return is intended to execute.
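For the structural point (branch on the values source before any return), a minimal sketch with the OpenSearch types replaced by trivial stand-ins; only the control flow is meant to mirror the suggested AvgAggregator change, nothing else is real API:

// Minimal stand-ins for the real ValuesSource / LeafBucketCollector types.
interface Collector { void collect(int doc); }
class ValuesSource {}
class HistogramValuesSource extends ValuesSource {}

public class AvgCollectorSketch {
    // Check the source type first, then return the matching collector;
    // nothing is left after a return, so no branch is unreachable.
    static Collector getLeafCollector(ValuesSource valuesSource) {
        if (valuesSource instanceof HistogramValuesSource) {
            return doc -> System.out.println("histogram path: weight each value by its count for doc " + doc);
        }
        return doc -> System.out.println("numeric path: values.nextValue() for doc " + doc);
    }

    public static void main(String[] args) {
        getLeafCollector(new HistogramValuesSource()).collect(0);
        getLeafCollector(new ValuesSource()).collect(1);
    }
}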
In
@server/src/main/java/org/opensearch/search/aggregations/metrics/MinAggregatorFactory.java:
- Line 60: MinAggregatorFactory's histogram collection uses only
histogram.getValues()[0] and ignores counts, so update the histogram handling in
the collector to iterate over histogram.getValues() and histogram.getCounts(),
skipping entries where counts[i] == 0 and computing the minimum across only
entries with non-zero counts; use histogram.getValues() and
histogram.getCounts(), compare each value to the current min and set when lower
(ensure to handle empty/zero-count-only histograms appropriately, e.g., leave as
NaN/unset as current code expects).
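A self-contained sketch of the suggested minimum computation, using plain arrays in place of histogram.getValues()/histogram.getCounts(); it skips zero-count bins and leaves the result as NaN when no bin contributes, as the existing code expects:

public class HistogramMinSketch {
    // Minimum over histogram bins that actually contain observations.
    static double minOfHistogram(double[] values, long[] counts) {
        double min = Double.NaN;
        for (int i = 0; i < values.length; i++) {
            if (counts[i] == 0) {
                continue;                       // empty bin contributes nothing
            }
            if (Double.isNaN(min) || values[i] < min) {
                min = values[i];
            }
        }
        return min;                             // NaN when all bins are empty
    }

    public static void main(String[] args) {
        System.out.println(minOfHistogram(new double[] {0.1, 0.2, 0.3}, new long[] {0, 7, 23}));  // 0.2
        System.out.println(minOfHistogram(new double[] {0.1, 0.2}, new long[] {0, 0}));           // NaN
    }
}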
In
@server/src/main/java/org/opensearch/search/aggregations/metrics/StatsAggregator.java:
- Around line 150-152: The histogram handling in StatsAggregator currently does
O(count) work by repeating kahanSummation.add(value) inside the loop; change
this to a single weighted addition (e.g., kahanSummation.add(value * count)) to
avoid the per-count loop and drastically improve performance for large counts,
or alternatively implement and call a weighted Kahan add method if you want to
preserve per-addition Kahan compensation semantics; update the code that uses
the long count and value inside the for-loop to perform one multiply-and-add via
kahanSummation instead of iterating.
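A sketch of the weighted-addition idea using a tiny standalone Kahan accumulator (not OpenSearch's actual kahanSummation type): one multiply-and-add per histogram bin instead of count separate adds, while each added term still receives the usual compensation:

public class WeightedKahanSketch {
    private double sum = 0.0;
    private double compensation = 0.0;          // running low-order error

    // Standard Kahan (compensated) addition of one term.
    void add(double value) {
        double corrected = value - compensation;
        double newSum = sum + corrected;
        compensation = (newSum - sum) - corrected;
        sum = newSum;
    }

    // Weighted add: one multiply-and-add instead of `count` separate adds.
    void addWeighted(double value, long count) {
        add(value * count);
    }

    double value() {
        return sum;
    }

    public static void main(String[] args) {
        WeightedKahanSketch stats = new WeightedKahanSketch();
        // Histogram bin: value 0.3 observed 23 times -> a single O(1) update.
        stats.addWeighted(0.3, 23);
        stats.addWeighted(0.1, 3);
        System.out.println(stats.value()); // approximately 7.2 (0.3*23 + 0.1*3)
    }
}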
In
@server/src/main/java/org/opensearch/search/aggregations/support/HistogramValuesSourceType.java:
- Around line 33-36: The getField method currently returns null; implement it to
resolve and return a HistogramValuesSource using the provided FieldContext and
AggregationScript.LeafFactory so histogram aggregations can obtain a proper
ValuesSource. Inside getField(FieldContext fieldContext,
AggregationScript.LeafFactory script) construct and return a
HistogramValuesSource (or use the existing factory/utility that wraps a
FieldContext and script into a ValuesSource) ensuring you handle missing/mapped
fields similarly to other ValuesSourceType implementations and propagate any
exceptions as before.
In
@server/src/test/java/org/opensearch/index/mapper/HistogramFieldMapperTests.java:
- Around line 319-345: The parser currently accepts Double.POSITIVE_INFINITY and
Double.NaN because XContentParser treats them as numeric tokens and
parseCreateField() calls parser.doubleValue() without rejecting special values;
update validation so validateHistogramData() (or immediately after values are
parsed in parseCreateField()) checks each value with Double.isFinite(value) (or
equivalently !Double.isInfinite && !Double.isNaN) and throws a
MapperParsingException with a message containing "values must be numbers" when
any value is not finite; ensure the thrown exception becomes the cause so
existing tests asserting containsString("values must be numbers") continue to
pass.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
server/src/main/java/org/opensearch/search/aggregations/metrics/PercentilesAggregatorFactory.java (1)
62-70: Update test class declarations to include histogram support, or exclude HISTOGRAM from factory registration.
The AbstractHDRPercentilesAggregator and AbstractTDigestPercentilesAggregator both contain explicit histogram handling logic (extracting histogram values and counts, recording them with recordValueWithCount), but the corresponding test classes (TDigestPercentilesAggregatorTests and HDRPercentilesAggregatorTests) do not declare CoreValuesSourceType.HISTOGRAM in their getSupportedValuesSourceTypes() method. This creates an inconsistency: the factory now registers histogram support, the implementations handle histograms, but the tests don't verify it. Either add HISTOGRAM to the test declarations to enable proper test coverage, or remove it from the factory registration if histogram support is intentionally incomplete.
server/src/main/java/org/opensearch/search/aggregations/metrics/MinAggregator.java (1)
214-265: Dead code: the getRegularLeafCollector method is never called.
This helper method is defined but never invoked. The getLeafCollector method at lines 166-211 handles both histogram and non-histogram paths directly and returns before reaching this code. Additionally, line 219 calls growMins(bucket) but then lines 220-224 immediately duplicate the same grow logic, which would be redundant even if this code were reachable.
🔎 Proposed fix
Remove the dead code method entirely:
- private LeafBucketCollector getRegularLeafCollector(BigArrays bigArrays, SortedNumericDoubleValues allValues, LeafBucketCollector sub) {
-     final NumericDoubleValues values = MultiValueMode.MIN.select(allValues);
-     return new LeafBucketCollectorBase(sub, allValues) {
-         @Override
-         public void collect(int doc, long bucket) throws IOException {
-             growMins(bucket);
-             if (bucket >= mins.size()) {
-                 long from = mins.size();
-                 mins = bigArrays.grow(mins, bucket + 1);
-                 mins.fill(from, mins.size(), Double.POSITIVE_INFINITY);
-             }
-
-             if (values.advanceExact(doc)) {
-                 final double value = values.doubleValue();
-                 double min = mins.get(bucket);
-                 mins.set(bucket, Math.min(min, value));
-             }
-         }
-
-         @Override
-         public void collect(DocIdStream stream, long bucket) throws IOException {
-             // ... rest of method ...
-         }
-         // ... rest of method ...
-     };
- }
server/src/main/java/org/opensearch/search/aggregations/metrics/StatsAggregator.java (1)
165-188: Incomplete histogram support: collect(DocIdStream, bucket) ignores histogram counts.
This method still uses counts.increment(bucket, valuesCount), which treats each histogram bucket as a single value, ignoring the actual counts. When processing histogram fields via this code path, statistics will be incorrect. Either add histogram-aware handling here or document that histogram fields don't support this collection mode.
🧹 Nitpick comments (19)
server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/InternalHistogram.java (1)
39-39: Consider removing unused import.
The HistogramValuesSource import doesn't appear to be used in this file. If it's not needed, consider removing it to keep the imports clean.
server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/AbstractHistogramAggregator.java (1)
37-37: Consider removing unused import.
The HistogramValuesSource import doesn't appear to be used in this file. If it's not needed, consider removing it to keep the imports clean.
server/src/main/java/org/opensearch/search/aggregations/bucket/range/AbstractRangeAggregatorFactory.java (1)
111-116: LGTM! Format selection logic is correct.
The conditional format selection properly handles histogram sources by using DocValueFormat.RAW, which is appropriate for preserving raw histogram values. The cast to Numeric on line 123 is type-safe since HistogramValuesSource extends ValuesSource.Numeric. The same conditional format selection pattern appears in multiple files:
- NumericHistogramAggregator.java (lines 91-93)
- RangeHistogramAggregator.java (lines 91-93)
- AbstractRangeAggregatorFactory.java (lines 111-116)
Consider extracting this logic into a helper method to reduce duplication:
🔎 Suggested refactor to reduce duplication
Create a utility method in a common location:
public static DocValueFormat selectFormatForHistogram(ValuesSourceConfig config) {
    return config.getValuesSource() instanceof HistogramValuesSource ? DocValueFormat.RAW : config.format();
}
Then use it consistently:
selectFormatForHistogram(valuesSourceConfig)
Also applies to: 124-124
server/src/main/java/org/opensearch/indices/fielddata/Histogram.java (1)
11-30: Add constructor validation for array length consistency.
The constructor accepts values and counts arrays but does not validate that they have the same length. This could lead to subtle bugs if mismatched arrays are passed, since getSize() only returns values.length.
🔎 Proposed validation
  public Histogram(double[] values, long[] counts) {
+     if (values == null || counts == null) {
+         throw new IllegalArgumentException("values and counts must not be null");
+     }
+     if (values.length != counts.length) {
+         throw new IllegalArgumentException("values and counts must have the same length");
+     }
      this.values = values;
      this.counts = counts;
  }
server/src/main/java/org/opensearch/search/aggregations/support/HistogramValuesSourceType.java (1)
43-46: Return a meaningful type name.
Returning an empty string for typeName() will make debugging and error messages unclear.
🔎 Proposed fix
  @Override
  public String typeName() {
-     return "";
+     return "histogram";
  }
server/src/main/java/org/opensearch/index/fielddata/HistogramValuesSource.java (1)
121-128: Dead code: the decodeMin method is never called.
This private method is defined but never invoked anywhere in this class or accessible externally.
🔎 Proposed fix
Remove the unused method:
- private double decodeMin(BytesRef bytesRef) {
-     ByteBuffer buffer = ByteBuffer.wrap(bytesRef.bytes, bytesRef.offset, bytesRef.length);
-     int size = buffer.getInt();
-     if (size > 0) {
-         return buffer.getDouble();
-     }
-     return Double.NaN;
- }
server/src/main/java/org/opensearch/index/fielddata/HistogramLeafFieldData.java (3)
45-57: Consider adding defensive checks for malformed binary data.
The decodeHistogram method assumes the binary data is well-formed. If the stored size value is corrupted or larger than the actual data, buffer.getDouble() or buffer.getLong() will throw BufferUnderflowException. Consider adding validation.
🔎 Proposed defensive check
  private Histogram decodeHistogram(BytesRef bytesRef) {
      ByteBuffer buffer = ByteBuffer.wrap(bytesRef.bytes, bytesRef.offset, bytesRef.length);
      int size = buffer.getInt();
+     int expectedBytes = 4 + (size * (Double.BYTES + Long.BYTES));
+     if (bytesRef.length < expectedBytes || size < 0) {
+         throw new IllegalStateException("Corrupted histogram data: invalid size or insufficient bytes");
+     }
      double[] values = new double[size];
      long[] counts = new long[size];
62-70: Returning null from getScriptValues() may cause NPEs in script contexts.
If a user attempts to access a histogram field from a Painless script, returning null here will likely cause a NullPointerException. Consider throwing UnsupportedOperationException with a clear message, or implementing a basic ScriptDocValues wrapper.
🔎 Proposed fix
  @Override
  public ScriptDocValues<?> getScriptValues() {
-     return null;
+     throw new UnsupportedOperationException("Histogram fields do not support script access");
  }

  @Override
  public SortedBinaryDocValues getBytesValues() {
-     return null;
+     throw new UnsupportedOperationException("Histogram fields do not support bytes values");
  }
72-75: ramBytesUsed() returning 0 is inaccurate.
This method is used for memory accounting in circuit breakers. Returning 0 means histogram field data won't be properly tracked, which could lead to memory issues under heavy load. Consider estimating memory based on the reader's field info or document count.
server/src/main/java/org/opensearch/index/fielddata/HistogramIndexFieldData.java (1)
52-55: newBucketedSort() returning null is inconsistent with sortField().
While sortField() throws UnsupportedOperationException, newBucketedSort() silently returns null. This inconsistency could lead to confusing NPEs for callers. Consider throwing the same exception for consistency.
🔎 Proposed fix
  @Override
  public BucketedSort newBucketedSort(BigArrays bigArrays, Object missingValue, MultiValueMode sortMode, XFieldComparatorSource.Nested nested, SortOrder sortOrder, DocValueFormat format, int bucketSize, BucketedSort.ExtraData extra) {
-     return null;
+     throw new UnsupportedOperationException("Histogram fields do not support sorting");
  }
server/src/main/java/org/opensearch/index/mapper/HistogramFieldMapper.java (3)
14-14: Unused import: BytesBinaryIndexFieldData.
This import is not used in the file and should be removed.
-import org.opensearch.index.fielddata.plain.BytesBinaryIndexFieldData;
33-39: FIELD_TYPE is defined but never used.
The FIELD_TYPE static field is configured but not referenced anywhere in the class. The parseCreateField method creates a BinaryDocValuesField directly without using this field type. Either remove FIELD_TYPE or use it in field creation.
42-44: The indexed parameter is misleading for histogram fields.
The indexed parameter is exposed in the builder, but HistogramFieldType.termQuery() throws UnsupportedOperationException. This could confuse users who set index: true expecting search capability. Consider either removing the parameter or documenting this limitation.
server/src/test/java/org/opensearch/index/mapper/HistogramFieldMapperTests.java (2)
30-30: Unused import: stringEncode from Geohash.
This static import is not used anywhere in the test file.
-import static org.opensearch.geometry.utils.Geohash.stringEncode;
57-63: Consider using assertThat with the instanceOf matcher for the type check.
The type check can be more idiomatically written using the existing matcher.
🔎 Suggested improvement
-assertTrue("Type parser should be an instance of HistogramFieldMapper.TypeParser",
-    parser instanceof HistogramFieldMapper.TypeParser);
+assertThat("Type parser should be an instance of HistogramFieldMapper.TypeParser",
+    parser, instanceOf(HistogramFieldMapper.TypeParser.class));
29-29: Unused import: List.
The List import is not used in this file.
-import java.util.List;
53-55: Consider using @Before or unique index names for test isolation.
All tests use the same defaultIndexName = "test" and create the index at the start of each test. While OpenSearchSingleNodeTestCase typically handles cleanup, using unique index names (e.g., with the test method name) or explicit cleanup would make tests more robust against parallel execution or cleanup failures.
366-399: Range aggregation test assertions could be more specific.
The test only verifies that bucket doc counts are >= 0, which would pass even if the aggregation returned empty buckets. Consider asserting that at least one bucket has a positive count, or verify specific bucket counts based on the histogram data.
🔎 Suggested stronger assertions
 Range range = response.getAggregations().get("range_agg");
 assertThat(range, notNullValue());
+assertThat("Should have 3 range buckets", range.getBuckets().size(), equalTo(3));
+long totalDocCount = 0;
 for (Range.Bucket bucket : range.getBuckets()) {
     assertThat(bucket.getDocCount(), greaterThanOrEqualTo(0L));
+    totalDocCount += bucket.getDocCount();
 }
+assertThat("At least one bucket should have docs", totalDocCount, greaterThan(0L));
401-432: Histogram aggregation test could verify bucket keys and counts more precisely.
Similar to the range test, this only checks that doc counts are non-negative. Given the known input data (values 0.1, 0.2, 0.3 with interval 0.1), you could verify expected bucket keys and counts.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (39)
server/src/main/java/org/opensearch/index/fielddata/HistogramIndexFieldData.java
server/src/main/java/org/opensearch/index/fielddata/HistogramLeafFieldData.java
server/src/main/java/org/opensearch/index/fielddata/HistogramValues.java
server/src/main/java/org/opensearch/index/fielddata/HistogramValuesSource.java
server/src/main/java/org/opensearch/index/mapper/HistogramFieldMapper.java
server/src/main/java/org/opensearch/indices/IndicesModule.java
server/src/main/java/org/opensearch/indices/fielddata/Histogram.java
server/src/main/java/org/opensearch/indices/fielddata/HistogramValues.java
server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/AbstractHistogramAggregator.java
server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/HistogramAggregatorFactory.java
server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/InternalHistogram.java
server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/NumericHistogramAggregator.java
server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/RangeHistogramAggregator.java
server/src/main/java/org/opensearch/search/aggregations/bucket/range/AbstractRangeAggregatorFactory.java
server/src/main/java/org/opensearch/search/aggregations/bucket/range/RangeAggregator.java
server/src/main/java/org/opensearch/search/aggregations/bucket/range/RangeAggregatorFactory.java
server/src/main/java/org/opensearch/search/aggregations/metrics/AbstractHDRPercentilesAggregator.java
server/src/main/java/org/opensearch/search/aggregations/metrics/AbstractInternalTDigestPercentiles.java
server/src/main/java/org/opensearch/search/aggregations/metrics/AbstractTDigestPercentilesAggregator.java
server/src/main/java/org/opensearch/search/aggregations/metrics/AvgAggregator.java
server/src/main/java/org/opensearch/search/aggregations/metrics/AvgAggregatorFactory.java
server/src/main/java/org/opensearch/search/aggregations/metrics/ExtendedStatsAggregator.java
server/src/main/java/org/opensearch/search/aggregations/metrics/ExtendedStatsAggregatorFactory.java
server/src/main/java/org/opensearch/search/aggregations/metrics/MaxAggregator.java
server/src/main/java/org/opensearch/search/aggregations/metrics/MaxAggregatorFactory.java
server/src/main/java/org/opensearch/search/aggregations/metrics/MinAggregator.java
server/src/main/java/org/opensearch/search/aggregations/metrics/MinAggregatorFactory.java
server/src/main/java/org/opensearch/search/aggregations/metrics/PercentileRanksAggregatorFactory.java
server/src/main/java/org/opensearch/search/aggregations/metrics/PercentilesAggregatorFactory.java
server/src/main/java/org/opensearch/search/aggregations/metrics/StatsAggregator.java
server/src/main/java/org/opensearch/search/aggregations/metrics/StatsAggregatorFactory.java
server/src/main/java/org/opensearch/search/aggregations/metrics/SumAggregator.java
server/src/main/java/org/opensearch/search/aggregations/metrics/SumAggregatorFactory.java
server/src/main/java/org/opensearch/search/aggregations/metrics/ValueCountAggregator.java
server/src/main/java/org/opensearch/search/aggregations/support/CoreValuesSourceType.java
server/src/main/java/org/opensearch/search/aggregations/support/HistogramValuesSourceType.java
server/src/main/java/org/opensearch/search/aggregations/support/ValuesSourceConfig.java
server/src/test/java/org/opensearch/index/mapper/HistogramFieldMapperTests.java
server/src/test/java/org/opensearch/index/query/HistogramFieldQueryTests.java
🧰 Additional context used
🧬 Code graph analysis (23)
server/src/main/java/org/opensearch/index/fielddata/HistogramValues.java (1)
server/src/main/java/org/opensearch/indices/fielddata/Histogram.java (1)
Histogram(11-31)
server/src/main/java/org/opensearch/search/aggregations/support/ValuesSourceConfig.java (1)
server/src/main/java/org/opensearch/index/mapper/HistogramFieldMapper.java (1)
HistogramFieldMapper(28-246)
server/src/main/java/org/opensearch/index/fielddata/HistogramLeafFieldData.java (1)
server/src/main/java/org/opensearch/indices/fielddata/Histogram.java (1)
Histogram(11-31)
server/src/main/java/org/opensearch/search/aggregations/support/CoreValuesSourceType.java (2)
server/src/main/java/org/opensearch/index/fielddata/HistogramIndexFieldData.java (1)
HistogramIndexFieldData(20-61)
server/src/main/java/org/opensearch/index/fielddata/HistogramValuesSource.java (1)
HistogramValuesSource(20-134)
server/src/main/java/org/opensearch/search/aggregations/bucket/range/RangeAggregator.java (6)
server/src/main/java/org/opensearch/index/fielddata/HistogramIndexFieldData.java (1)
HistogramIndexFieldData(20-61)
server/src/main/java/org/opensearch/index/fielddata/HistogramLeafFieldData.java (1)
HistogramLeafFieldData(20-76)
server/src/main/java/org/opensearch/index/fielddata/HistogramValues.java (1)
HistogramValues(44-58)
server/src/main/java/org/opensearch/index/fielddata/HistogramValuesSource.java (1)
HistogramValuesSource(20-134)
server/src/main/java/org/opensearch/indices/fielddata/Histogram.java (1)
Histogram(11-31)
server/src/main/java/org/opensearch/search/aggregations/LeafBucketCollector.java (1)
LeafBucketCollector(51-179)
server/src/main/java/org/opensearch/indices/IndicesModule.java (1)
server/src/main/java/org/opensearch/index/mapper/HistogramFieldMapper.java (1)
HistogramFieldMapper(28-246)
server/src/main/java/org/opensearch/search/aggregations/metrics/AvgAggregator.java (4)
server/src/main/java/org/opensearch/index/fielddata/HistogramIndexFieldData.java (1)
HistogramIndexFieldData(20-61)
server/src/main/java/org/opensearch/index/fielddata/HistogramLeafFieldData.java (1)
HistogramLeafFieldData(20-76)
server/src/main/java/org/opensearch/index/fielddata/HistogramValues.java (1)
HistogramValues(44-58)
server/src/main/java/org/opensearch/index/fielddata/HistogramValuesSource.java (1)
HistogramValuesSource(20-134)
server/src/main/java/org/opensearch/search/aggregations/metrics/MaxAggregator.java (5)
server/src/main/java/org/opensearch/index/fielddata/HistogramIndexFieldData.java (1)
HistogramIndexFieldData(20-61)
server/src/main/java/org/opensearch/index/fielddata/HistogramLeafFieldData.java (1)
HistogramLeafFieldData(20-76)
server/src/main/java/org/opensearch/index/fielddata/HistogramValues.java (1)
HistogramValues(44-58)
server/src/main/java/org/opensearch/index/fielddata/HistogramValuesSource.java (1)
HistogramValuesSource(20-134)
server/src/main/java/org/opensearch/indices/fielddata/Histogram.java (1)
Histogram(11-31)
server/src/main/java/org/opensearch/index/fielddata/HistogramValuesSource.java (1)
server/src/main/java/org/opensearch/indices/fielddata/Histogram.java (1)
Histogram(11-31)
server/src/main/java/org/opensearch/search/aggregations/metrics/AbstractInternalTDigestPercentiles.java (1)
server/src/main/java/org/opensearch/index/fielddata/HistogramValuesSource.java (1)
HistogramValuesSource(20-134)
server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/RangeHistogramAggregator.java (2)
server/src/main/java/org/opensearch/index/fielddata/HistogramValuesSource.java (1)
HistogramValuesSource(20-134)
server/src/main/java/org/opensearch/index/mapper/RangeFieldMapper.java (1)
RangeFieldMapper(87-687)
server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/NumericHistogramAggregator.java (1)
server/src/main/java/org/opensearch/index/fielddata/HistogramValuesSource.java (1)
HistogramValuesSource(20-134)
server/src/main/java/org/opensearch/search/aggregations/metrics/MinAggregator.java (5)
server/src/main/java/org/opensearch/index/fielddata/HistogramIndexFieldData.java (1)
HistogramIndexFieldData(20-61)
server/src/main/java/org/opensearch/index/fielddata/HistogramLeafFieldData.java (1)
HistogramLeafFieldData(20-76)
server/src/main/java/org/opensearch/index/fielddata/HistogramValues.java (1)
HistogramValues(44-58)
server/src/main/java/org/opensearch/index/fielddata/HistogramValuesSource.java (1)
HistogramValuesSource(20-134)
server/src/main/java/org/opensearch/indices/fielddata/Histogram.java (1)
Histogram(11-31)
server/src/test/java/org/opensearch/index/mapper/HistogramFieldMapperTests.java (1)
libs/x-content/src/main/java/org/opensearch/common/xcontent/json/JsonXContent.java (1)
JsonXContent(64-146)
server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/AbstractHistogramAggregator.java (1)
server/src/main/java/org/opensearch/index/fielddata/HistogramValuesSource.java (1)
HistogramValuesSource(20-134)
server/src/main/java/org/opensearch/search/aggregations/metrics/AbstractHDRPercentilesAggregator.java (5)
server/src/main/java/org/opensearch/index/fielddata/HistogramIndexFieldData.java (1)
HistogramIndexFieldData(20-61)
server/src/main/java/org/opensearch/index/fielddata/HistogramLeafFieldData.java (1)
HistogramLeafFieldData(20-76)
server/src/main/java/org/opensearch/index/fielddata/HistogramValues.java (1)
HistogramValues(44-58)
server/src/main/java/org/opensearch/index/fielddata/HistogramValuesSource.java (1)
HistogramValuesSource(20-134)
server/src/main/java/org/opensearch/indices/fielddata/Histogram.java (1)
Histogram(11-31)
server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/InternalHistogram.java (1)
server/src/main/java/org/opensearch/index/fielddata/HistogramValuesSource.java (1)
HistogramValuesSource(20-134)
server/src/main/java/org/opensearch/indices/fielddata/HistogramValues.java (1)
server/src/main/java/org/opensearch/index/fielddata/HistogramValues.java (1)
HistogramValues(44-58)
server/src/main/java/org/opensearch/search/aggregations/metrics/StatsAggregator.java (1)
server/src/main/java/org/opensearch/index/fielddata/HistogramValuesSource.java (1)
HistogramValuesSource(20-134)
server/src/main/java/org/opensearch/search/aggregations/bucket/range/AbstractRangeAggregatorFactory.java (1)
server/src/main/java/org/opensearch/index/fielddata/HistogramValuesSource.java (1)
HistogramValuesSource(20-134)
server/src/main/java/org/opensearch/search/aggregations/metrics/AbstractTDigestPercentilesAggregator.java (1)
server/src/main/java/org/opensearch/index/fielddata/HistogramValuesSource.java (1)
HistogramValuesSource(20-134)
server/src/test/java/org/opensearch/index/query/HistogramFieldQueryTests.java (3)
server/src/main/java/org/opensearch/indices/fielddata/Histogram.java (1)
Histogram(11-31)
server/src/main/java/org/opensearch/search/aggregations/bucket/range/RangeAggregator.java (1)
Range(110-265)
server/src/main/java/org/opensearch/index/query/QueryBuilders.java (1)
QueryBuilders(60-804)
server/src/main/java/org/opensearch/search/aggregations/metrics/ExtendedStatsAggregator.java (1)
server/src/main/java/org/opensearch/index/fielddata/HistogramValuesSource.java (1)
HistogramValuesSource(20-134)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (21)
- GitHub Check: gradle-check
- GitHub Check: precommit (21, ubuntu-24.04-arm)
- GitHub Check: precommit (21, windows-2025, true)
- GitHub Check: precommit (25, macos-15-intel)
- GitHub Check: precommit (25, ubuntu-24.04-arm)
- GitHub Check: precommit (25, macos-15)
- GitHub Check: precommit (21, macos-15)
- GitHub Check: precommit (25, windows-latest)
- GitHub Check: precommit (25, ubuntu-latest)
- GitHub Check: precommit (21, ubuntu-latest)
- GitHub Check: precommit (21, windows-latest)
- GitHub Check: precommit (21, macos-15-intel)
- GitHub Check: detect-breaking-change
- GitHub Check: assemble (21, windows-latest)
- GitHub Check: assemble (25, windows-latest)
- GitHub Check: assemble (25, ubuntu-24.04-arm)
- GitHub Check: assemble (25, ubuntu-latest)
- GitHub Check: assemble (21, ubuntu-latest)
- GitHub Check: assemble (21, ubuntu-24.04-arm)
- GitHub Check: Analyze (java)
- GitHub Check: Mend Security Check
🔇 Additional comments (27)
server/src/main/java/org/opensearch/search/aggregations/metrics/AbstractTDigestPercentilesAggregator.java (1)
85-87: Formatter handling for histogram sources is consistent across both percentiles implementations.
The conditional logic correctly uses DocValueFormat.RAW for histogram value sources since histogram data is pre-aggregated. This pattern is consistently applied in both AbstractTDigestPercentilesAggregator (lines 85-87) and AbstractHDRPercentilesAggregator (lines 91-93), confirming the implementation is sound.
server/src/main/java/org/opensearch/index/fielddata/HistogramValues.java (1)
56-56: Verify all implementations and callers are updated for the new return type.
The return type changed from HistogramValue to Histogram, which is a breaking API change. Ensure that all implementations of this abstract method and all call sites have been updated to use the new Histogram type.
server/src/main/java/org/opensearch/indices/IndicesModule.java (1)
59-59: LGTM! Histogram field mapper registration follows the standard pattern.
The import and registration of HistogramFieldMapper correctly follows the established pattern for built-in field mappers. This enables the "histogram" field type to be used in index mappings alongside other standard field types.
Also applies to: 184-184
server/src/main/java/org/opensearch/search/aggregations/metrics/ExtendedStatsAggregatorFactory.java (1)
75-75: Histogram weighted statistics computation is correctly implemented.
The code properly handles histogram value sources for extended stats aggregation (a sketch follows below this list):
- Histogram counts are correctly retrieved via histogramValuesSource.getCounts(ctx) and applied as weights when computing statistics (lines 160-164: value * count for the weighted sum and value * value * count for the weighted sum of squares)
- Values and counts are iterated in parallel with proper advancement checks (lines 143-144)
- The sigma parameter is correctly applied to variance-based calculations for standard deviation bounds (lines 262-265)
- RAW format is explicitly set for histogram sources (lines 88-90)
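For reference, a small standalone sketch of the weighted accumulation described above (weighted sum, weighted sum of squares, and sigma-scaled bounds); the variable names follow this summary rather than the exact ExtendedStatsAggregator fields:

public class WeightedExtendedStatsSketch {
    public static void main(String[] args) {
        double[] values = {0.1, 0.2, 0.3};
        long[] counts = {3, 7, 23};
        double sigma = 2.0;                                   // same role as the aggregation's sigma parameter

        long totalCount = 0;
        double sum = 0.0;
        double sumOfSquares = 0.0;
        for (int i = 0; i < values.length; i++) {
            totalCount += counts[i];
            sum += values[i] * counts[i];                     // weighted sum
            sumOfSquares += values[i] * values[i] * counts[i]; // weighted sum of squares
        }

        double mean = sum / totalCount;
        double variance = (sumOfSquares / totalCount) - (mean * mean);
        double stdDev = Math.sqrt(variance);

        System.out.println("mean=" + mean);
        System.out.println("stdDev=" + stdDev);
        System.out.println("upper=" + (mean + sigma * stdDev));
        System.out.println("lower=" + (mean - sigma * stdDev));
    }
}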
server/src/main/java/org/opensearch/search/aggregations/metrics/SumAggregatorFactory.java (1)
76-76: Verify histogram support implementation in SumAggregator.
The registry change correctly adds HISTOGRAM as a supported value source type. Ensure that SumAggregator properly handles histogram-based value sources, including correct accumulation logic for histogram values and counts.
#!/bin/bash
# Verify SumAggregator handles HistogramValuesSource
ast-grep --pattern $'class SumAggregator { $$$ }'
rg -n "HistogramValuesSource" server/src/main/java/org/opensearch/search/aggregations/metrics/SumAggregator.java -A5 -B5
server/src/main/java/org/opensearch/search/aggregations/metrics/MaxAggregatorFactory.java (1)
60-60: Histogram support is properly implemented in MaxAggregator.
The registry change to add HISTOGRAM as a supported value source type is backed by comprehensive histogram-specific logic in MaxAggregator. The implementation correctly extracts histogram field data, retrieves histogram values, and finds the maximum by selecting the last value from the sorted histogram values array, properly updating the max aggregation for each bucket.
server/src/main/java/org/opensearch/search/aggregations/metrics/PercentileRanksAggregatorFactory.java (1)
64-64: Histogram support in percentile ranks is properly implemented.
The registry correctly includes HISTOGRAM as a supported value source type. Both TDigestPercentileRanksAggregator and HDRPercentileRanksAggregator inherit from their respective abstract base classes (AbstractTDigestPercentilesAggregator and AbstractHDRPercentilesAggregator), which explicitly handle histogram-based value sources with appropriate instanceof checks and dedicated histogram value extraction logic.
server/src/main/java/org/opensearch/search/aggregations/support/ValuesSourceConfig.java (1)
194-206: Early return is necessary for histogram script handling and is not redundant.
HistogramIndexFieldData.getValuesSourceType() correctly returns CoreValuesSourceType.HISTOGRAM, so that part of the early return aligns with what the normal path would produce. However, the critical difference is the explicit scriptValueType = ValueType.DOUBLE setting (line 200). The normal field resolution path would instead use userValueTypeHint for scriptValueType (initialized at line 153), which is incorrect for histogram scripts. The existing inline comment already clarifies this requirement: "use DOUBLE for histogram". This early return is necessary to ensure histogram script aggregations have the correct value type.
Likely an incorrect or invalid review comment.
server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/NumericHistogramAggregator.java (1)
38-40: LGTM! Histogram-aware format selection implemented correctly.
The conditional format selection appropriately uses DocValueFormat.RAW for histogram sources to preserve raw double values, aligning with the histogram support pattern across the codebase.
Also applies to: 91-93
server/src/main/java/org/opensearch/search/aggregations/metrics/AvgAggregatorFactory.java (1)
76-76: LGTM! Correctly extends AVG aggregation support to histogram sources.
The addition of CoreValuesSourceType.HISTOGRAM enables histogram data to participate in average aggregations, consistent with the broader histogram support implementation.
server/src/main/java/org/opensearch/search/aggregations/metrics/StatsAggregatorFactory.java (1)
71-71: LGTM! Correctly extends Stats aggregation support to histogram sources.
The addition enables histogram data to participate in statistics aggregations, aligning with the histogram support pattern across multiple aggregation types.
server/src/main/java/org/opensearch/search/aggregations/bucket/range/AbstractRangeAggregatorFactory.java (1)
35-37: LGTM! Import additions and registry expansion are correct.
The additions enable histogram sources to participate in range aggregations with proper type support.
Also applies to: 73-73
server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/RangeHistogramAggregator.java (1)
91-93: The instanceof check for HistogramValuesSource is dead code due to registry-based type safety.
The concern about type compatibility is mitigated by the ValuesSourceRegistry pattern. RangeHistogramAggregator is registered only for CoreValuesSourceType.RANGE (HistogramAggregatorFactory.java:67), which ensures only ValuesSource.Range instances are passed to this aggregator. HistogramValuesSource is produced exclusively by a separate CoreValuesSourceType.HISTOGRAM variant and will never reach this aggregator through normal execution. However, the instanceof check on line 91 is misleading dead code that should either be removed or replaced with an assertion. The check suggests defensive programming against a scenario the registry prevents, which reduces code clarity.
Likely an incorrect or invalid review comment.
server/src/main/java/org/opensearch/search/aggregations/support/CoreValuesSourceType.java (1)
163-189: LGTM - HISTOGRAM values source type implementation follows established patterns.
The implementation correctly:
- Throws appropriate exceptions for unsupported operations (scripts, missing values)
- Validates the field data type before casting
- Returns HistogramValuesSource for field-based access
One consideration: you may want to override getFormatter() to return DocValueFormat.RAW explicitly for consistency with how histogram values are formatted elsewhere in this PR (e.g., ExtendedStatsAggregator and SumAggregator force RAW format for histogram sources).
394-417: LGTM - Non-histogram path correctly preserved.
The original collection logic for numeric values is correctly maintained in the else branch.
server/src/main/java/org/opensearch/search/aggregations/metrics/ValueCountAggregator.java (1)
119-142: LGTM - Histogram value count implementation is correct.
The implementation properly (a sketch follows below this list):
- Loads histogram field data and values
- Sums all histogram bin counts to get the total count
- Increments the bucket by the total count
This correctly interprets value_count for histogram fields as the total number of observations across all bins.
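A minimal illustration of that interpretation, with the PR's Histogram type replaced by a plain counts array:

public class HistogramValueCountSketch {
    // value_count over a histogram field = total observations across all bins.
    static long totalCount(long[] counts) {
        long total = 0;
        for (long c : counts) {
            total += c;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(totalCount(new long[] {3, 7, 23})); // 33
    }
}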
server/src/main/java/org/opensearch/search/aggregations/metrics/AbstractHDRPercentilesAggregator.java (1)
111-149: LGTM!
The histogram-aware handling is correctly structured with proper branching before returning collectors. The histogram path appropriately uses recordValueWithCount to record each value with its associated count from the histogram buckets.
server/src/main/java/org/opensearch/index/fielddata/HistogramValuesSource.java (1)
46-80: LGTM - doubleValues implementation is correct.
The SortedNumericDoubleValues implementation properly manages iteration state with advanceExact, nextValue, and docValueCount. The bounds checking in nextValue() prevents invalid access.
server/src/main/java/org/opensearch/search/aggregations/metrics/MinAggregator.java (1)
183-188: The assumption about sorted histogram values is guaranteed and does not require changes.
Histogram values are validated to be in strictly ascending order at index time in HistogramFieldMapper.validateHistogramData(), which throws IllegalArgumentException if values violate this ordering. The code comment in MinAggregator accurately reflects this guarantee, making values[0] correct as the minimum value.
Likely an incorrect or invalid review comment.
server/src/main/java/org/opensearch/index/fielddata/HistogramIndexFieldData.java (2)
20-61: LGTM on overall structure.
The class correctly implements IndexFieldData for histogram fields with appropriate value source type registration and load semantics.
47-50: Remove the redundant wideSortField override.
The default wideSortField implementation delegates to sortField(), which HistogramIndexFieldData already overrides to throw UnsupportedOperationException. The wideSortField override is therefore unnecessary; removing it produces identical behavior since the exception is thrown through the sortField() method regardless.
158-178: LGTM on validation logic.
The validateHistogramData method correctly validates that:
- Arrays have the same length
- Values are in strictly ascending order
- Counts are non-negative
181-191: LGTM on encoding logic.
The encoding format [int size][size × (double value, long count)] is correct and matches the decoding logic in HistogramLeafFieldData.decodeHistogram().
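For readers following the format, a self-contained round-trip sketch of that layout; it assumes java.nio.ByteBuffer's big-endian default and pair-interleaved (value, count) entries as described above, which may differ in detail from the PR's actual encoder:

import java.nio.ByteBuffer;
import java.util.Arrays;

public class HistogramCodecSketch {
    // Encode as [int size][size × (double value, long count)].
    static byte[] encode(double[] values, long[] counts) {
        ByteBuffer buffer = ByteBuffer.allocate(Integer.BYTES + values.length * (Double.BYTES + Long.BYTES));
        buffer.putInt(values.length);
        for (int i = 0; i < values.length; i++) {
            buffer.putDouble(values[i]);
            buffer.putLong(counts[i]);
        }
        return buffer.array();
    }

    // Decode back into parallel arrays, mirroring the described decodeHistogram() logic.
    static void decode(byte[] bytes) {
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        int size = buffer.getInt();
        double[] values = new double[size];
        long[] counts = new long[size];
        for (int i = 0; i < size; i++) {
            values[i] = buffer.getDouble();
            counts[i] = buffer.getLong();
        }
        System.out.println(Arrays.toString(values) + " / " + Arrays.toString(counts));
    }

    public static void main(String[] args) {
        decode(encode(new double[] {0.1, 0.2, 0.3}, new long[] {3, 7, 23}));
    }
}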
65-93: LGTM on histogram value encoding test.
This test thoroughly validates the binary encoding format by parsing the encoded bytes and verifying each value/count pair.
95-163: LGTM on empty/missing array validation tests.
Good coverage of edge cases for empty and missing arrays with proper exception message validation.
server/src/test/java/org/opensearch/index/query/HistogramFieldQueryTests.java (2)
67-102: LGTM on exists query test.
Good test verifying that the exists query works correctly with histogram fields, properly distinguishing documents with and without the histogram field.
156-181: LGTM on sum aggregation test with clear expected value calculation.
The comment clearly documents the expected calculation: (0.1 * 3) + (0.2 * 7) + (0.3 * 23) = 8.6. This is the correct weighted sum for histogram aggregations.
* Get the {@link HistogramValue} associated with the current document.
* The returned {@link HistogramValue} might be reused across calls.
*/
Update javadoc to reflect the new return type.
The javadoc still references HistogramValue, but the method now returns Histogram. Update the comment to match the actual return type.
🔎 Proposed fix
/**
- * Get the {@link HistogramValue} associated with the current document.
- * The returned {@link HistogramValue} might be reused across calls.
+ * Get the {@link Histogram} associated with the current document.
+ * The returned {@link Histogram} might be reused across calls.
 */
@@ -0,0 +1,246 @@
package org.opensearch.index.mapper;
Missing SPDX license header.
This file is missing the standard Apache 2.0 license header that's present in other files in this PR. Add the license header for compliance.
🔎 Add license header
+/*
+ * SPDX-License-Identifier: Apache-2.0
+ *
+ * The OpenSearch Contributors require contributions made to
+ * this file be licensed under the Apache-2.0 license or a
+ * compatible open source license.
+ */
+
package org.opensearch.index.mapper;
if (VALUES_FIELD.equals(currentFieldName)) {
    while ((token = parser.nextToken()) != XContentParser.Token.END_ARRAY) {
        if (token == XContentParser.Token.VALUE_NUMBER) {
            values.add(parser.doubleValue());
        } else {
            throw new IllegalArgumentException("values must be numbers");
        }
    }
No validation for NaN or Infinity in values array.
The parser accepts Double.NaN and Double.POSITIVE_INFINITY/NEGATIVE_INFINITY as valid double values. While the test file expects these to fail with "values must be numbers", the current implementation doesn't explicitly reject them. Add explicit validation.
🔎 Proposed fix
if (VALUES_FIELD.equals(currentFieldName)) {
while ((token = parser.nextToken()) != XContentParser.Token.END_ARRAY) {
if (token == XContentParser.Token.VALUE_NUMBER) {
- values.add(parser.doubleValue());
+ double val = parser.doubleValue();
+ if (Double.isNaN(val) || Double.isInfinite(val)) {
+ throw new IllegalArgumentException("values must be finite numbers, got: " + val);
+ }
+ values.add(val);
} else {
throw new IllegalArgumentException("values must be numbers");
}
}
}
public interface HistogramValues {
    boolean advanceExact(int doc) throws IOException;
    Histogram histogram() throws IOException;
}
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find both HistogramValues files
echo "=== Searching for HistogramValues files ==="
fd "HistogramValues.java" -type f
echo ""
echo "=== Checking org.opensearch.indices.fielddata.HistogramValues ==="
find . -path "*/opensearch/indices/fielddata/HistogramValues.java" -type f -exec cat -n {} \;
echo ""
echo "=== Checking org.opensearch.index.fielddata.HistogramValues ==="
find . -path "*/opensearch/index/fielddata/HistogramValues.java" -type f -exec cat -n {} \;
Length of output: 3126
🏁 Script executed:
#!/bin/bash
# Search for imports and usage of both types
echo "=== Imports of org.opensearch.indices.fielddata.HistogramValues ==="
rg "import org\.opensearch\.indices\.fielddata\.HistogramValues" --type java -A 2
echo ""
echo "=== Imports of org.opensearch.index.fielddata.HistogramValues ==="
rg "import org\.opensearch\.index\.fielddata\.HistogramValues" --type java -A 2
echo ""
echo "=== References to indices.fielddata.HistogramValues (without import) ==="
rg "indices\.fielddata\.HistogramValues" --type java -B 2 -A 2
echo ""
echo "=== References to index.fielddata.HistogramValues (without import) ==="
rg "index\.fielddata\.HistogramValues" --type java -B 2 -A 2
Length of output: 24162
Remove unused interface org.opensearch.indices.fielddata.HistogramValues.
A duplicate interface exists in org.opensearch.indices.fielddata with identical method signatures to the established abstract class in org.opensearch.index.fielddata.HistogramValues. The new interface is unused throughout the codebase while the abstract class is actively imported and used in 9+ aggregator implementations. This creates unnecessary confusion with identically named types in different packages and should be removed unless intended for plugin/SPI use.
  builder.register(
      HistogramAggregationBuilder.REGISTRY_KEY,
-     List.of(CoreValuesSourceType.NUMERIC, CoreValuesSourceType.DATE, CoreValuesSourceType.BOOLEAN),
+     List.of(CoreValuesSourceType.NUMERIC, CoreValuesSourceType.DATE, CoreValuesSourceType.BOOLEAN, CoreValuesSourceType.HISTOGRAM),
🧩 Analysis chain
🏁 Script executed:
find . -name "NumericHistogramAggregator.java" -type fRepository: opensearch-project/OpenSearch
Length of output: 179
🏁 Script executed:
find . -name "HistogramValuesSource.java" -type fRepository: opensearch-project/OpenSearch
Length of output: 255
🏁 Script executed:
find . -type f -name "*HistogramAggregator*.java" | head -20Repository: opensearch-project/OpenSearch
Length of output: 2298
🏁 Script executed:
wc -l ./server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/NumericHistogramAggregator.javaRepository: opensearch-project/OpenSearch
Length of output: 183
🏁 Script executed:
cat -n ./server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/NumericHistogramAggregator.javaRepository: opensearch-project/OpenSearch
Length of output: 7079
🏁 Script executed:
wc -l ./server/src/main/java/org/opensearch/index/fielddata/HistogramValuesSource.javaRepository: opensearch-project/OpenSearch
Length of output: 157
🏁 Script executed:
cat -n ./server/src/main/java/org/opensearch/index/fielddata/HistogramValuesSource.javaRepository: opensearch-project/OpenSearch
Length of output: 5706
🏁 Script executed:
rg -n "getCounts" --type=java server/src/main/java/org/opensearch/search/aggregations/Repository: opensearch-project/OpenSearch
Length of output: 1490
🏁 Script executed:
wc -l ./server/src/test/java/org/opensearch/search/aggregations/bucket/histogram/NumericHistogramAggregatorTests.javaRepository: opensearch-project/OpenSearch
Length of output: 188
🏁 Script executed:
sed -n '1,100p' ./server/src/test/java/org/opensearch/search/aggregations/bucket/histogram/NumericHistogramAggregatorTests.javaRepository: opensearch-project/OpenSearch
Length of output: 4763
🏁 Script executed:
sed -n '360,380p' server/src/main/java/org/opensearch/search/aggregations/bucket/range/RangeAggregator.javaRepository: opensearch-project/OpenSearch
Length of output: 1188
🏁 Script executed:
cat -n server/src/main/java/org/opensearch/search/aggregations/bucket/range/RangeAggregator.java | sed -n '350,390p'Repository: opensearch-project/OpenSearch
Length of output: 2385
🏁 Script executed:
grep -A 30 "histogram field" ./server/src/test/java/org/opensearch/search/aggregations/bucket/histogram/NumericHistogramAggregatorTests.javaRepository: opensearch-project/OpenSearch
Length of output: 55
NumericHistogramAggregator must handle histogram value counts to produce correct aggregation results.
The implementation correctly applies DocValueFormat.RAW and calculates bucket keys with interval/offset (line 127: Math.floor((value - offset) / interval)). However, it critically overlooks histogram-specific handling: it only extracts histogram values via doubleValues() but ignores the associated counts.
HistogramValuesSource provides a getCounts() method (similar to metrics aggregators like StatsAggregator), but NumericHistogramAggregator doesn't use it. Comparison with RangeAggregator (lines 359-390) shows the correct pattern: check instanceof HistogramValuesSource, extract both histogram.getValues() and histogram.getCounts(), and iterate through counts when aggregating. Without this, each histogram value is implicitly treated as having count=1, producing incorrect bucket aggregations.
🤖 Prompt for AI Agents
In
@server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/HistogramAggregatorFactory.java
around line 71, NumericHistogramAggregator currently reads histogram values via
doubleValues() but ignores per-value counts, causing each histogram entry to be
treated as count=1; update the aggregation logic in NumericHistogramAggregator
(the code that handles values iteration and key computation using
Math.floor((value - offset) / interval) and DocValueFormat.RAW) to detect when
the values source is a HistogramValuesSource, call histogram.getCounts()
alongside histogram.getValues(), and multiply/add bucket doc counts by the
corresponding count for each value (iterate counts in sync with values),
mirroring the pattern used in RangeAggregator where instanceof
HistogramValuesSource triggers extraction of both getValues() and getCounts()
and iterates counts when updating buckets.
  builder.register(
      MinAggregationBuilder.REGISTRY_KEY,
-     List.of(CoreValuesSourceType.NUMERIC, CoreValuesSourceType.DATE, CoreValuesSourceType.BOOLEAN),
+     List.of(CoreValuesSourceType.NUMERIC, CoreValuesSourceType.DATE, CoreValuesSourceType.BOOLEAN, CoreValuesSourceType.HISTOGRAM),
🧩 Analysis chain
🏁 Script executed:
find . -type f -name "MinAggregatorFactory.java" | head -5Repository: opensearch-project/OpenSearch
Length of output: 164
🏁 Script executed:
cat -n ./server/src/main/java/org/opensearch/search/aggregations/metrics/MinAggregatorFactory.javaRepository: opensearch-project/OpenSearch
Length of output: 4375
🏁 Script executed:
find . -type f -name "MinAggregator.java" | head -5Repository: opensearch-project/OpenSearch
Length of output: 157
🏁 Script executed:
cat -n ./server/src/main/java/org/opensearch/search/aggregations/metrics/MinAggregator.javaRepository: opensearch-project/OpenSearch
Length of output: 18224
🏁 Script executed:
find . -type f -name "Histogram.java" -path "*/indices/fielddata/*" | head -3Repository: opensearch-project/OpenSearch
Length of output: 143
🏁 Script executed:
cat -n ./server/src/main/java/org/opensearch/indices/fielddata/Histogram.javaRepository: opensearch-project/OpenSearch
Length of output: 944
🏁 Script executed:
find . -type f -name "SumAggregator.java" -path "*/aggregations/metrics/*" | head -3Repository: opensearch-project/OpenSearch
Length of output: 157
🏁 Script executed:
cat -n ./server/src/main/java/org/opensearch/search/aggregations/metrics/SumAggregator.java | head -250Repository: opensearch-project/OpenSearch
Length of output: 12776
🏁 Script executed:
cat -n ./server/src/main/java/org/opensearch/search/aggregations/metrics/SumAggregator.java | sed -n '183,216p'Repository: opensearch-project/OpenSearch
Length of output: 2127
🏁 Script executed:
find . -type f -name "*MinAggregator*Test*.java" -o -name "*Test*MinAggregator*.java" | head -5Repository: opensearch-project/OpenSearch
Length of output: 162
🏁 Script executed:
rg -n "histogram|Histogram" ./server/src/test/java/org/opensearch/search/aggregations/metrics/MinAggregatorTests.java -iRepository: opensearch-project/OpenSearch
Length of output: 474
🏁 Script executed:
rg -n "CoreValuesSourceType.HISTOGRAM|HistogramValuesSource" ./server/src/test/java/org/opensearch/search/aggregations/metrics/MinAggregatorTests.java -iRepository: opensearch-project/OpenSearch
Length of output: 55
🏁 Script executed:
find . -type f -name "*AggregatorTests.java" -path "*/metrics/*" -exec grep -l "HistogramValuesSource\|CoreValuesSourceType.HISTOGRAM" {} \;Repository: opensearch-project/OpenSearch
Length of output: 55
🏁 Script executed:
git log --oneline -20 ./server/src/main/java/org/opensearch/search/aggregations/metrics/MinAggregator.javaRepository: opensearch-project/OpenSearch
Length of output: 110
🏁 Script executed:
git show 2079fb11 --stat | head -30Repository: opensearch-project/OpenSearch
Length of output: 1667
🏁 Script executed:
git show 2079fb11 -- ./server/src/main/java/org/opensearch/search/aggregations/metrics/MinAggregator.java | head -200Repository: opensearch-project/OpenSearch
Length of output: 8847
Histogram weight/count handling missing in MinAggregator histogram collection.
The histogram support implementation in MinAggregator is incomplete. While RAW format is correctly applied and histogram values are accessed, the implementation ignores histogram counts entirely.
The collector (lines 180-189) only uses values[0] without checking histogram counts:
double[] values = histogram.getValues();
if (values.length > 0) {
    double value = values[0];
    // ...
}
This differs from SumAggregator's histogram handling (lines 206-207), which correctly applies counts:
for (int i = 0; i < histValues.length; i++) {
    kahanSummation.add(histValues[i] * counts[i]);
}
For min aggregation to be semantically correct with histogram data, values with zero counts should not contribute to the minimum calculation. Add logic to iterate through all histogram values and their corresponding counts to find the minimum value among entries with non-zero counts, similar to how SumAggregator processes histogram weight information.
🤖 Prompt for AI Agents
In
@server/src/main/java/org/opensearch/search/aggregations/metrics/MinAggregatorFactory.java
around line 60, MinAggregatorFactory's histogram collection uses only
histogram.getValues()[0] and ignores counts, so update the histogram handling in
the collector to iterate over histogram.getValues() and histogram.getCounts(),
skipping entries where counts[i] == 0 and computing the minimum across only
entries with non-zero counts; use histogram.getValues() and
histogram.getCounts(), compare each value to the current min and set when lower
(making sure empty or zero-count-only histograms are handled appropriately, e.g., left as
NaN/unset as the current code expects).
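A possible shape for that counts-aware branch is sketched below, assuming the same HistogramValues access used in the SumAggregator snippet later in this review and that mins is the aggregator's running-minimum array; the names are illustrative, not the PR's exact code.
// Sketch only: take the minimum over histogram entries whose count is non-zero.
if (histogramValues.advanceExact(doc)) {
    Histogram histogram = histogramValues.histogram();
    double[] histValues = histogram.getValues();
    long[] counts = histogram.getCounts();
    double min = mins.get(bucket);
    for (int i = 0; i < histValues.length; i++) {
        if (counts[i] > 0) {
            min = Math.min(min, histValues[i]);
        }
    }
    // If every count is zero, min is left unchanged and the bucket stays unset.
    mins.set(bucket, min);
}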
| for (long j = 0; j < count; j++) { | ||
| kahanSummation.add(value); | ||
| } |
Performance: Avoid repeated additions for histogram counts.
The inner loop for (long j = 0; j < count; j++) adds the same value repeatedly, which is O(count) per bucket. For histograms with large counts (e.g., millions), this will be extremely slow.
Use weighted addition value * count instead.
🔎 Proposed fix
  // Process both values and counts
  for (int i = 0; i < valuesCount; i++) {
      double value = values.nextValue();
      long count = countArray[i];
      totalDocCount += count;
-     for (long j = 0; j < count; j++) {
-         kahanSummation.add(value);
-     }
+     kahanSummation.add(value * count);
      min = Math.min(min, value);
      max = Math.max(max, value);
  }
Note: This changes the numerical precision characteristics slightly since Kahan summation compensates per-addition. If exact parity with repeated additions is required, consider implementing a weighted Kahan add method.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| for (long j = 0; j < count; j++) { | |
| kahanSummation.add(value); | |
| } | |
| kahanSummation.add(value * count); |
🤖 Prompt for AI Agents
In
@server/src/main/java/org/opensearch/search/aggregations/metrics/StatsAggregator.java
around lines 150-152, the histogram handling in StatsAggregator currently does
O(count) work by repeating kahanSummation.add(value) inside the loop; change
this to a single weighted addition (e.g., kahanSummation.add(value * count)) to
avoid the per-count loop and drastically improve performance for large counts,
or alternatively implement and call a weighted Kahan add method if you want to
preserve per-addition Kahan compensation semantics; update the code that uses
the long count and value inside the for-loop to perform one multiply-and-add via
kahanSummation instead of iterating.
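If per-addition compensation is preferred over the plain value * count shortcut, a small weighted variant of Kahan summation is one option; the sketch below is standalone and only illustrates the arithmetic (it is not the CompensatedSum class from the codebase).
// Standalone sketch: compensated (Kahan-style) summation of weighted terms.
final class WeightedCompensatedSum {
    private double value;  // running sum
    private double delta;  // running compensation (low-order bits lost so far)

    void add(double term, long count) {
        double weighted = term * count;          // one multiply instead of `count` additions
        double corrected = weighted - delta;
        double newValue = value + corrected;
        delta = (newValue - value) - corrected;  // capture the rounding error of this addition
        value = newValue;
    }

    double value() {
        return value;
    }
}
Each histogram entry then costs a single add(value, count) call; the result still differs slightly from adding the value count separate times, because the weighted product is compensated as one term.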
// Check if we're dealing with histogram values
if (valuesSource instanceof HistogramValuesSource) {
    HistogramIndexFieldData indexFieldData = ((HistogramValuesSource) valuesSource).getHistogramFieldData();
    HistogramLeafFieldData leafFieldData = indexFieldData.load(ctx);
    HistogramValues histogramValues = leafFieldData.getHistogramValues();

    return new LeafBucketCollectorBase(sub, values) {
        @Override
        public void collect(int doc, long bucket) throws IOException {
            sums = bigArrays.grow(sums, bucket + 1);
            compensations = bigArrays.grow(compensations, bucket + 1);

            if (histogramValues.advanceExact(doc)) {
                Histogram histogram = histogramValues.histogram();
                double[] histValues = histogram.getValues();
                long[] counts = histogram.getCounts();

                // Initialize Kahan summation with current values
                double sum = sums.get(bucket);
                double compensation = compensations.get(bucket);
                kahanSummation.reset(sum, compensation);

                // Calculate sum directly from histogram values and counts
                for (int i = 0; i < histValues.length; i++) {
                    kahanSummation.add(histValues[i] * counts[i]);
                }

                // We don't need to process the values from doubleValues() as they're already counted
                compensations.set(bucket, kahanSummation.delta());
                sums.set(bucket, kahanSummation.value());
            }
        }
    };
} else {
    // Original logic for non-histogram fields
    return new LeafBucketCollectorBase(sub, values) {
        @Override
        public void collect(int doc, long bucket) throws IOException {
            sums = bigArrays.grow(sums, bucket + 1);
            compensations = bigArrays.grow(compensations, bucket + 1);

            if (values.advanceExact(doc)) {
                final int valuesCount = values.docValueCount();
                double sum = sums.get(bucket);
                double compensation = compensations.get(bucket);
                kahanSummation.reset(sum, compensation);

                for (int i = 0; i < valuesCount; i++) {
                    double value = values.nextValue();
                    kahanSummation.add(value);
                }

                compensations.set(bucket, kahanSummation.delta());
                sums.set(bucket, kahanSummation.value());
            }
        }
    };
}
Critical: Unreachable code after return statement.
The code at lines 183-240 is unreachable because the method returns at line 181. The histogram-specific collection logic and the else branch for non-histogram fields will never execute.
The return statement at line 181 completes the LeafBucketCollectorBase block and returns it, making all subsequent code dead.
🔎 Proposed fix - restructure to proper if-else branching
  @Override
  public LeafBucketCollector getLeafCollector(LeafReaderContext ctx, final LeafBucketCollector sub) throws IOException {
      if (valuesSource == null) {
          return LeafBucketCollector.NO_OP_COLLECTOR;
      }
      final BigArrays bigArrays = context.bigArrays();
-     final SortedNumericDoubleValues values = valuesSource.doubleValues(ctx);
      final CompensatedSum kahanSummation = new CompensatedSum(0, 0);
-     return new LeafBucketCollectorBase(sub, values) {
-         @Override
-         public void collect(int doc, long bucket) throws IOException {
-             // ... existing logic ...
-         }
-         // ... other methods ...
-     };
-
-     // Check if we're dealing with histogram values
+
      if (valuesSource instanceof HistogramValuesSource) {
-         // histogram logic (currently unreachable)
+         HistogramIndexFieldData indexFieldData = ((HistogramValuesSource) valuesSource).getHistogramFieldData();
+         HistogramLeafFieldData leafFieldData = indexFieldData.load(ctx);
+         HistogramValues histogramValues = leafFieldData.getHistogramValues();
+
+         return new LeafBucketCollectorBase(sub, null) {
+             // ... histogram collection logic ...
+         };
      } else {
-         // non-histogram logic (currently unreachable)
+         final SortedNumericDoubleValues values = valuesSource.doubleValues(ctx);
+         return new LeafBucketCollectorBase(sub, values) {
+             // ... existing collection logic with stream/range support ...
+         };
      }
  }
Note: The histogram path currently lacks collect(DocIdStream, long) and collectRange(int, int) implementations that exist in the original collector.
Committable suggestion skipped: line range outside the PR's diff.
@Override
public ValuesSource getField(FieldContext fieldContext, AggregationScript.LeafFactory script) {
    return null;
}
Critical: getField() returns null - histogram field resolution will fail.
This stub implementation returns null for getField(), which means histogram fields cannot be properly resolved through the standard ValuesSourceConfig infrastructure. Aggregations configured with a histogram field will likely fail with a NullPointerException or silently produce no results.
This method needs to return a proper ValuesSource (i.e., HistogramValuesSource) for the given field context.
🔎 Proposed implementation sketch
  @Override
  public ValuesSource getField(FieldContext fieldContext, AggregationScript.LeafFactory script) {
-     return null;
+     if (fieldContext.indexFieldData() instanceof HistogramIndexFieldData) {
+         return new HistogramValuesSource((HistogramIndexFieldData) fieldContext.indexFieldData());
+     }
+     throw new IllegalArgumentException("Expected histogram field data but got: " + fieldContext.indexFieldData().getClass());
  }
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
- @Override
- public ValuesSource getField(FieldContext fieldContext, AggregationScript.LeafFactory script) {
-     return null;
- }
+ @Override
+ public ValuesSource getField(FieldContext fieldContext, AggregationScript.LeafFactory script) {
+     if (fieldContext.indexFieldData() instanceof HistogramIndexFieldData) {
+         return new HistogramValuesSource((HistogramIndexFieldData) fieldContext.indexFieldData());
+     }
+     throw new IllegalArgumentException("Expected histogram field data but got: " + fieldContext.indexFieldData().getClass());
+ }
🤖 Prompt for AI Agents
In
@server/src/main/java/org/opensearch/search/aggregations/support/HistogramValuesSourceType.java
around lines 33-36, the getField method currently returns null; implement it to
resolve and return a HistogramValuesSource using the provided FieldContext and
AggregationScript.LeafFactory so histogram aggregations can obtain a proper
ValuesSource. Inside getField(FieldContext fieldContext,
AggregationScript.LeafFactory script) construct and return a
HistogramValuesSource (or use the existing factory/utility that wraps a
FieldContext and script into a ValuesSource) ensuring you handle missing/mapped
fields similarly to other ValuesSourceType implementations and propagate any
exceptions as before.
// Test infinite values
e = expectThrows(MapperParsingException.class, () -> {
    mapper.parse(source(b -> b.startObject("field")
        .startArray("values")
        .value(Double.POSITIVE_INFINITY)
        .value(0.2)
        .value(0.3)
        .endArray()
        .array("counts", 1, 2, 3)
        .endObject()));
});
assertThat(e.getMessage(), containsString("failed to parse field [field] of type [histogram]"));
assertThat(e.getCause().getMessage(), containsString("values must be numbers"));

// Test NaN values
e = expectThrows(MapperParsingException.class, () -> {
    mapper.parse(source(b -> b.startObject("field")
        .startArray("values")
        .value(Double.NaN)
        .value(0.2)
        .value(0.3)
        .endArray()
        .array("counts", 1, 2, 3)
        .endObject()));
});
assertThat(e.getMessage(), containsString("failed to parse field [field] of type [histogram]"));
assertThat(e.getCause().getMessage(), containsString("values must be numbers"));
🧩 Analysis chain
🏁 Script executed:
find . -name "HistogramFieldMapperTests.java" -type fRepository: opensearch-project/OpenSearch
Length of output: 154
🏁 Script executed:
sed -n '319,345p' ./server/src/test/java/org/opensearch/index/mapper/HistogramFieldMapperTests.javaRepository: opensearch-project/OpenSearch
Length of output: 1266
🏁 Script executed:
find . -name "HistogramFieldMapper.java" -type f | grep -v TestRepository: opensearch-project/OpenSearch
Length of output: 149
🏁 Script executed:
rg -n "parseCreateField" --type java | grep -i histogramRepository: opensearch-project/OpenSearch
Length of output: 229
🏁 Script executed:
rg -n "values must be numbers" --type javaRepository: opensearch-project/OpenSearch
Length of output: 760
🏁 Script executed:
rg -l "HistogramFieldMapper" --type java | grep -v TestRepository: opensearch-project/OpenSearch
Length of output: 298
🏁 Script executed:
rg -l "histogram" --type java -i | head -20Repository: opensearch-project/OpenSearch
Length of output: 1909
🏁 Script executed:
ls -la server/src/main/java/org/opensearch/index/mapper/ | grep -i histogramRepository: opensearch-project/OpenSearch
Length of output: 149
🏁 Script executed:
ls server/src/main/java/org/opensearch/index/mapper/ | head -30Repository: opensearch-project/OpenSearch
Length of output: 908
🏁 Script executed:
rg "class.*Histogram" --type java | head -20Repository: opensearch-project/OpenSearch
Length of output: 3879
🏁 Script executed:
head -50 ./server/src/test/java/org/opensearch/index/mapper/HistogramFieldMapperTests.javaRepository: opensearch-project/OpenSearch
Length of output: 1948
🏁 Script executed:
sed -n '99,160p' ./server/src/main/java/org/opensearch/index/mapper/HistogramFieldMapper.javaRepository: opensearch-project/OpenSearch
Length of output: 2845
🏁 Script executed:
wc -l ./server/src/main/java/org/opensearch/index/mapper/HistogramFieldMapper.javaRepository: opensearch-project/OpenSearch
Length of output: 153
🏁 Script executed:
sed -n '99,200p' ./server/src/main/java/org/opensearch/index/mapper/HistogramFieldMapper.javaRepository: opensearch-project/OpenSearch
Length of output: 4016
🏁 Script executed:
rg -n "isFinite|isInfinite|isNaN" ./server/src/main/java/org/opensearch/index/mapper/HistogramFieldMapper.javaRepository: opensearch-project/OpenSearch
Length of output: 55
🏁 Script executed:
cat ./server/src/main/java/org/opensearch/index/mapper/HistogramFieldMapper.javaRepository: opensearch-project/OpenSearch
Length of output: 9573
Tests for Infinity and NaN will fail: the mapper does not validate these special double values.
The test assertions expect Infinity and NaN to be rejected with message "values must be numbers", but the parseCreateField() method accepts them silently. The error "values must be numbers" is only thrown when a token is not a VALUE_NUMBER, and XContentParser correctly identifies Infinity and NaN as valid numeric values. These will be parsed via parser.doubleValue() and added to the values list. The validateHistogramData() method has no checks for Double.isFinite() or Double.isNaN(). Add validation to reject these special values.
🤖 Prompt for AI Agents
In
@server/src/test/java/org/opensearch/index/mapper/HistogramFieldMapperTests.java
around lines 319-345, the parser currently accepts Double.POSITIVE_INFINITY and
Double.NaN because XContentParser treats them as numeric tokens and
parseCreateField() calls parser.doubleValue() without rejecting special values;
update validation so validateHistogramData() (or immediately after values are
parsed in parseCreateField()) checks each value with Double.isFinite(value) (or
equivalently !Double.isInfinite && !Double.isNaN) and throws a
MapperParsingException with a message containing "values must be numbers" when
any value is not finite; ensure the thrown exception becomes the cause so
existing tests asserting containsString("values must be numbers") continue to
pass.
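As a rough sketch of the mapper-side check these tests assume (placement and variable names are illustrative), the validation could sit where each value is read; the existing parse() wrapping, which the tests already rely on via getCause(), would then surface it inside the MapperParsingException.
// Sketch: reject NaN and +/-Infinity while reading the "values" array.
double val = parser.doubleValue();
if (Double.isFinite(val) == false) {
    // Keeps the substring the tests assert on: "values must be numbers"
    throw new IllegalArgumentException("values must be numbers, got: " + val);
}
values.add(val);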
❌ Gradle check result for 2079fb1: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Draft PR to add support for the histogram field type and aggregations.
Description
Adds a histogram field type that stores pre-aggregated values with per-value counts, along with field data and aggregation support so metric and bucket aggregations can operate on histogram data.
Related Issues
Addresses: #15284
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.
Summary by CodeRabbit
Release Notes