Implement reverse performance optimization
#4775
base: main
4483045 to
0246535
Compare
QQ: I recall the major comment on the original PR was about early optimization in the analyzer layer. Is this new PR trying to address that concern? Ref: #4056 (comment)
Hi Chen, I think that's a valid concern. However, after trying it out, I think it adds significant complexity compared to the current approach. I think |
noCharger
left a comment
Can you add benchmark results for before vs. after?
LGTM. Please get other signoffs.
Hi @ahkcs, #4784 allows users to specify a timestamp field in Although I suspect there isn't much impact, because all timechart commands have a sort at the end of their plans, which makes them fall into your first tier. Can you please double check?
631a24d to
8977a8f
Compare
📝 Walkthrough
Summary by CodeRabbit
Walkthrough
This change replaces a ROW_NUMBER-based reverse implementation with a collation-centric strategy in Calcite planning: it backtracks the RelNode tree to locate Sort collations, reverses them (or inserts reversed Sorts), falls back to sorting by
Changes
Sequence Diagram(s)
sequenceDiagram
actor User
participant Parser
participant CalciteVisitor
participant RelNodeTree
participant PlanUtils
participant OpenSearch
User->>Parser: submit query (includes Reverse)
Parser->>CalciteVisitor: produce RelNode with Reverse node
activate CalciteVisitor
CalciteVisitor->>RelNodeTree: backtrackForCollation(startingNode)
alt Sort with non-empty collation found
RelNodeTree-->>CalciteVisitor: returns Sort + RelCollation
CalciteVisitor->>PlanUtils: reverseCollation(collation)
PlanUtils-->>CalciteVisitor: reversed collation
CalciteVisitor->>RelNodeTree: insertReversedSortInTree(at located Sort)
RelNodeTree-->>CalciteVisitor: rewritten RelNode plan
else No sort found but `@timestamp` exists
RelNodeTree-->>CalciteVisitor: indicates `@timestamp` present
CalciteVisitor->>RelNodeTree: insert Sort(`@timestamp` DESC)
else Blocked or no sortable path
RelNodeTree-->>CalciteVisitor: blocked/no-op
end
deactivate CalciteVisitor
CalciteVisitor->>OpenSearch: pushdown request (with reversed sort if applied)
OpenSearch-->>CalciteVisitor: results
CalciteVisitor-->>User: deliver final results
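The three branches of the diagram above can be read as a small decision function. The following is only a sketch under stated assumptions: `ReverseTierSketch`, `Tier`, and `chooseTier` are hypothetical names invented for illustration, the collation is modeled as a plain string, and none of this is the PR's actual code.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical, simplified model of the reverse decision; not the PR's implementation. */
final class ReverseTierSketch {

  /** The three outcomes shown in the sequence diagram. */
  enum Tier { REVERSE_EXISTING_SORT, SORT_BY_TIMESTAMP_DESC, NO_OP }

  /**
   * @param existingCollation collation found by backtracking, if any (assumed input)
   * @param outputFields field names visible where reverse is applied (assumed input)
   */
  static Tier chooseTier(Optional<String> existingCollation, List<String> outputFields) {
    // Tier 1: a Sort with a non-empty collation was found, so flip it.
    if (existingCollation.isPresent() && !existingCollation.get().isEmpty()) {
      return Tier.REVERSE_EXISTING_SORT;
    }
    // Tier 2: no sort, but @timestamp is available, so sort by it descending.
    if (outputFields.contains("@timestamp")) {
      return Tier.SORT_BY_TIMESTAMP_DESC;
    }
    // Tier 3: nothing to reverse against, so reverse is ignored.
    return Tier.NO_OP;
  }

  public static void main(String[] args) {
    System.out.println(chooseTier(Optional.of("age DESC"), List.of("age")));
    System.out.println(chooseTier(Optional.empty(), List.of("@timestamp", "value")));
    System.out.println(chooseTier(Optional.empty(), List.of("category", "c")));
  }
}
```

Note that an existing sort takes precedence over the `@timestamp` fallback in this sketch, which matches the ordering of the `alt` branches in the diagram.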
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 4
🧹 Nitpick comments (4)
docs/user/ppl/cmd/reverse.rst (1)
64-81: Example 2 uses hardcoded future dates - consider updating for realism.
The example shows timestamps from July 2025 (2025-07-28), which are in the future relative to the current date (November 2025 based on context). While this doesn't affect functionality, using realistic past timestamps or noting these are sample values would improve documentation quality.
This is a minor documentation nitpick.
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (1)
468-473: Consider future-proofing `testExplainReverseWithTimestamp` for configurable time fields
`testExplainReverseWithTimestamp` currently assumes `@timestamp` as the time field. With the separate work allowing configurable time fields (e.g., in `timechart`), you may eventually want a companion explain test that asserts reverse uses the resolved time field rather than hard-coding `@timestamp`, to prevent regressions when that logic evolves.
ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLReverseTest.java (1)
345-360: Result-order assertion after aggregation may be fragile
In `testReverseAfterSortAndAggregationIsNoOp`, the expected result string relies on a specific row order from an aggregation without an explicit ORDER BY. If the underlying engine ever changes its grouping or output-order behavior, this test could fail despite reverse still being a no-op.
You might consider either:
- dropping the result-order assertion and only checking the logical plan, or
- adding an explicit ORDER BY in the PPL and adjusting expectations accordingly.
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java (1)
180-191: Consider using unordered verification for no-op test.
When `reverse` is a no-op, the "natural order" is technically undefined and could vary based on shard allocation or segment merges. While freshly-loaded test indices are typically stable, using `verifyDataRows` (unordered) or explicitly sorting would make this test more robust against flakiness.
- // Without sort or @timestamp, reverse is ignored, so data comes in natural order
- // The first 3 documents in natural order (ascending by account_number)
- verifyDataRowsInOrder(result, rows(1), rows(6), rows(13));
+ // Without sort or @timestamp, reverse is ignored, so data comes in natural (undefined) order
+ // Just verify we get 3 rows with valid account numbers
+ verifyDataRows(result, rows(1), rows(6), rows(13));
Alternatively, keep the ordered assertion if you're confident the test data loading order is deterministic.
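To make the distinction in this nitpick concrete, here is a minimal sketch of ordered versus unordered row comparison. The helpers `matchesInOrder` and `matchesAnyOrder` are hypothetical stand-ins written for illustration; they are not the project's `verifyDataRowsInOrder` or `verifyDataRows`, whose implementations are not shown in this review.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helpers contrasting ordered and unordered row checks. */
final class RowVerificationSketch {

  /** Passes only when both lists hold the same values in the same positions. */
  static boolean matchesInOrder(List<Integer> actual, List<Integer> expected) {
    return actual.equals(expected);
  }

  /** Passes when both lists hold the same values, regardless of position. */
  static boolean matchesAnyOrder(List<Integer> actual, List<Integer> expected) {
    List<Integer> sortedActual = new ArrayList<>(actual);
    List<Integer> sortedExpected = new ArrayList<>(expected);
    Collections.sort(sortedActual);
    Collections.sort(sortedExpected);
    return sortedActual.equals(sortedExpected);
  }

  public static void main(String[] args) {
    // Same rows, returned in a different (undefined) natural order.
    List<Integer> actual = List.of(6, 1, 13);
    List<Integer> expected = List.of(1, 6, 13);
    System.out.println(matchesInOrder(actual, expected));
    System.out.println(matchesAnyOrder(actual, expected));
  }
}
```

An ordered check on rows whose order is undefined can fail even though the result set is correct, which is the flakiness risk the comment describes.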
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (23)
- core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (2 hunks)
- core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java (2 hunks)
- docs/category.json (1 hunks)
- docs/user/ppl/cmd/reverse.rst (5 hunks)
- integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java (1 hunks)
- integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (1 hunks)
- integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java (6 hunks)
- integ-test/src/test/resources/expectedOutput/calcite/explain_double_reverse_ignored.yaml (1 hunks)
- integ-test/src/test/resources/expectedOutput/calcite/explain_double_reverse_pushdown_multiple.yaml (1 hunks)
- integ-test/src/test/resources/expectedOutput/calcite/explain_double_reverse_pushdown_single.yaml (1 hunks)
- integ-test/src/test/resources/expectedOutput/calcite/explain_reverse_ignored.yaml (1 hunks)
- integ-test/src/test/resources/expectedOutput/calcite/explain_reverse_pushdown_multiple.yaml (1 hunks)
- integ-test/src/test/resources/expectedOutput/calcite/explain_reverse_pushdown_single.yaml (1 hunks)
- integ-test/src/test/resources/expectedOutput/calcite/explain_reverse_with_timestamp.yaml (1 hunks)
- integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_double_reverse_ignored.yaml (1 hunks)
- integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_double_reverse_pushdown_multiple.yaml (1 hunks)
- integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_double_reverse_pushdown_single.yaml (1 hunks)
- integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_reverse_ignored.yaml (1 hunks)
- integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_reverse_pushdown_multiple.yaml (1 hunks)
- integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_reverse_pushdown_single.yaml (1 hunks)
- integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_reverse_with_timestamp.yaml (1 hunks)
- ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLReverseTest.java (5 hunks)
- ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLStreamstatsTest.java (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
- GitHub Check: bwc-tests-full-restart (21)
- GitHub Check: bwc-tests-full-restart (25)
- GitHub Check: build-linux (25, doc)
- GitHub Check: build-linux (21, integration)
- GitHub Check: build-linux (25, integration)
- GitHub Check: build-linux (25, unit)
- GitHub Check: build-linux (21, unit)
- GitHub Check: bwc-tests-rolling-upgrade (21)
- GitHub Check: bwc-tests-rolling-upgrade (25)
- GitHub Check: build-linux (21, doc)
- GitHub Check: security-it-linux (25)
- GitHub Check: security-it-linux (21)
- GitHub Check: build-windows-macos (macos-14, 21, unit)
- GitHub Check: build-windows-macos (macos-14, 21, integration)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (macos-14, 25, unit)
- GitHub Check: build-windows-macos (macos-14, 21, doc)
- GitHub Check: build-windows-macos (macos-14, 25, integration)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (macos-14, 25, doc)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
- GitHub Check: security-it-windows-macos (macos-14, 21)
- GitHub Check: security-it-windows-macos (macos-14, 25)
- GitHub Check: security-it-windows-macos (windows-latest, 21)
- GitHub Check: security-it-windows-macos (windows-latest, 25)
- GitHub Check: CodeQL-Scan (java)
🔇 Additional comments (36)
integ-test/src/test/resources/expectedOutput/calcite/explain_reverse_with_timestamp.yaml (1)
1-13: Let me analyze the original review comment based on the provided information without repository access.
Looking at the original review comment, I can make the following observations:
- Content Analysis: The YAML snippet shows valid YAML structure with proper indentation and quoting
- Logical Plan: The logical plan shows proper DESC sort on @timestamp field ($0), which aligns with stated PR objectives
- Physical Plan: The physical plan correctly pushes down the sort operation with proper DESC ordering and `"missing": "_first"` configuration
- Review Structure: The original review has contradictory elements - it includes an approval tag while also requesting verification with a verification script
Key Issue: The original review requests verification via shell script execution but includes approval tags. Without access to verify companion test files or run the verification script, I cannot confirm the consistency claims across related test resources.
Test resource shows correct DESC sort on @timestamp with proper push-down to OpenSearch, but verification of consistency with related test files cannot be completed.
The logical and physical plans correctly implement DESC sorting on @timestamp with appropriate missing value handling. However, the requested verification against companion test cases (e.g., `explain_reverse_ignored.yaml`) could not be performed due to inability to access the repository file structure. The YAML syntax appears valid based on static inspection, and the logical progression from LogicalSort through LogicalProject to CalciteLogicalIndexScan is sound.
docs/category.json (1)
45-45: LGTM!
Documentation entry correctly added in alphabetical order within the `ppl_cli_calcite` category.
integ-test/src/test/resources/expectedOutput/calcite/explain_reverse_pushdown_multiple.yaml (1)
1-19: LGTM!
The expected output correctly demonstrates:
- Sort direction reversal (DESC→ASC for age, ASC→DESC for firstname)
- Null direction reversal (nulls-last↔nulls-first)
- Proper pushdown of reversed sort to OpenSearch with correct JSON structure
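The direction and null-direction flips listed above can be sketched in plain Java. This is an illustrative model only: `SortKey`, `Direction`, and `NullDirection` are hypothetical stand-ins defined here, not Calcite's collation classes and not the PR's `reverseCollation` code, and the null directions in the example input are assumed values.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of flipping every key in a sort collation. */
final class CollationReversalSketch {

  enum Direction {
    ASC, DESC;

    Direction reverse() {
      return this == ASC ? DESC : ASC;
    }
  }

  enum NullDirection {
    FIRST, LAST, UNSPECIFIED;

    /** FIRST and LAST swap; UNSPECIFIED is preserved. */
    NullDirection reverse() {
      switch (this) {
        case FIRST:
          return LAST;
        case LAST:
          return FIRST;
        default:
          return UNSPECIFIED;
      }
    }
  }

  record SortKey(String field, Direction direction, NullDirection nulls) {}

  /** Returns a new immutable collation with every key's direction and null direction flipped. */
  static List<SortKey> reverse(List<SortKey> collation) {
    if (collation == null || collation.isEmpty()) {
      return collation; // nothing to reverse
    }
    List<SortKey> reversed = new ArrayList<>();
    for (SortKey key : collation) {
      reversed.add(new SortKey(key.field(), key.direction().reverse(), key.nulls().reverse()));
    }
    return List.copyOf(reversed);
  }

  public static void main(String[] args) {
    List<SortKey> original =
        List.of(
            new SortKey("age", Direction.DESC, NullDirection.LAST),
            new SortKey("firstname", Direction.ASC, NullDirection.FIRST));
    System.out.println(reverse(original));
    // Reversing twice yields the original collation again.
    System.out.println(reverse(reverse(original)).equals(original));
  }
}
```

Because each flip is its own inverse, applying the function twice returns the original collation, which is the property the double-reverse fixtures in this PR rely on.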
core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java (1)
594-623: LGTM!
The `reverseCollation` utility is well-implemented:
- Properly handles null/empty input with early return
- Correctly uses `direction.reverse()` API for sort direction
- Appropriately flips null direction (FIRST↔LAST, preserving UNSPECIFIED)
- Returns immutable collation via `RelCollations.of()`
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (4)
786-788: The build/push pattern is correct for replacing the current node.
The sequence `build()` (pop) followed by `push(rebuiltTree)` correctly replaces the current node on the RelBuilder stack with the rebuilt tree containing the reversed sort.
690-724: Backtracking logic is well-designed with comprehensive blocking operator checks.
The method correctly identifies operators that would invalidate sort ordering:
- Aggregate (destroys row-level ordering)
- BiRel (covers Join, Correlate)
- SetOp (Union, Intersect, Except)
- Uncollect (unnesting)
- LogicalProject with window functions (ordering from window's ORDER BY)
Based on learnings from past review comments, the blocking operators were expanded to cover these edge cases.
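As a rough illustration of the backtracking described in this comment, the sketch below walks a simplified single-input plan chain from the top down and stops at the first blocking operator. Everything here is an assumption made for illustration: `Node`, `Kind`, and `findCollation` are hypothetical, the collation is a plain string, and the real method operates on Calcite RelNodes rather than this toy model.

```java
import java.util.EnumSet;
import java.util.Optional;
import java.util.Set;

/** Hypothetical single-input plan chain used to illustrate collation backtracking. */
final class BacktrackSketch {

  enum Kind { SCAN, FILTER, PROJECT, SORT, AGGREGATE, JOIN, SET_OP, UNCOLLECT, WINDOW_PROJECT }

  /** A plan node; collation is null unless the node is a SORT with sort keys. */
  record Node(Kind kind, String collation, Node input) {}

  /** Operators that, per the review comment, invalidate any ordering beneath them. */
  static final Set<Kind> BLOCKING =
      EnumSet.of(Kind.AGGREGATE, Kind.JOIN, Kind.SET_OP, Kind.UNCOLLECT, Kind.WINDOW_PROJECT);

  /** Walks from the top node toward the scan and returns the first usable collation. */
  static Optional<String> findCollation(Node start) {
    for (Node node = start; node != null; node = node.input()) {
      if (BLOCKING.contains(node.kind())) {
        return Optional.empty(); // ordering below this point cannot be trusted
      }
      if (node.kind() == Kind.SORT && node.collation() != null && !node.collation().isEmpty()) {
        return Optional.of(node.collation());
      }
    }
    return Optional.empty(); // reached the scan without finding a sort
  }

  public static void main(String[] args) {
    Node scan = new Node(Kind.SCAN, null, null);
    Node sort = new Node(Kind.SORT, "age DESC", scan);
    // A filter does not block: the sort underneath is still found.
    System.out.println(findCollation(new Node(Kind.FILTER, null, sort)));
    // An aggregate blocks: the sort underneath is ignored.
    System.out.println(findCollation(new Node(Kind.AGGREGATE, null, sort)));
  }
}
```

The sketch treats a sort placed above a blocking operator as usable, since the walk reaches it before the blocker, which is consistent with the tests that reverse a post-aggregation sort.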
735-761: I'm unable to clone the repository to verify the test cases and implementation details directly. However, based on the code snippet provided in the review comment, I can analyze the logical correctness of the concern.
The code appears to correctly preserve LIMIT semantics as implemented.
Analysis of the `insertReversedSortInTree` method:
Tree Structure: The method creates a new `LogicalSort` on top of the original sort node (line 753-754), rather than replacing it. The original sort with its fetch/offset constraints remains in the tree.
Execution Flow: When a query like `source=t | sort age | head 10 | reverse` is executed:
- The original `LogicalSort` node (with fetch=10) applies the limit first
- The reversed `LogicalSort` (with null fetch/offset) is applied on top, reversing the already-limited result
- The limit is preserved because the original sort retains its fetch constraint
Why null fetch/offset is correct: The reversed sort on top doesn't need a fetch/offset because it operates on an already-limited dataset from the original sort. Setting fetch/offset on the reversed sort would be redundant and incorrect.
However, I cannot conclusively verify this claim without access to test cases or the actual query execution flow due to repository access issues. The logical analysis suggests the implementation is sound, but there could be edge cases or runtime considerations not apparent from the code alone.
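The LIMIT argument above can be checked on a toy data set. The sketch below uses Java streams as a stand-in for the plan operators; `sortHeadThenReverse` and `reversedSortThenHead` are hypothetical names, and the second method shows the rewrite that the stacked-sort design avoids, not anything present in the PR.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** Toy model of sort, head, and reverse using in-memory lists. */
final class LimitThenReverseSketch {

  /** Models `sort age | head N | reverse`: limit first, then reverse the limited rows. */
  static List<Integer> sortHeadThenReverse(List<Integer> ages, int limit) {
    List<Integer> limited = ages.stream().sorted().limit(limit).collect(Collectors.toList());
    return limited.stream().sorted(Comparator.reverseOrder()).collect(Collectors.toList());
  }

  /** An incorrect rewrite: flipping the original sort in place changes which rows survive. */
  static List<Integer> reversedSortThenHead(List<Integer> ages, int limit) {
    return ages.stream()
        .sorted(Comparator.reverseOrder())
        .limit(limit)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Integer> ages = List.of(30, 10, 50, 20, 40);
    System.out.println(sortHeadThenReverse(ages, 2));  // the two smallest ages, largest first
    System.out.println(reversedSortThenHead(ages, 2)); // the two largest ages instead
  }
}
```

Keeping the original sort with its fetch and stacking the reversed sort on top corresponds to the first method, so the set of rows selected by `head` is unchanged.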
769-776: Based on my web search findings, I can now provide a verification of the review comment's concern.
The review concern about edge cases is valid and well-founded.
The web search confirms that Calcite's `MetadataQuery.collations()` can indeed return collations from multiple sources:
- TableScan/RelOptTable declared collations (index or table-level ordering)
- Upstream input collations (from Sort, Project, Calc, Window, Join operators)
- Provider-specific logic (enumerable implementations, planner-specific handlers)
This means your concern is correct: a collation found via `MetadataQuery.collations()` may come from:
- An index's natural ordering (no explicit Sort node)
- A table's declared collation
- Upstream operators that preserve collations
When the code calls `relBuilder.sort(reversedCollation)`, it does add a new sort operation if the collation source isn't an actual Sort node. However, the semantics appear intentionally correct: reversing a known ordering (whether from index, table, or upstream) and applying it via sort is still semantically valid.
The issue is verification coverage, not the logic itself. Without access to the repository to check test files, I cannot definitively verify whether all these edge cases (especially index natural ordering without explicit Sort nodes) are covered by tests.
Consider documenting the edge case: collation from metadata query sources beyond Sort nodes.
The code correctly reverses collations from all metadata sources (index ordering, table collations, upstream operators), not just explicit Sort nodes. However, for clarity with the PR's "flip sort directions" documentation, add a comment explaining that `MetadataQuery.collations()` may return ordering information from index definitions or table metadata, and that the reversed collation is applied uniformly regardless of source. This would also clarify that new sort operations may be added for index-based orderings.
docs/user/ppl/cmd/reverse.rst (1)
14-43: Documentation clearly explains the three-tier behavior and optimization benefits.
The description, behavior section, and note effectively communicate:
- The conditional logic based on sort/timestamp presence
- Performance benefits of avoiding materialization
- Memory optimization rationale
This aligns well with the PR objectives of enabling pushdown and avoiding expensive operations on large datasets.
integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java (1)
91-91: Include reverse tests in no-pushdown suite – looks good
Adding `CalciteReverseCommandIT.class` here keeps reverse coverage consistent for both pushdown and no-pushdown modes. No further changes needed.
integ-test/src/test/resources/expectedOutput/calcite/explain_double_reverse_ignored.yaml (1)
1-7: Double-reverse ignored plan matches no-op semantics
Logical/physical plans show no sort introduced for the double `reverse` case; only limit/project are applied, which matches the described no-op behavior. Fixture looks consistent.
integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_double_reverse_ignored.yaml (1)
1-9: No-pushdown double-reverse ignored fixture is consistent
Plan omits any sort and only reflects system limit and projection, which aligns with the intended “double reverse is a no-op” behavior under no-pushdown.
integ-test/src/test/resources/expectedOutput/calcite/explain_reverse_pushdown_single.yaml (1)
1-10: Single-field reverse pushdown plan matches optimization intent
Logical plan shows original + reversed sorts; physical plan collapses to a single pushed-down `age ASC` sort with limit, consistent with the described reverse optimization.
integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_reverse_with_timestamp.yaml (1)
1-12: Reverse with @timestamp (no-pushdown) plan matches documented behavior
Plan adds a DESC sort on `@timestamp` plus the head limit, implemented via EnumerableSort/Limit rather than pushdown, which is exactly what the no-pushdown path is expected to do.
integ-test/src/test/resources/expectedOutput/calcite/explain_double_reverse_pushdown_multiple.yaml (1)
1-20: Double-reverse multi-field pushdown fixture reflects canceled reverses
Logical sorts show the flip–flip pattern, while the physical plan pushes down the original `age DESC/firstname.keyword ASC` sort with limit, which aligns with the double-reverse semantics.
integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_reverse_pushdown_multiple.yaml (1)
1-12: No-pushdown multi-field reverse plan looks correct
The logical plan shows original + reversed sorts; the physical plan retains only the final reversed collation as an `EnumerableSort` (no pushdown), which matches the intended behavior for this configuration despite the filename.
integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_double_reverse_pushdown_single.yaml (1)
1-13: Double-reverse no-pushdown logical/physical shape looks consistent
Logical and physical plans correctly reflect two reverses on a single sort (triple LogicalSort wrapper, single physical DESC sort under a LIMIT), matching the intended “double reverse = original ordering” behavior in the no-pushdown path.
integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_reverse_pushdown_single.yaml (1)
1-12: Single reverse no-pushdown plan matches flipped sort semantics
The expected logical/physical plans correctly implement `sort - age | reverse` as an ASC sort on age at the physical layer, while preserving the intermediate DESC/ASC logical structure.
integ-test/src/test/resources/expectedOutput/calcite/explain_double_reverse_pushdown_single.yaml (1)
1-15: Double-reverse pushdown correctly preserves original DESC sort at index
The explain output shows triple sorts in the logical plan but a single DESC sort in PushDownContext, so the pushed sort matches the original `sort - age` after two reverses, as desired.
integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_double_reverse_pushdown_multiple.yaml (1)
1-13: Multi-key double-reverse no-pushdown plan preserves original collation
Final physical sort on `(age DESC, firstname ASC)` after two reverses matches the expected “last sort wins, double reverse = original” semantics for multi-field sorts.
integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_reverse_ignored.yaml (1)
1-11: Reverse-no-op (no sort, no @timestamp) is represented correctly
The plan omits any additional sort for `reverse` and only shows the `fetch=[5]` limit/sort for `head 5`, matching the “reverse ignored” behavior in this context.
integ-test/src/test/resources/expectedOutput/calcite/explain_reverse_ignored.yaml (1)
1-8: Pushdown reverse-no-op keeps only LIMITs in context
The pushed plan correctly omits any reverse sort and carries both the explicit `head 5` and global `QUERY_SIZE_LIMIT` as LIMIT entries, with `size=5` in the request builder.
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (1)
417-473: Reverse explain coverage for sort/no-sort/timestamp looks solid
The new ITs exercise the key reverse scenarios (ignored w/o sort/@timestamp, single & multi-field pushdown, double reverse, and `@timestamp`-driven sort) against explain output, which lines up well with the documented behavior matrix for this PR.
ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLReverseTest.java (5)
12-23: Reverse behavior Javadoc is clear and matches the implemented test matrix
The class-level documentation concisely captures the three reverse modes (flip existing collation, use `@timestamp`, no-op) and points to integration tests for the non-collation cases, which aligns well with the scenarios exercised below.
159-240: Planner tests for multiple/multi-field sorts with reverse are well-structured
The tests from `testMultipleSortsWithReverseParserSuccess` through `testReverseWithFieldsAndSortParserSuccess` accurately encode the expected logical plans and Spark SQL for:
- multiple sequential sorts where reverse targets the last one,
- multi-field collations with direction flipping, and
- interaction with `fields` projections.
These should give good safety nets for future refactors of the reverse/backtracking logic.
242-290: Head-then-sort-then-reverse no-opt test correctly guards semantics
`testHeadThenSortReverseNoOpt` and `testSortFieldsReverse` explicitly assert the presence and ordering of the three LogicalSort nodes (fetch, sort, reverse) and the backtracking case where the sort key is projected away, which is important to prevent “helpful” optimizations that would silently change PPL semantics.
297-360: Reverse-no-op tests after aggregation/join match the “collation destroyed” rule
The tests asserting that reverse is ignored after aggregation and joins, and after `sort` followed by aggregation, correctly expect no additional LogicalSort node and no ORDER BY in the generated SQL, demonstrating that reverse doesn’t try to infer collation past blocking operators.
362-458: Reverse through filters/eval/join+sort is comprehensively covered
The remaining tests (reverse after where/eval/multiple filters, sort–join–sort–reverse, and reverse on a post-aggregation sort) collectively validate the backtracking strategy and ensure reverse only inverts collations where they’re still semantically valid, which is exactly what this optimization needs.
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java (8)
8-32: LGTM!
The import additions and setup modifications properly support the new test cases. Loading `TIME_TEST_DATA` for `@timestamp` field tests and `STATE_COUNTRY` for `streamstats` tests is appropriate.
35-110: LGTM!
The existing tests are correctly updated to include explicit `sort` commands before `reverse`, aligning with the new pushdown-friendly implementation. The expected results remain consistent, validating the optimization preserves correctness.
193-216: LGTM!
This test correctly validates that when no explicit sort exists but `@timestamp` is present, `reverse` adds a `@timestamp DESC` sort. The expected values align with the time-series test data.
Minor note: The comment on line 208 mentions "IDs 100, 99, 98, 97, 96" which may be outdated or confusing since the test is ordered by timestamp, not IDs.
218-229: LGTM!
This test correctly validates that explicit sort takes precedence over `@timestamp` fallback. The reverse operation flips the `sort value` direction rather than applying `@timestamp DESC`.
231-338: LGTM - addresses past review concerns about streamstats.
These tests comprehensively cover the `streamstats | reverse` scenarios discussed in previous reviews:
- No-op when `__stream_seq__` is projected out and not detectable
- Working reverse when backtracking finds collation (partition case)
- Working reverse with explicit sort after streamstats
This aligns with the backtracking implementation mentioned in the past review discussion.
358-372: Inconsistency between expected behavior and verification method.
The comment on line 367-368 states that with explicit sort and reverse, data should be in descending gender order (M, F). However, line 371 uses `verifyDataRows` (unordered) rather than `verifyDataRowsInOrder`.
If `reverse` is working correctly after `sort gender`, the output order should be deterministic and you should use ordered verification:
- // Note: Due to column reordering after stats (c, gender), the result order
- // may differ from expected. Using unordered verification for robustness.
- verifyDataRows(result, rows(4, "M"), rows(3, "F"));
+ // With explicit sort and reverse, data is in descending gender order: M, F
+ verifyDataRowsInOrder(result, rows(4, "M"), rows(3, "F"));
If there's a known issue preventing ordered output after aggregation+sort+reverse, please clarify in the comment.
389-440: LGTM!
These tests properly validate that the backtracking logic traverses through non-blocking operators (filters, eval/project) to find the underlying sort collation. The expected results correctly reflect the filtered data in reversed sort order.
442-456: LGTM!
This test correctly validates that the `@timestamp` fallback only applies when `@timestamp` is present in the output schema. Since aggregation (`stats count() as c by category`) doesn't include `@timestamp`, reverse correctly becomes a no-op.
  public void testReverseWithDescendingSort() throws IOException {
    // Test reverse with descending sort (- age)
    JSONObject result =
        executeQuery(
            String.format(
-               "source=%s | sort account_number | fields account_number | reverse | head 3",
+               "source=%s | sort - account_number | fields account_number | reverse",
                TEST_INDEX_BANK));
    verifySchema(result, schema("account_number", "bigint"));
-   verifyDataRowsInOrder(result, rows(32), rows(25), rows(20));
+   verifyDataRowsInOrder(
+       result, rows(1), rows(6), rows(13), rows(18), rows(20), rows(25), rows(32));
  }

  @Test
  public void testReverseWithMixedSortDirections() throws IOException {
    // Test reverse with mixed sort directions (- age, + firstname)
Fix comments to match the actual sort fields.
The comments reference age but the code sorts by account_number:
- Line 114: "Test reverse with descending sort (- age)"
- Line 127: "Test reverse with mixed sort directions (- age, + firstname)"
  @Test
  public void testReverseWithDescendingSort() throws IOException {
- // Test reverse with descending sort (- age)
+ // Test reverse with descending sort (- account_number)
  JSONObject result =
  @Test
  public void testReverseWithMixedSortDirections() throws IOException {
- // Test reverse with mixed sort directions (- age, + firstname)
+ // Test reverse with mixed sort directions (- account_number, + firstname)
  JSONObject result =
🤖 Prompt for AI Agents
In
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java
around lines 113 to 127, update the misleading inline comments that mention
"age" to reflect the actual sort field(s) used in the tests: change the comment
on line 114 to indicate a descending sort on account_number (e.g., "Test reverse
with descending sort (- account_number)") and change the comment on line 127 to
match the mixed sort fields used in that test (replace "- age, + firstname" with
the actual sort expression present in the test, e.g., "- account_number, +
firstname" or the correct fields), keeping comment style consistent with
surrounding tests.
  @Test
  public void testDoubleReverseWithDescendingSort() throws IOException {
    // Test double reverse with descending sort (- age)
    JSONObject result =
        executeQuery(
            String.format(
                "source=%s | sort - account_number | fields account_number | reverse | reverse",
                TEST_INDEX_BANK));
    verifySchema(result, schema("account_number", "bigint"));
    verifyDataRowsInOrder(
        result, rows(32), rows(25), rows(20), rows(18), rows(13), rows(6), rows(1));
  }

  @Test
  public void testDoubleReverseWithMixedSortDirections() throws IOException {
    // Test double reverse with mixed sort directions (- age, + firstname)
Same comment inconsistency: "age" should be "account_number".
Lines 148 and 161 also reference "- age" but the code uses "- account_number".
  @Test
  public void testDoubleReverseWithDescendingSort() throws IOException {
- // Test double reverse with descending sort (- age)
+ // Test double reverse with descending sort (- account_number)
  JSONObject result =
  @Test
  public void testDoubleReverseWithMixedSortDirections() throws IOException {
- // Test double reverse with mixed sort directions (- age, + firstname)
+ // Test double reverse with mixed sort directions (- account_number, + firstname)
  JSONObject result =
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
  @Test
  public void testDoubleReverseWithDescendingSort() throws IOException {
    // Test double reverse with descending sort (- account_number)
    JSONObject result =
        executeQuery(
            String.format(
                "source=%s | sort - account_number | fields account_number | reverse | reverse",
                TEST_INDEX_BANK));
    verifySchema(result, schema("account_number", "bigint"));
    verifyDataRowsInOrder(
        result, rows(32), rows(25), rows(20), rows(18), rows(13), rows(6), rows(1));
  }
  @Test
  public void testDoubleReverseWithMixedSortDirections() throws IOException {
    // Test double reverse with mixed sort directions (- account_number, + firstname)
🤖 Prompt for AI Agents
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java
around lines 146 to 161: the inline test comments incorrectly refer to "- age"
while the test actually sorts by account_number; update the two comments at
lines 148 and 161 to reference "- account_number" (or a neutral description like
"descending account_number") so the comment matches the code under test.
ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLStreamstatsTest.java
Actionable comments posted: 0
♻️ Duplicate comments (1)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java (1)
114-116: Inline comments still refer to “age” while tests sort by `account_number`
The four comments in the descending/mixed-sort tests still mention “age”, but the queries all sort on `account_number`:
- Line 115: `// Test reverse with descending sort (- age)`
- Line 128: `// Test reverse with mixed sort directions (- age, + firstname)`
- Line 149: `// Test double reverse with descending sort (- age)`
- Line 162: `// Test double reverse with mixed sort directions (- age, + firstname)`

This can mislead readers when debugging test failures.
Consider updating them along these lines:
- // Test reverse with descending sort (- age)
+ // Test reverse with descending sort (- account_number)
...
- // Test reverse with mixed sort directions (- age, + firstname)
+ // Test reverse with mixed sort directions (- account_number, + firstname)
...
- // Test double reverse with descending sort (- age)
+ // Test double reverse with descending sort (- account_number)
...
- // Test double reverse with mixed sort directions (- age, + firstname)
+ // Test double reverse with mixed sort directions (- account_number, + firstname)
Also applies to: 128-129, 149-150, 162-163
🧹 Nitpick comments (4)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java (4)
289-307: Comment in `testStreamstatsByWithReverse` contradicts the expected reversed ordering
The leading comment says:
// Test that reverse is ignored after streamstats with partitioning (by clause)
but the assertion below uses `verifyDataRowsInOrder(...)` with the rows clearly reversed by `__stream_seq__` within each `country`:
// With backtracking, reverse now works and reverses the __stream_seq__ order
verifyDataRowsInOrder(
    result,
    rows("Jane", "Canada", ..., 20, 2, 22.5),
    rows("John", "Canada", ..., 25, 1, 25),
    rows("Hello", "USA", ..., 30, 2, 50),
    rows("Jake", "USA", ..., 70, 1, 70));
To avoid confusion, the first comment should reflect that reverse does work here, for example:
- // Test that reverse is ignored after streamstats with partitioning (by clause)
+ // Test that reverse backtracks through streamstats-by and reverses __stream_seq__ order
445-456: Clarify intent in `testReverseWithTimestampAfterAggregation` comment
The header says:
// Test that reverse uses @timestamp when aggregation destroys collation
// TIME_TEST_DATA has @timestamp field
but the body and expectations assert the opposite:
// Even though aggregation destroys collation, there's no @timestamp in the
// aggregated result, so reverse is a no-op
verifyDataRows(result, rows(26, "A"), rows(25, "B"), rows(25, "C"), rows(24, "D"));
Since the query `stats count() as c by category | reverse` drops `@timestamp` from the result, reverse is intentionally a no-op here.
Consider rephrasing the header to match the actual behavior, e.g.:
- // Test that reverse uses @timestamp when aggregation destroys collation
- // TIME_TEST_DATA has @timestamp field
+ // Test that reverse is a no-op after aggregation, even if the source index has @timestamp
+ // TIME_TEST_DATA has @timestamp, but the aggregated result no longer exposes it
359-373: `testReverseAfterAggregationWithSort` doesn’t actually validate the reversed ordering
The comments state that reverse should produce descending gender order:
// With explicit sort and reverse, data should be in descending gender order
// Sort by gender ASC: F, M -> Reverse: M, F
but the assertion uses `verifyDataRows(...)`, which ignores row order and only checks the multiset of rows:
verifyDataRows(result, rows(4, "M"), rows(3, "F"));
If the goal is to verify that reverse correctly flips the post-aggregation sort, consider switching to `verifyDataRowsInOrder(...)` and tightening the comment; otherwise, adjust the comment to say you’re only validating the result set, not its ordering.
This is test quality, not functionality, so can be deferred but would strengthen coverage of the reverse behavior.
498-515: `testTimechartWithGroupByAndReverse` comment claims reversed order, but the assertion is unordered
The test description:
// Test timechart with group by and reverse
// The sort is on both @timestamp and the group by field
...
// All events are in the same hour, so only one time bucket
// Hosts are grouped and results are reversed
suggests you expect a specific row ordering after reverse, but the assertion uses `verifyDataRows(...)`, which doesn’t check order:
verifyDataRows(
    result,
    rows("2024-07-01 00:00:00", "db-01", 1),
    rows("2024-07-01 00:00:00", "web-01", 2),
    rows("2024-07-01 00:00:00", "web-02", 2));
Given timechart appends an explicit ORDER BY and reverse adds another, the final ordering should be deterministic here. To actually exercise that, you could:
- Use `verifyDataRowsInOrder(...)` with the expected reversed order, or
- Relax the comment to just say you’re validating the grouped result content.
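
The difference between an order-insensitive and an order-sensitive row check, which the two nitpicks above rely on, can be sketched in plain Java. This is only an illustration of the distinction as the review describes it; `RowOrderSketch` and its methods are hypothetical and are not the project's `verifyDataRows` or `verifyDataRowsInOrder` helpers.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper contrasting multiset comparison with ordered comparison. */
final class RowOrderSketch {

  /** Order-insensitive: true when both lists hold the same rows with the same multiplicity. */
  static boolean sameRowsAnyOrder(List<List<Object>> expected, List<List<Object>> actual) {
    List<List<Object>> remaining = new ArrayList<>(actual);
    for (List<Object> row : expected) {
      // remove(Object) deletes one matching row, so duplicates are counted correctly
      if (!remaining.remove(row)) {
        return false;
      }
    }
    return remaining.isEmpty();
  }

  /** Order-sensitive: true only when rows match position by position. */
  static boolean sameRowsInOrder(List<List<Object>> expected, List<List<Object>> actual) {
    return expected.equals(actual);
  }
}
```

With rows `(4, "M"), (3, "F")` expected and `(3, "F"), (4, "M")` returned, the first check passes and the second fails, which is why an unordered assertion cannot show that `reverse` changed anything.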
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java (6 hunks)
- ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLStreamstatsTest.java (1 hunks)
- ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLTimechartTest.java (3 hunks)
🔇 Additional comments (2)
ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLStreamstatsTest.java (1)
222-250: New `testStreamstatsWithReverse` expectations look consistent with non-null-bucket semantics
The logical and Spark SQL plans correctly model `reverse` as an added DESC sort on `__stream_seq__` on top of the existing ASC sort from `streamstats`, and the aggregation uses the same non-null-bucket MAX window shape as `testStreamstatsBy`. Looks good.
ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLTimechartTest.java (1)
53-78: Events table schema and timechart+reverse tests are coherent
- The added `created_at` TIMESTAMP column is wired consistently: row literals, `EventsTable` `protoRowType`, and the new `timefield=created_at` test all agree on column ordering and types.
- `testTimechartWithReverse` and `testTimechartWithCustomTimefieldAndReverse` correctly expect:
  - Inner ASC sort on the appropriate bucket (`SPAN(@timestamp, ...)` or `SPAN(created_at, ...)`) from `timechart`.
  - An outer DESC sort from `reverse`, with NULLS flipped to `NULLS FIRST`.

This aligns with the PR’s “tier 1” behavior for reversing existing collations.
Also applies to: 370-425, 437-445
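
The “tier 1” flip described above (invert each sort direction and its null ordering) can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the PR's `PlanUtils.reverseCollation()` implementation: `SortKey` and `ReverseCollationSketch` are hypothetical stand-ins for Calcite's collation types, and the starting collation is assumed to be ASC with NULLS LAST.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for one sort key of a collation; not a Calcite class. */
final class SortKey {
  enum Direction { ASC, DESC }
  enum Nulls { FIRST, LAST }

  final String field;
  final Direction direction;
  final Nulls nulls;

  SortKey(String field, Direction direction, Nulls nulls) {
    this.field = field;
    this.direction = direction;
    this.nulls = nulls;
  }

  /** Flips both the sort direction and the null ordering of this key. */
  SortKey reversed() {
    Direction d = direction == Direction.ASC ? Direction.DESC : Direction.ASC;
    Nulls n = nulls == Nulls.FIRST ? Nulls.LAST : Nulls.FIRST;
    return new SortKey(field, d, n);
  }

  @Override
  public String toString() {
    return field + " " + direction + " NULLS " + nulls;
  }
}

final class ReverseCollationSketch {
  /** Reverses every key while keeping the keys in their original priority order. */
  static List<SortKey> reverse(List<SortKey> collation) {
    List<SortKey> out = new ArrayList<>();
    for (SortKey key : collation) {
      out.add(key.reversed());
    }
    return out;
  }
}
```

Applying `reverse` twice yields the original keys again, which is consistent with the double-reverse cases the tests in this PR exercise.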
This PR is stalled because it has been open for 2 weeks with no activity.
This commit optimizes the `reverse` command in the Calcite planner by intelligently reversing existing sort collations instead of always using the ROW_NUMBER() approach.

Key changes:
- Added PlanUtils.reverseCollation() method to flip sort directions and null directions
- Updated CalciteRelNodeVisitor.visitReverse() to:
  - Check for existing sort collations
  - Reverse them if present (more efficient)
  - Fall back to ROW_NUMBER() when no sort exists
- Added comprehensive integration test expected outputs for:
  - Single field reverse pushdown
  - Multiple field reverse pushdown
  - Reverse fallback cases
  - Double reverse no-op optimizations

This optimization significantly improves performance when reversing already-sorted data by leveraging database-native sort reversal.

Based on PR opensearch-project#4056 by @selsong

Signed-off-by: Kai Huang <ahkcs@amazon.com>

# Conflicts:
# core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
22426ba to
2b969fd
Compare
Actionable comments posted: 1
Fix all issues with AI Agents 🤖
In @docs/user/ppl/cmd/reverse.md:
- Around line 27-29: The fenced code block that currently contains just
"reverse" in the Syntax section lacks a language identifier; update that fence
to include a language tag (preferably "ppl" or "text") so it follows markdown
style guidelines for PPL commands—for example change the starting fence from ```
to ```ppl (or ```text) while leaving the inner content "reverse" unchanged.
♻️ Duplicate comments (1)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java (1)
113-162: Fix reverse sort test comments to match `account_number` fields.
The four comments in this block still mention `age`, but all the queries sort on `account_number`:
- Line 115: `// Test reverse with descending sort (- age)`
- Line 128: `// Test reverse with mixed sort directions (- age, + firstname)`
- Line 149: `// Test double reverse with descending sort (- age)`
- Line 162: `// Test double reverse with mixed sort directions (- age, + firstname)`

Updating them to reference `account_number` will avoid confusion when reading or grepping tests.
Suggested comment updates
@Test
public void testReverseWithDescendingSort() throws IOException {
- // Test reverse with descending sort (- age)
+ // Test reverse with descending sort (- account_number)
@@
@Test
public void testReverseWithMixedSortDirections() throws IOException {
- // Test reverse with mixed sort directions (- age, + firstname)
+ // Test reverse with mixed sort directions (- account_number, + firstname)
@@
@Test
public void testDoubleReverseWithDescendingSort() throws IOException {
- // Test double reverse with descending sort (- age)
+ // Test double reverse with descending sort (- account_number)
@@
@Test
public void testDoubleReverseWithMixedSortDirections() throws IOException {
- // Test double reverse with mixed sort directions (- age, + firstname)
+ // Test double reverse with mixed sort directions (- account_number, + firstname)
🧹 Nitpick comments (1)
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (1)
683-809: Reverse backtracking logic looks correct; consider using existing helper to replace top of stack.
The new `backtrackForCollation`/`insertReversedSortInTree` plus the `visitReverse` rewrite correctly:
- Stop at blocking nodes (Aggregate, BiRel, SetOp, Uncollect, windowed `LogicalProject`), so you don’t reverse implicit or destroyed order.
- Walk back through projections/filters to find the first real `Sort` and insert a reversed `LogicalSort` just above it.
- Fall back to `@timestamp` (`IMPLICIT_FIELD_TIMESTAMP`) DESC and treat reverse as a no-op when neither collation nor timestamp exists, which matches the documented semantics.

One small maintainability tweak: instead of
RelNode rebuiltTree = insertReversedSortInTree(currentNode, reversedCollation, context);
// Replace the current node in the builder with the rebuilt tree
context.relBuilder.build();
context.relBuilder.push(rebuiltTree);
you could use the existing utility that already encapsulates “replace top of relBuilder stack” behavior:
PlanUtils.replaceTop(context.relBuilder, rebuiltTree);
This keeps stack manipulation consistent with the rest of the visitor and makes the intent clearer.
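
The three-tier decision summarized in this comment (reverse an existing sort, else sort by `@timestamp` DESC, else no-op) can be sketched with a toy plan tree. This is a simplified model under stated assumptions, not the code in `CalciteRelNodeVisitor`: `PlanNode`, `ReverseTierSketch`, and the `Tier` names are hypothetical, each node has a single input, and only Aggregate and Join are modeled as blocking nodes.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical, simplified plan node with a single input; not a Calcite RelNode. */
final class PlanNode {
  enum Kind { SCAN, FILTER, PROJECT, SORT, AGGREGATE, JOIN }

  final Kind kind;
  final PlanNode input; // null for SCAN
  final List<String> fields; // output field names of this node

  PlanNode(Kind kind, PlanNode input, List<String> fields) {
    this.kind = kind;
    this.input = input;
    this.fields = fields;
  }
}

final class ReverseTierSketch {
  enum Tier { REVERSE_EXISTING_SORT, SORT_BY_TIMESTAMP_DESC, NO_OP }

  /** Walks toward the source and returns the first Sort, unless a blocking node is hit first. */
  static Optional<PlanNode> findSort(PlanNode top) {
    for (PlanNode cur = top; cur != null; cur = cur.input) {
      if (cur.kind == PlanNode.Kind.SORT) {
        return Optional.of(cur);
      }
      // Blocking nodes destroy any ordering established below them.
      if (cur.kind == PlanNode.Kind.AGGREGATE || cur.kind == PlanNode.Kind.JOIN) {
        return Optional.empty();
      }
    }
    return Optional.empty();
  }

  /** Picks the tier: existing sort first, then the @timestamp fallback, then no-op. */
  static Tier decide(PlanNode top) {
    if (findSort(top).isPresent()) {
      return Tier.REVERSE_EXISTING_SORT;
    }
    if (top.fields.contains("@timestamp")) {
      return Tier.SORT_BY_TIMESTAMP_DESC;
    }
    return Tier.NO_OP;
  }
}
```

In this model an Aggregate sitting above a Sort yields `NO_OP` when its output lacks `@timestamp`, which mirrors the `stats ... | reverse` case discussed earlier in the review.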
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
- core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java
- docs/user/ppl/cmd/reverse.md
- integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java
- integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
- integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java
🚧 Files skipped from review as they are similar to previous changes (1)
- core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java
🧰 Additional context used
📓 Path-based instructions (7)
**/*.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
**/*.java: Use `PascalCase` for class names (e.g., `QueryExecutor`)
Use `camelCase` for method and variable names (e.g., `executeQuery`)
Use `UPPER_SNAKE_CASE` for constants (e.g., `MAX_RETRY_COUNT`)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer `Optional<T>` for nullable returns in Java
Avoid unnecessary object creation in loops
Use `StringBuilder` for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code
Files:
- integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java
- core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
- integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java
- integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
⚙️ CodeRabbit configuration file
**/*.java: - Flag methods >50 lines as potentially too complex - suggest refactoring
- Flag classes >500 lines as needing organization review
- Check for dead code, unused imports, and unused variables
- Identify code reuse opportunities across similar implementations
- Assess holistic maintainability - is code easy to understand and modify?
- Flag code that appears AI-generated without sufficient human review
- Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
- Check for proper JavaDoc on public classes and methods
- Flag redundant comments that restate obvious code
- Ensure proper error handling with specific exception types
- Check for Optional usage instead of null returns
- Validate proper use of try-with-resources for resource management
Files:
- integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java
- core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
- integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java
- integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
integ-test/**/*IT.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
End-to-end scenarios need integration tests in `integ-test/` module
Files:
- integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java
- integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java
- integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
⚙️ CodeRabbit configuration file
integ-test/**/*IT.java: - Integration tests MUST use valid test data from resources
- Verify test data files exist in integ-test/src/test/resources/
- Check test assertions are meaningful and specific
- Validate tests clean up resources after execution
- Ensure tests are independent and can run in any order
- Flag tests that reference non-existent indices (e.g., EMP)
- Verify integration tests are in correct module (integ-test/)
- Check tests can be run with ./gradlew :integ-test:integTest
- Ensure proper test data setup and teardown
- Validate end-to-end scenario coverage
Files:
- integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java
- integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java
- integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
**/*IT.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
Name integration tests with `*IT.java` suffix in OpenSearch SQL
Files:
- integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java
- integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java
- integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
**/test/**/*.java
⚙️ CodeRabbit configuration file
**/test/**/*.java: - Verify NULL input tests for all new functions
- Check boundary condition tests (min/max values, empty inputs)
- Validate error condition tests (invalid inputs, exceptions)
- Ensure multi-document tests for per-document operations
- Flag smoke tests without meaningful assertions
- Check test naming follows pattern: test
- Verify test data is realistic and covers edge cases
- Verify test coverage for new business logic
- Ensure tests are independent and don't rely on execution order
- Validate meaningful test data that reflects real-world scenarios
- Check for proper cleanup of test resources
Files:
- integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java
- integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java
- integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
**/calcite/**/*.java
⚙️ CodeRabbit configuration file
**/calcite/**/*.java: - Follow existing Calcite integration patterns
- Verify RelNode visitor implementations are complete
- Check RexNode handling follows project conventions
- Validate SQL generation is correct and optimized
- Ensure Calcite version compatibility
- Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor
- Document any Calcite-specific workarounds
- Test compatibility with Calcite version constraints
Files:
- integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java
- core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
- integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java
- integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
core/src/main/java/**/*.java
⚙️ CodeRabbit configuration file
core/src/main/java/**/*.java: - New functions MUST have unit tests in the same commit
Files:
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
⚙️ CodeRabbit configuration file
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java: - Flag methods >50 lines - this file is known to be hard to read
- Suggest extracting complex logic into helper methods
- Check for code organization and logical grouping
- Validate all RelNode types are handled
Files:
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
🧠 Learnings (8)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Applied to files:
- integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java
- core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
- integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java
- integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*IT.java : Name integration tests with `*IT.java` suffix in OpenSearch SQL
Applied to files:
- integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java
- integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java
- integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Document Calcite-specific workarounds in code
Applied to files:
- integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java
- core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
- integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java
📚 Learning: 2025-12-29T05:32:03.491Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 4993
File: opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java:20-20
Timestamp: 2025-12-29T05:32:03.491Z
Learning: For any custom Calcite RelNode class (e.g., ones that extend EnumerableLimitSort or other Calcite RelNode types), always override the copy method. If copy is not overridden, cloning/copy operations may downgrade the instance to the parent class type, losing the custom behavior. In your implementation, ensure copy returns a new instance of the concrete class with all relevant fields and traits preserved, mirroring the current instance state.
Applied to files:
- integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java
- core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
- integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java
- integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration
Applied to files:
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
📚 Learning: 2025-12-11T05:27:39.856Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 0
File: :0-0
Timestamp: 2025-12-11T05:27:39.856Z
Learning: In opensearch-project/sql, for SEMI and ANTI join types in CalciteRelNodeVisitor.java, the `max` option has no effect because these join types only use the left side to filter records based on the existence of matches in the right side. The join results are identical regardless of max value (max=1, max=2, or max=∞). The early return for SEMI/ANTI joins before processing the `max` option is intentional and correct behavior.
Applied to files:
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*Test.java : Name unit tests with `*Test.java` suffix in OpenSearch SQL
Applied to files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java
🪛 markdownlint-cli2 (0.18.1)
docs/user/ppl/cmd/reverse.md
27-27: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
- GitHub Check: build-linux (25, unit)
- GitHub Check: build-linux (21, doc)
- GitHub Check: build-linux (25, doc)
- GitHub Check: build-linux (21, integration)
- GitHub Check: bwc-tests-full-restart (25)
- GitHub Check: build-linux (25, integration)
- GitHub Check: build-linux (21, unit)
- GitHub Check: bwc-tests-rolling-upgrade (25)
- GitHub Check: bwc-tests-rolling-upgrade (21)
- GitHub Check: bwc-tests-full-restart (21)
- GitHub Check: security-it-linux (21)
- GitHub Check: security-it-linux (25)
- GitHub Check: build-windows-macos (macos-14, 25, integration)
- GitHub Check: build-windows-macos (macos-14, 21, doc)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (macos-14, 25, doc)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (macos-14, 21, integration)
- GitHub Check: build-windows-macos (macos-14, 21, unit)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (macos-14, 25, unit)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
- GitHub Check: security-it-windows-macos (windows-latest, 21)
- GitHub Check: security-it-windows-macos (macos-14, 21)
- GitHub Check: security-it-windows-macos (windows-latest, 25)
- GitHub Check: security-it-windows-macos (macos-14, 25)
- GitHub Check: test-sql-cli-integration (21)
- GitHub Check: CodeQL-Scan (java)
🔇 Additional comments (3)
docs/user/ppl/cmd/reverse.md (1)
1-186: Documentation comprehensively and accurately reflects the new three-tier reverse optimization.
The content correctly describes all three tiers (explicit sort flip, @timestamp fallback, no-op) with appropriate examples covering single sorts, multiple sorts, @timestamp cases, double reversal, and interactions with `head`. Expected outputs are logically consistent with the described behavior. The optimization rationale and performance benefits are well-articulated. Excellent progressive structure from simple to complex scenarios.
integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java (1)
93-93: Include reverse IT in no-pushdown suite – looks correct.
Adding `CalciteReverseCommandIT.class` to the suite keeps reverse behavior covered when pushdown is disabled; no issues spotted.
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (1)
415-473: Reverse explain-plan coverage matches new semantics.
The new explain tests cleanly cover: no-op reverse, single/multi-field reverse pushdown, double reverse (both ignored and cancelling out), and the @timestamp-based fallback. They follow existing patterns (`explainQueryYaml` + `loadExpectedPlan`) and look consistent with the `CalciteRelNodeVisitor` implementation.
| ``` | ||
| reverse | ||
| * No parameters: The reverse command takes no arguments or options. | ||
|
|
||
| ## Note | ||
| ``` |
Add language specification to code fence.
The code block in the Syntax section is missing a language identifier, which violates markdown style guidelines. Since this documents a PPL command, use ```ppl or ```text instead of an empty fence. Based on learnings, this aligns with repository markdown standards.
🔎 Proposed fix
## Syntax
-```
+```ppl
reverse</details>
<details>
<summary>🧰 Tools</summary>
<details>
<summary>🪛 markdownlint-cli2 (0.18.1)</summary>
27-27: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
</details>
</details>
<details>
<summary>🤖 Prompt for AI Agents</summary>
In @docs/user/ppl/cmd/reverse.md around lines 27-29, The fenced code block that
currently contains just "reverse" in the Syntax section lacks a language
identifier; update that fence to include a language tag (preferably "ppl" or
"text") so it follows markdown style guidelines for PPL commands—for example
change the starting fence from ``` to ```ppl (or ```text) while leaving the
inner content "reverse" unchanged.
</details>
Actionable comments posted: 0
🧹 Nitpick comments (4)
ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLReverseTest.java (4)
94-112: Consider adding result verification for double reverse cancellation.
This test validates that double reverse produces the original ASC order in the logical plan and SQL, but doesn't verify actual result data. Adding `verifyResult()` would confirm the cancellation semantics work correctly end-to-end.
🔎 Suggested addition
verifyLogical(root, expectedLogical);
+ // Verify result data matches original ascending order
+ String expectedResult =
+ "EMPNO=7369; ENAME=SMITH; JOB=CLERK; MGR=7902; HIREDATE=1980-12-17; SAL=800.00;"
+ " COMM=null; DEPTNO=20\n"
+ "EMPNO=7499; ENAME=ALLEN; JOB=SALESMAN; MGR=7698; HIREDATE=1981-02-20; SAL=1600.00;"
+ " COMM=300.00; DEPTNO=30\n"
+ // ... remaining rows in ascending EMPNO order
+ ;
+ verifyResult(root, expectedResult);
+
String expectedSparkSql =
135-157: Consider using specific exception types for negative tests.
Using `Exception.class` is broad. If the parser throws a specific exception (e.g., `SyntaxCheckException`, `IllegalArgumentException`), using that type would make tests more precise and prevent false positives from unrelated failures.
386-413: Consider adding Spark SQL verification for consistency.
`testReverseAfterEvalWithSort` and `testReverseAfterMultipleFiltersWithSort` only verify the logical plan but skip Spark SQL generation verification. For consistency with other tests and to validate the full optimization path, consider adding `verifyPPLToSparkSQL()` calls.
292-296: Consider adding window function blocking test.
The comment mentions window functions destroy collation, but there's no test for this case. Adding a test like `testReverseAfterWindowFunctionIsNoOp` would complete the blocking operator coverage.
@Test
public void testReverseAfterWindowFunctionIsNoOp() {
  // Window functions destroy input ordering
  String ppl = "source=EMP | eval row_num = row_number() over(order by SAL) | reverse";
  RelNode root = getRelNode(ppl);
  // Verify no additional sort node for reverse
  // ... assertions
}
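
Returning to the nitpick above on exception types in negative tests, the false-positive risk of expecting `Exception.class` can be illustrated with a small stdlib-only sketch. Everything here is hypothetical: `NegativeTestSketch`, its `SyntaxError` type, and the toy `parse` method are stand-ins and do not reflect the project's parser or the exception it actually throws.

```java
/** Hypothetical illustration of broad versus specific exception expectations. */
final class NegativeTestSketch {

  /** Hypothetical exception standing in for a parser-specific syntax error. */
  static final class SyntaxError extends RuntimeException {
    SyntaxError(String message) {
      super(message);
    }
  }

  /** Toy parser: accepts only the bare command and rejects any argument after it. */
  static void parse(String query) {
    if (query == null) {
      // An unrelated failure that a broad expectation would silently accept.
      throw new NullPointerException("query");
    }
    if (!query.trim().equals("reverse")) {
      throw new SyntaxError("reverse takes no arguments: " + query);
    }
  }

  /** Returns true only when the action throws an instance of the expected type. */
  static boolean throwsType(Class<? extends Throwable> expected, Runnable action) {
    try {
      action.run();
      return false;
    } catch (Throwable t) {
      return expected.isInstance(t);
    }
  }
}
```

Here a check against `Exception.class` succeeds even when the failure is the unrelated `NullPointerException`, while a check against the specific type succeeds only for the syntax error the test is meant to cover.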
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLReverseTest.java
🧰 Additional context used
📓 Path-based instructions (5)
**/*.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
**/*.java: Use `PascalCase` for class names (e.g., `QueryExecutor`)
Use `camelCase` for method and variable names (e.g., `executeQuery`)
Use `UPPER_SNAKE_CASE` for constants (e.g., `MAX_RETRY_COUNT`)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer `Optional<T>` for nullable returns in Java
Avoid unnecessary object creation in loops
Use `StringBuilder` for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code
Files:
ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLReverseTest.java
⚙️ CodeRabbit configuration file
**/*.java: - Flag methods >50 lines as potentially too complex - suggest refactoring
- Flag classes >500 lines as needing organization review
- Check for dead code, unused imports, and unused variables
- Identify code reuse opportunities across similar implementations
- Assess holistic maintainability - is code easy to understand and modify?
- Flag code that appears AI-generated without sufficient human review
- Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
- Check for proper JavaDoc on public classes and methods
- Flag redundant comments that restate obvious code
- Ensure proper error handling with specific exception types
- Check for Optional usage instead of null returns
- Validate proper use of try-with-resources for resource management
Files:
ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLReverseTest.java
**/*Test.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
**/*Test.java: All new business logic requires unit tests
Name unit tests with *Test.java suffix in OpenSearch SQL
Files:
ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLReverseTest.java
**/test/**/*.java
⚙️ CodeRabbit configuration file
**/test/**/*.java: - Verify NULL input tests for all new functions
- Check boundary condition tests (min/max values, empty inputs)
- Validate error condition tests (invalid inputs, exceptions)
- Ensure multi-document tests for per-document operations
- Flag smoke tests without meaningful assertions
- Check test naming follows pattern: test
- Verify test data is realistic and covers edge cases
- Verify test coverage for new business logic
- Ensure tests are independent and don't rely on execution order
- Validate meaningful test data that reflects real-world scenarios
- Check for proper cleanup of test resources
Files:
ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLReverseTest.java
**/ppl/**/*.java
⚙️ CodeRabbit configuration file
**/ppl/**/*.java: - For PPL parser changes, verify grammar tests with positive/negative cases
- Check AST generation for new syntax
- Ensure corresponding AST builder classes are updated
- Validate edge cases and boundary conditions
Files:
ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLReverseTest.java
**/calcite/**/*.java
⚙️ CodeRabbit configuration file
**/calcite/**/*.java: - Follow existing Calcite integration patterns
- Verify RelNode visitor implementations are complete
- Check RexNode handling follows project conventions
- Validate SQL generation is correct and optimized
- Ensure Calcite version compatibility
- Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor
- Document any Calcite-specific workarounds
- Test compatibility with Calcite version constraints
Files:
ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLReverseTest.java
🧠 Learnings (4)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 4993
File: opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java:20-20
Timestamp: 2025-12-29T05:32:11.893Z
Learning: In opensearch-project/sql, when creating custom Calcite RelNode classes that extend EnumerableLimitSort or other Calcite RelNode types, always override the `copy` method. Without overriding copy, the class will downgrade to its parent class type during copy operations, losing the custom implementation.
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Applied to files:
ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLReverseTest.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Document Calcite-specific workarounds in code
Applied to files:
ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLReverseTest.java
📚 Learning: 2025-12-29T05:32:03.491Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 4993
File: opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java:20-20
Timestamp: 2025-12-29T05:32:03.491Z
Learning: For any custom Calcite RelNode class (e.g., ones that extend EnumerableLimitSort or other Calcite RelNode types), always override the copy method. If copy is not overridden, cloning/copy operations may downgrade the instance to the parent class type, losing the custom behavior. In your implementation, ensure copy returns a new instance of the concrete class with all relevant fields and traits preserved, mirroring the current instance state.
Applied to files:
ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLReverseTest.java
🔇 Additional comments (7)
ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLReverseTest.java (7)
12-27: Well-documented test class. The Javadoc clearly explains the three-tier reverse behavior and correctly references integration tests for scenarios not covered here. This helps maintainers understand test scope.
29-72: Good comprehensive test with logical, result, and SQL verification. The test effectively validates the reverse optimization by checking the logical plan structure, actual result ordering, and generated Spark SQL.
159-179: Good coverage of "last sort wins" semantics. This test correctly validates that reverse applies only to the most recent sort (- ENAME), aligning with the PPL semantics mentioned in the PR objectives.
242-267: Good test for fetch limit semantics preservation. The comment clearly documents why this should NOT be optimized: doing so would break the "take first 5, then sort" semantics. Consider adding result verification to confirm exactly 5 rows are returned.
292-360: Thorough blocking operator tests with clear documentation. These tests effectively validate that reverse becomes a no-op after operators that destroy collation (aggregate, join). The inline comments explain the rationale well, and testReverseAfterSortAndAggregationIsNoOp includes result verification.
415-432: Good test for sort-join-sort-reverse interaction. This test validates an important edge case: the sort before the join is preserved in the plan but its collation is destroyed, while the sort after the join can be reversed. This aligns with the PR's backtracking logic.
434-457: Good complementary test to aggregation no-op case. This test effectively demonstrates that while reverse after aggregation alone is a no-op, adding a sort after aggregation restores the ability to reverse. This pair of tests (testReverseAfterAggregationIsNoOp and testReverseAfterAggregationWithSort) provides clear coverage of the aggregation boundary behavior.
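For readers following the blocking-operator comments above, the search for a reversible sort might be pictured roughly as below. This is a simplified sketch, not the PR's code: it models the plan as a flat list instead of a Calcite RelNode tree, and the names `ReverseBacktrackSketch`, `Kind`, and `findReversibleSort` are hypothetical. Only aggregate and join are treated as collation-destroying, since those are the two operators named in the comments.

```java
import java.util.List;
import java.util.OptionalInt;

/** Illustrative model only; the real implementation walks a Calcite RelNode tree. */
public class ReverseBacktrackSketch {

    /** Hypothetical operator kinds for a linear pipeline. */
    enum Kind { SCAN, FILTER, PROJECT, SORT, AGGREGATE, JOIN }

    /** Assumption: only these two operators destroy the input ordering. */
    static boolean destroysCollation(Kind kind) {
        return kind == Kind.AGGREGATE || kind == Kind.JOIN;
    }

    /**
     * Walks the pipeline from the last operator back toward the source.
     * Returns the index of the nearest sort, or empty if a blocking
     * operator is reached first or no sort exists at all.
     */
    static OptionalInt findReversibleSort(List<Kind> pipeline) {
        for (int i = pipeline.size() - 1; i >= 0; i--) {
            Kind kind = pipeline.get(i);
            if (kind == Kind.SORT) {
                return OptionalInt.of(i); // nearest sort: "last sort wins"
            }
            if (destroysCollation(kind)) {
                return OptionalInt.empty(); // ordering below this point is lost
            }
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        // sort, then aggregate: the sort is hidden behind a blocking operator
        System.out.println(findReversibleSort(List.of(Kind.SCAN, Kind.SORT, Kind.AGGREGATE)));
        // aggregate, then sort: the sort after the aggregate can be reversed
        System.out.println(findReversibleSort(List.of(Kind.SCAN, Kind.AGGREGATE, Kind.SORT)));
        // sort, join, sort: only the sort after the join is found
        System.out.println(findReversibleSort(List.of(Kind.SCAN, Kind.SORT, Kind.JOIN, Kind.SORT)));
    }
}
```

The three calls in `main` mirror the sort-then-aggregate, aggregate-then-sort, and sort-join-sort cases discussed in the comments; the sketch says nothing about what happens once no sort is found.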
Description
Originally from #4056 by @selsong
This PR implements a significant performance optimization for the reverse command by eliminating the expensive ROW_NUMBER() window function and implementing three-tier logic based on query context.
Motivation
The previous implementation used ROW_NUMBER() window function which:
Solution: Three-Tier Reverse Logic
The reverse command now follows context-aware behavior:
Implementation Details
1. Reverse with Explicit Sort (Primary Use Case)
Query:
Behavior: Flips all sort directions: +balance, -firstname → -balance, +firstname
Logical Plan:
Physical Plan: (efficiently pushes reversed sort to OpenSearch)
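As a rough illustration of the direction flip described above, the sketch below models a sort as a list of keys and inverts each direction while keeping the key order. It is a plain-Java stand-in, not the PR's Calcite code; `ReverseCollationSketch` and `SortKey` are hypothetical names, and null ordering is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative model only; the PR itself operates on Calcite collations. */
public class ReverseCollationSketch {

    /** A sort key: field name plus direction (hypothetical type for this sketch). */
    static final class SortKey {
        final String field;
        final boolean ascending;

        SortKey(String field, boolean ascending) {
            this.field = field;
            this.ascending = ascending;
        }

        /** Renders the key in PPL sort notation, e.g. "+balance" or "-firstname". */
        @Override
        public String toString() {
            return (ascending ? "+" : "-") + field;
        }
    }

    /** Returns a new key list with every direction flipped and key order preserved. */
    static List<SortKey> reverse(List<SortKey> keys) {
        List<SortKey> flipped = new ArrayList<>(keys.size());
        for (SortKey key : keys) {
            flipped.add(new SortKey(key.field, !key.ascending));
        }
        return flipped;
    }

    public static void main(String[] args) {
        List<SortKey> original =
            List.of(new SortKey("balance", true), new SortKey("firstname", false));
        System.out.println(reverse(original));          // [-balance, +firstname]
        System.out.println(reverse(reverse(original))); // [+balance, -firstname]
    }
}
```

Because the flip is its own inverse, applying it twice returns the original directions, which is the same property the double-reverse case relies on.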
2. Reverse with @timestamp (Time-Series Optimization)
Query:
Behavior: When no explicit sort exists but the index has an @timestamp field, reverse automatically sorts by @timestamp DESC to show most recent events first.
Use Case: Common pattern in log analysis - users want recent logs first
Logical Plan:
3. Reverse Ignored (No-Op Case)
Query:
Behavior: When there's no explicit sort AND no @timestamp field, reverse is ignored. Results appear in natural index order.
Rationale: Avoid expensive operations when reverse has no meaningful semantic interpretation.
Logical Plan:
Note: No sort node is added - reverse is completely ignored.
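The three cases above can be summarized as a small decision function. This is a hedged sketch under simplifying assumptions: sort keys are plain strings in "+field" / "-field" notation, the index schema is a set of field names, and `ReverseTierSketch` and `planReverse` are hypothetical names that do not appear in the PR.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.stream.Collectors;

/** Illustrative decision logic only; not the PR's Calcite-based implementation. */
public class ReverseTierSketch {

    static final String TIMESTAMP_FIELD = "@timestamp";

    /**
     * Decides which sort, if any, reverse should produce.
     *
     * @param existingSort keys of the explicit sort, each prefixed with '+' or '-'; empty if none
     * @param indexFields  field names available in the index
     * @return the sort to apply, or empty when reverse is ignored
     */
    static Optional<List<String>> planReverse(List<String> existingSort, Set<String> indexFields) {
        if (!existingSort.isEmpty()) {
            // Tier 1: an explicit sort exists, so flip every direction.
            return Optional.of(
                existingSort.stream().map(ReverseTierSketch::flip).collect(Collectors.toList()));
        }
        if (indexFields.contains(TIMESTAMP_FIELD)) {
            // Tier 2: no explicit sort, but @timestamp exists, so sort by it descending.
            return Optional.of(List.of("-" + TIMESTAMP_FIELD));
        }
        // Tier 3: nothing meaningful to reverse, so no sort node is added.
        return Optional.empty();
    }

    /** Flips the leading sign of a key; assumes the key starts with '+' or '-'. */
    static String flip(String key) {
        char sign = key.charAt(0) == '+' ? '-' : '+';
        return sign + key.substring(1);
    }

    public static void main(String[] args) {
        System.out.println(planReverse(List.of("+balance", "-firstname"), Set.of("balance", "firstname")));
        // Optional[[-balance, +firstname]]
        System.out.println(planReverse(List.of(), Set.of("@timestamp", "message")));
        // Optional[[-@timestamp]]
        System.out.println(planReverse(List.of(), Set.of("message")));
        // Optional.empty
    }
}
```

Returning an empty Optional for the third tier keeps the "no sort node is added" outcome explicit instead of encoding it as an empty sort.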
4. Double Reverse (Cancellation)
Query:
Behavior: Two reverses cancel each other out, returning to original sort order.
Logical Plan:
Final sort order matches original query: +balance, -firstname
5. Multiple Sorts + Reverse
Query:
Behavior: Reverse applies to the most recent sort (from PPL semantics, last sort wins).
Logical Plan:
Result: Only the firstname sort is reversed (DESC → ASC). The balance sort is overridden by PPL's "last sort wins" rule.
Related Issues
Resolves #3924
Check List
--signoff.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.