@ahkcs
Contributor

@ahkcs ahkcs commented Nov 10, 2025

Description

Originally from #4056 by @selsong

This PR implements a significant performance optimization for the reverse command by eliminating the expensive ROW_NUMBER() window function and implementing a three-tier logic based on query context.

Motivation

The previous implementation used the ROW_NUMBER() window function, which:

  • Required materializing the entire dataset
  • Caused excessive memory usage
  • Failed on large datasets (100M+ records) with "insufficient resources" errors

Solution: Three-Tier Reverse Logic

The reverse command now follows context-aware behavior:

  1. With existing sort/collation: Reverses all sort directions (ASC ↔ DESC)
  2. With @timestamp field (no explicit sort): Sorts by @timestamp in descending order
  3. Without sort or @timestamp: The command is ignored (no-op)
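As a rough illustration of the three tiers listed above, the decision can be sketched in plain Java. The class, enum, and method names here are hypothetical and are not taken from the PR; the real logic lives in CalciteRelNodeVisitor and works on Calcite RelNodes rather than on lists of field names.

```java
import java.util.List;

/** Hypothetical model of the three-tier decision; not the PR's actual code. */
final class ReverseTierSketch {
    enum Strategy { FLIP_EXISTING_SORT, SORT_BY_TIMESTAMP_DESC, NO_OP }

    /**
     * @param sortKeys   sort keys already in effect upstream (empty if none)
     * @param fieldNames fields available on the index
     */
    static Strategy choose(List<String> sortKeys, List<String> fieldNames) {
        if (!sortKeys.isEmpty()) {
            return Strategy.FLIP_EXISTING_SORT;      // tier 1: reverse every direction
        }
        if (fieldNames.contains("@timestamp")) {
            return Strategy.SORT_BY_TIMESTAMP_DESC;  // tier 2: most recent first
        }
        return Strategy.NO_OP;                       // tier 3: reverse is ignored
    }

    public static void main(String[] args) {
        System.out.println(choose(List.of("balance", "firstname"), List.of("balance", "firstname")));
        System.out.println(choose(List.of(), List.of("@timestamp", "category", "value")));
        System.out.println(choose(List.of(), List.of("account_number", "firstname")));
    }
}
```

The order of the checks matters: an existing sort takes precedence even when the index also has an @timestamp field.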

Implementation Details

1. Reverse with Explicit Sort (Primary Use Case)

Query:

source=accounts | sort +balance, -firstname | reverse

Behavior: Flips all sort directions: +balance, -firstname → -balance, +firstname

Logical Plan:

LogicalSystemLimit(sort0=[$3], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first], fetch=[10000], type=[QUERY_SIZE_LIMIT])
  LogicalProject(account_number=[$0], firstname=[$1], ...)
    LogicalSort(sort0=[$3], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
      CalciteLogicalIndexScan(table=[[OpenSearch, accounts]])

Physical Plan: (efficiently pushes reversed sort to OpenSearch)

CalciteEnumerableIndexScan(table=[[OpenSearch, accounts]],
  PushDownContext=[[..., SORT->[
    {"balance": {"order": "desc", "missing": "_last"}},
    {"firstname.keyword": {"order": "asc", "missing": "_first"}}
  ], LIMIT->10000]])
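The correspondence between the logical collation (DESC-nulls-last, ASC-nulls-first) and the pushed-down sort clauses in the plans above can be sketched as follows. This is an illustration inferred only from the two plans shown; the SortKey type and toSortClause method are hypothetical, and the actual translation is done by the plugin's pushdown code.

```java
/** Hypothetical rendering of a sort key as an OpenSearch sort clause. */
final class SortPushdownSketch {
    enum Direction { ASC, DESC }
    enum Nulls { FIRST, LAST }

    record SortKey(String field, Direction direction, Nulls nulls) {
        String toSortClause() {
            String order = direction == Direction.ASC ? "asc" : "desc";
            String missing = nulls == Nulls.FIRST ? "_first" : "_last";
            return String.format(
                "{\"%s\": {\"order\": \"%s\", \"missing\": \"%s\"}}", field, order, missing);
        }
    }

    public static void main(String[] args) {
        // The reversed collation from the example query.
        System.out.println(new SortKey("balance", Direction.DESC, Nulls.LAST).toSortClause());
        System.out.println(new SortKey("firstname.keyword", Direction.ASC, Nulls.FIRST).toSortClause());
    }
}
```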

2. Reverse with @timestamp (Time-Series Optimization)

Query:

source=time_series_logs | reverse | head 100

Behavior: When no explicit sort exists but the index has an @timestamp field, reverse automatically sorts by @timestamp DESC to show most recent events first.

Use Case: Common pattern in log analysis - users want recent logs first

Logical Plan:

LogicalSystemLimit(sort0=[$0], dir0=[DESC], fetch=[10000], type=[QUERY_SIZE_LIMIT])
  LogicalProject(@timestamp=[$0], category=[$1], value=[$2])
    LogicalSort(sort0=[$0], dir0=[DESC])
      CalciteLogicalIndexScan(table=[[OpenSearch, time_data]])

3. Reverse Ignored (No-Op Case)

Query:

source=accounts | reverse | head 100

Behavior: When there's no explicit sort AND no @timestamp field, reverse is ignored. Results appear in natural index order.

Rationale: Avoid expensive operations when reverse has no meaningful semantic interpretation.

Logical Plan:

LogicalSystemLimit(fetch=[10000], type=[QUERY_SIZE_LIMIT])
  LogicalProject(account_number=[$0], firstname=[$1], ...)
    CalciteLogicalIndexScan(table=[[OpenSearch, accounts]])

Note: No sort node is added - reverse is completely ignored.


4. Double Reverse (Cancellation)

Query:

source=accounts | sort +balance, -firstname | reverse | reverse

Behavior: Two reverses cancel each other out, returning to original sort order.

Logical Plan:

LogicalSystemLimit(sort0=[$3], sort1=[$1], dir0=[ASC-nulls-first], dir1=[DESC-nulls-last], fetch=[10000])
  LogicalProject(account_number=[$0], firstname=[$1], ...)
    LogicalSort(sort0=[$3], sort1=[$1], dir0=[ASC-nulls-first], dir1=[DESC-nulls-last])
      CalciteLogicalIndexScan(table=[[OpenSearch, accounts]])

Final sort order matches original query: +balance, -firstname


5. Multiple Sorts + Reverse

Query:

source=accounts | sort +balance | sort -firstname | reverse

Behavior: Reverse applies to the most recent sort (from PPL semantics, last sort wins).

Logical Plan:

LogicalSystemLimit(sort0=[$1], dir0=[ASC-nulls-first], fetch=[10000])
  LogicalProject(account_number=[$0], firstname=[$1], ...)
    LogicalSort(sort0=[$1], dir0=[ASC-nulls-first])
      CalciteLogicalIndexScan(table=[[OpenSearch, accounts]])

Result: Only firstname sort is reversed (DESC → ASC). The balance sort is overridden by PPL's "last sort wins" rule.
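The "last sort wins" and double-reverse behaviors described above can be modeled with a toy pipeline interpreter. This is a simplified sketch over command strings, not the Calcite-based implementation; the class and method names are hypothetical, and treating an unsigned key as ascending is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model: tracks the effective sort spec across sort and reverse commands. */
final class SortPipelineSketch {
    static List<String> effectiveSort(List<String> commands) {
        List<String> current = new ArrayList<>();
        for (String command : commands) {
            String trimmed = command.trim();
            if (trimmed.startsWith("sort ")) {
                // Last sort wins: a new sort replaces whatever was in effect.
                current = new ArrayList<>();
                for (String key : trimmed.substring(5).split(",")) {
                    current.add(key.trim());
                }
            } else if (trimmed.equals("reverse")) {
                List<String> flipped = new ArrayList<>();
                for (String key : current) {
                    flipped.add(flipKey(key));
                }
                current = flipped;
            }
        }
        return current;
    }

    static String flipKey(String key) {
        if (key.startsWith("-")) return "+" + key.substring(1);
        if (key.startsWith("+")) return "-" + key.substring(1);
        return "-" + key; // assumption: an unsigned key is ascending
    }

    public static void main(String[] args) {
        System.out.println(effectiveSort(List.of("sort +balance", "sort -firstname", "reverse")));
        System.out.println(effectiveSort(List.of("sort +balance, -firstname", "reverse", "reverse")));
    }
}
```

In this model, the first pipeline ends with only +firstname in effect, and the second returns to +balance, -firstname because the two flips cancel.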


Related Issues

Resolves #3924

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Collaborator

@dai-chen dai-chen left a comment

QQ: I recall that the major comment on the original PR was about early optimization in the analyzer layer. Is this new PR trying to address that concern? Ref: #4056 (comment)

@ahkcs
Contributor Author

ahkcs commented Nov 11, 2025

> QQ: I recall that the major comment on the original PR was about early optimization in the analyzer layer. Is this new PR trying to address that concern? Ref: #4056 (comment)

Hi Chen, I think that's a valid concern. However, after trying it out, I found that it adds significant complexity compared to the current approach. CalciteRelNodeVisitor serves as a logical plan builder that constructs the logical representation of the query, so I think optimization can also happen here. In our approach, visitReverse chooses between LogicalSort(reversed) and LogicalSort(ROW_NUMBER), which I think is appropriate for a logical plan builder. If we moved the optimization to a Calcite rule, we'd be doing something more complex: starting with a naive representation (always ROW_NUMBER) and then rewriting it.

@ahkcs ahkcs requested a review from dai-chen November 11, 2025 22:29
Collaborator

@noCharger noCharger left a comment

Can you add before-vs-after benchmark results?

@songkant-aws
Contributor

LGTM. Please get other signoffs.

@yuancu
Collaborator

yuancu commented Nov 26, 2025

Hi @ahkcs, #4784 allows users to specify a timestamp field in the timechart command. For this specific case, maybe we need to use the specified time field instead of the hard-coded @timestamp.

That said, I suspect there isn't much impact, because all timechart commands have a sort at the end of their plans, which makes them fall into your first tier. Can you please double check?

@ahkcs ahkcs force-pushed the feat/reverse_optimization branch from 631a24d to 8977a8f Compare November 26, 2025 19:05
@coderabbitai
Contributor

coderabbitai bot commented Nov 26, 2025

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Enhanced reverse command: flips existing sorts, backtracks to prior sorts when needed, falls back to @timestamp DESC if available, otherwise ignored.
  • Documentation

    • Rewritten reverse docs with detailed behavior model, expanded examples, and version note.
  • Tests

    • Added extensive integration and unit tests covering multi-field sorts, double-reverse, aggregations, windowing, streamstats, timechart, and timestamp scenarios.

✏️ Tip: You can customize this high-level summary in your review settings.

Walkthrough

This change replaces a ROW_NUMBER-based reverse implementation with a collation-centric strategy in Calcite planning: it backtracks the RelNode tree to locate Sort collations, reverses them (or inserts reversed Sorts), falls back to sorting by @timestamp DESC if present, and treats other cases as no-ops. Extensive tests and expected-plan fixtures plus docs were added/updated.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Core Query Planning Logic: core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java | Adds backtracking to find non-empty RelCollation up the RelNode tree (stops at Aggregate, BiRel, SetOp, Uncollect, or LogicalProject with window). Adds backtrackForCollation() and insertReversedSortInTree() to reverse or insert reversed Sorts and removes ROW_NUMBER-based reverse logic. |
| Query Plan Utilities: core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java | New public static reverseCollation(RelCollation) to flip field directions and nullDirections; imports related Calcite RelCollation types. |
| Calcite Integration Test Suite: integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java | Registers CalciteReverseCommandIT.class in the no-pushdown test suite. |
| Calcite Explain Tests: integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java | Replaced a single reverse explain test with multiple specialized tests covering ignored, pushdown (single/multiple fields), double-reverse, and timestamp fallback cases. |
| Calcite Reverse Integration Tests: integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java | Adds extensive reverse behavior tests (descending/mixed sorts, double reverse, timestamp fallback, interactions with aggregation/window/streamstats/timechart, and blocking-operator cases). |
| Expected Explain Fixtures (Pushdown): integ-test/src/test/resources/expectedOutput/calcite/*.yaml | Adds multiple YAML expected-plan fixtures for reverse pushdown, double-reverse, timestamp fallback, and ignored cases. |
| Expected Explain Fixtures (No-Pushdown): integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/*.yaml | Adds mirror YAML fixtures demonstrating no-pushdown explain outputs (EnumerableSort/Calc layers). |
| PPL Planner & Tests: ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLReverseTest.java, ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLStreamstatsTest.java, ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLTimechartTest.java | Adds many PPL tests for reverse behavior, streamstats/timechart reverse scenarios; adds created_at TIMESTAMP column to EventsTable test schema for timechart tests. |
| Documentation: docs/user/ppl/cmd/reverse.md | Rewrites reverse docs to a three-tier behavior model: flip explicit preceding sorts; else sort by @timestamp DESC if available; else ignore. Expands examples and notes. |

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant Parser
    participant CalciteVisitor
    participant RelNodeTree
    participant PlanUtils
    participant OpenSearch

    User->>Parser: submit query (includes Reverse)
    Parser->>CalciteVisitor: produce RelNode with Reverse node

    activate CalciteVisitor
    CalciteVisitor->>RelNodeTree: backtrackForCollation(startingNode)
    alt Sort with non-empty collation found
        RelNodeTree-->>CalciteVisitor: returns Sort + RelCollation
        CalciteVisitor->>PlanUtils: reverseCollation(collation)
        PlanUtils-->>CalciteVisitor: reversed collation
        CalciteVisitor->>RelNodeTree: insertReversedSortInTree(at located Sort)
        RelNodeTree-->>CalciteVisitor: rewritten RelNode plan
    else No sort found but `@timestamp` exists
        RelNodeTree-->>CalciteVisitor: indicates `@timestamp` present
        CalciteVisitor->>RelNodeTree: insert Sort(`@timestamp` DESC)
    else Blocked or no sortable path
        RelNodeTree-->>CalciteVisitor: blocked/no-op
    end
    deactivate CalciteVisitor

    CalciteVisitor->>OpenSearch: pushdown request (with reversed sort if applied)
    OpenSearch-->>CalciteVisitor: results
    CalciteVisitor-->>User: deliver final results

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Suggested labels

pushdown, PPL

Suggested reviewers

  • penghuo
  • ps48
  • ykmr1224
  • anirudha
  • yuancu
  • kavithacm
  • qianheng-aws

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 12.68% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately and concisely summarizes the primary change: implementing a performance optimization for the reverse command by replacing ROW_NUMBER with context-aware logic. |
| Description check | ✅ Passed | The description comprehensively documents the motivation, solution, and implementation details of the reverse optimization, including specific examples and logical plans. |
| Linked Issues check | ✅ Passed | The PR implements all primary coding objectives from issue #3924: context-aware reverse with collation flipping [#3924], @timestamp fallback [#3924], no-op behavior [#3924], and double-reverse cancellation [#3924]. |
| Out of Scope Changes check | ✅ Passed | All code changes are directly related to implementing the reverse optimization, from core logic in CalciteRelNodeVisitor and PlanUtils to comprehensive test coverage and expected outputs. |
✨ Finishing touches
  • 📝 Generate docstrings

Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (4)
docs/user/ppl/cmd/reverse.rst (1)

64-81: Example 2 uses hardcoded future dates - consider updating for realism.

The example shows timestamps from July 2025 (2025-07-28), which are in the future relative to the current date (November 2025 based on context). While this doesn't affect functionality, using realistic past timestamps or noting these are sample values would improve documentation quality.

This is a minor documentation nitpick.

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (1)

468-473: Consider future-proofing testExplainReverseWithTimestamp for configurable time fields

testExplainReverseWithTimestamp currently assumes @timestamp as the time field. With the separate work allowing configurable time fields (e.g., in timechart), you may eventually want a companion explain test that asserts reverse uses the resolved time field rather than hard-coding @timestamp, to prevent regressions when that logic evolves.

ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLReverseTest.java (1)

345-360: Result-order assertion after aggregation may be fragile

In testReverseAfterSortAndAggregationIsNoOp, the expected result string relies on a specific row order from an aggregation without an explicit ORDER BY. If the underlying engine ever changes its grouping or output-order behavior, this test could fail despite reverse still being a no-op.

You might consider either:

  • dropping the result-order assertion and only checking the logical plan, or
  • adding an explicit ORDER BY in the PPL and adjusting expectations accordingly.
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java (1)

180-191: Consider using unordered verification for no-op test.

When reverse is a no-op, the "natural order" is technically undefined and could vary based on shard allocation or segment merges. While freshly-loaded test indices are typically stable, using verifyDataRows (unordered) or explicitly sorting would make this test more robust against flakiness.

-    // Without sort or @timestamp, reverse is ignored, so data comes in natural order
-    // The first 3 documents in natural order (ascending by account_number)
-    verifyDataRowsInOrder(result, rows(1), rows(6), rows(13));
+    // Without sort or @timestamp, reverse is ignored, so data comes in natural (undefined) order
+    // Just verify we get 3 rows with valid account numbers
+    verifyDataRows(result, rows(1), rows(6), rows(13));

Alternatively, keep the ordered assertion if you're confident the test data loading order is deterministic.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b88bf56 and 8977a8f.

📒 Files selected for processing (23)
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (2 hunks)
  • core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java (2 hunks)
  • docs/category.json (1 hunks)
  • docs/user/ppl/cmd/reverse.rst (5 hunks)
  • integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java (1 hunks)
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (1 hunks)
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java (6 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_double_reverse_ignored.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_double_reverse_pushdown_multiple.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_double_reverse_pushdown_single.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_reverse_ignored.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_reverse_pushdown_multiple.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_reverse_pushdown_single.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_reverse_with_timestamp.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_double_reverse_ignored.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_double_reverse_pushdown_multiple.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_double_reverse_pushdown_single.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_reverse_ignored.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_reverse_pushdown_multiple.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_reverse_pushdown_single.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_reverse_with_timestamp.yaml (1 hunks)
  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLReverseTest.java (5 hunks)
  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLStreamstatsTest.java (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
  • GitHub Check: bwc-tests-full-restart (21)
  • GitHub Check: bwc-tests-full-restart (25)
  • GitHub Check: build-linux (25, doc)
  • GitHub Check: build-linux (21, integration)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: build-linux (25, unit)
  • GitHub Check: build-linux (21, unit)
  • GitHub Check: bwc-tests-rolling-upgrade (21)
  • GitHub Check: bwc-tests-rolling-upgrade (25)
  • GitHub Check: build-linux (21, doc)
  • GitHub Check: security-it-linux (25)
  • GitHub Check: security-it-linux (21)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: build-windows-macos (macos-14, 21, integration)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: build-windows-macos (macos-14, 21, doc)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (macos-14, 25, doc)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: security-it-windows-macos (macos-14, 25)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
  • GitHub Check: CodeQL-Scan (java)
🔇 Additional comments (36)
integ-test/src/test/resources/expectedOutput/calcite/explain_reverse_with_timestamp.yaml (1)

1-13: Let me analyze the original review comment based on the provided information without repository access.

Looking at the original review comment, I can make the following observations:

  1. Content Analysis: The YAML snippet shows valid YAML structure with proper indentation and quoting
  2. Logical Plan: The logical plan shows proper DESC sort on @timestamp field ($0), which aligns with stated PR objectives
  3. Physical Plan: The physical plan correctly pushes down the sort operation with proper DESC ordering and "missing": "_first" configuration
  4. Review Structure: The original review has contradictory elements - it includes tag while also requesting with a verification script

Key Issue: The original review requests verification via shell script execution but includes approval tags. Without access to verify companion test files or run the verification script, I cannot confirm the consistency claims across related test resources.


Test resource shows correct DESC sort on @timestamp with proper push-down to OpenSearch, but verification of consistency with related test files cannot be completed.

The logical and physical plans correctly implement DESC sorting on @timestamp with appropriate missing value handling. However, the requested verification against companion test cases (e.g., explain_reverse_ignored.yaml) could not be performed due to inability to access the repository file structure. The YAML syntax appears valid based on static inspection, and the logical progression from LogicalSort through LogicalProject to CalciteLogicalIndexScan is sound.

docs/category.json (1)

45-45: LGTM!

Documentation entry correctly added in alphabetical order within the ppl_cli_calcite category.

integ-test/src/test/resources/expectedOutput/calcite/explain_reverse_pushdown_multiple.yaml (1)

1-19: LGTM!

The expected output correctly demonstrates:

  • Sort direction reversal (DESC→ASC for age, ASC→DESC for firstname)
  • Null direction reversal (nulls-last↔nulls-first)
  • Proper pushdown of reversed sort to OpenSearch with correct JSON structure
core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java (1)

594-623: LGTM!

The reverseCollation utility is well-implemented:

  • Properly handles null/empty input with early return
  • Correctly uses direction.reverse() API for sort direction
  • Appropriately flips null direction (FIRST↔LAST, preserving UNSPECIFIED)
  • Returns immutable collation via RelCollations.of()
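The four properties listed above can be illustrated with a small stand-alone model. This sketch deliberately does not use Calcite: the Direction, NullDirection, and FieldCollation types below are simplified stand-ins (Calcite's own direction enum has more values), and the method only mirrors the behavior the review describes rather than reproducing the code in PlanUtils.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified stand-in for collation reversal; not the PlanUtils source. */
final class ReverseCollationSketch {
    enum Direction {
        ASCENDING, DESCENDING;
        Direction reverse() { return this == ASCENDING ? DESCENDING : ASCENDING; }
    }

    enum NullDirection {
        FIRST, LAST, UNSPECIFIED;
        NullDirection reverse() {
            if (this == FIRST) return LAST;
            if (this == LAST) return FIRST;
            return UNSPECIFIED; // preserved as-is
        }
    }

    record FieldCollation(int fieldIndex, Direction direction, NullDirection nullDirection) {}

    static List<FieldCollation> reverseCollation(List<FieldCollation> collation) {
        if (collation == null || collation.isEmpty()) {
            return collation; // early return: nothing to flip
        }
        List<FieldCollation> reversed = new ArrayList<>();
        for (FieldCollation fc : collation) {
            reversed.add(new FieldCollation(
                fc.fieldIndex(), fc.direction().reverse(), fc.nullDirection().reverse()));
        }
        return List.copyOf(reversed); // immutable result
    }

    public static void main(String[] args) {
        // sort +balance, -firstname with balance at $3 and firstname at $1
        List<FieldCollation> original = List.of(
            new FieldCollation(3, Direction.ASCENDING, NullDirection.FIRST),
            new FieldCollation(1, Direction.DESCENDING, NullDirection.LAST));
        System.out.println(reverseCollation(original));
        System.out.println(reverseCollation(reverseCollation(original)).equals(original));
    }
}
```

Because both flips are their own inverse, applying the function twice returns the original collation, which is the property the double-reverse fixtures rely on.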
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (4)

786-788: The build/push pattern is correct for replacing the current node.

The sequence build() (pop) followed by push(rebuiltTree) correctly replaces the current node on the RelBuilder stack with the rebuilt tree containing the reversed sort.


690-724: Backtracking logic is well-designed with comprehensive blocking operator checks.

The method correctly identifies operators that would invalidate sort ordering:

  • Aggregate (destroys row-level ordering)
  • BiRel (covers Join, Correlate)
  • SetOp (Union, Intersect, Except)
  • Uncollect (unnesting)
  • LogicalProject with window functions (ordering from window's ORDER BY)

Based on learnings from past review comments, the blocking operators were expanded to cover these edge cases.
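The backtracking idea can be sketched on a toy plan made of single-input nodes. Everything in this block is hypothetical: the Node record, the Kind enum, and findSort are illustrative names, a join is modeled as a one-input node for brevity, and the real backtrackForCollation() operates on Calcite RelNodes.

```java
import java.util.EnumSet;
import java.util.Optional;
import java.util.Set;

/** Toy model of walking from the current node toward the scan to find a Sort. */
final class BacktrackSketch {
    enum Kind { SCAN, FILTER, PROJECT, PROJECT_WITH_WINDOW, SORT, AGGREGATE, JOIN, SET_OP, UNCOLLECT }

    /** collation is a plain description such as "balance ASC"; empty means none. */
    record Node(Kind kind, String collation, Node input) {}

    // Operators that invalidate any ordering established below them.
    private static final Set<Kind> BLOCKING = EnumSet.of(
        Kind.AGGREGATE, Kind.JOIN, Kind.SET_OP, Kind.UNCOLLECT, Kind.PROJECT_WITH_WINDOW);

    static Optional<Node> findSort(Node start) {
        for (Node node = start; node != null; node = node.input()) {
            if (BLOCKING.contains(node.kind())) {
                return Optional.empty(); // ordering below this point cannot be trusted
            }
            if (node.kind() == Kind.SORT && node.collation() != null && !node.collation().isEmpty()) {
                return Optional.of(node);
            }
        }
        return Optional.empty(); // reached the scan without finding a sort
    }

    public static void main(String[] args) {
        Node scan = new Node(Kind.SCAN, "", null);
        Node sort = new Node(Kind.SORT, "balance ASC", scan);
        Node project = new Node(Kind.PROJECT, "", sort);
        Node aggregate = new Node(Kind.AGGREGATE, "", sort);
        System.out.println(findSort(project).isPresent());   // sort is reachable through the project
        System.out.println(findSort(aggregate).isPresent()); // aggregate blocks the search
    }
}
```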


735-761: I'm unable to clone the repository to verify the test cases and implementation details directly. However, based on the code snippet provided in the review comment, I can analyze the logical correctness of the concern.

The code appears to correctly preserve LIMIT semantics as implemented.

Analysis of the insertReversedSortInTree method:

  1. Tree Structure: The method creates a new LogicalSort on top of the original sort node (line 753-754), rather than replacing it. The original sort with its fetch/offset constraints remains in the tree.

  2. Execution Flow: When a query like source=t | sort age | head 10 | reverse is executed:

    • The original LogicalSort node (with fetch=10) applies the limit first
    • The reversed LogicalSort (with null fetch/offset) is applied on top, reversing the already-limited result
    • The limit is preserved because the original sort retains its fetch constraint
  3. Why null fetch/offset is correct: The reversed sort on top doesn't need a fetch/offset because it operates on an already-limited dataset from the original sort. Setting fetch/offset on the reversed sort would be redundant and incorrect.
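The limit-preservation argument above can be checked on plain data. This is a sketch with made-up ages and hypothetical method names, using Java streams in place of the query engine: it contrasts stacking a reversed sort on top of the limited rows with flipping the original sort before the limit is applied.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** Illustrates why the reversed sort sits on top of the original sort + limit. */
final class ReverseAfterLimitSketch {
    // Models: sort age | head N | reverse, with the reversed sort stacked on top.
    static List<Integer> sortLimitThenReverse(List<Integer> ages, int limit) {
        return ages.stream()
            .sorted()                          // original ascending sort
            .limit(limit)                      // head N keeps the N smallest
            .sorted(Comparator.reverseOrder()) // reversed sort over the limited rows
            .collect(Collectors.toList());
    }

    // Models the incorrect alternative: flipping the original sort in place.
    static List<Integer> flipSortInPlaceThenLimit(List<Integer> ages, int limit) {
        return ages.stream()
            .sorted(Comparator.reverseOrder())
            .limit(limit)                      // now keeps the N largest instead
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> ages = List.of(36, 28, 33, 32, 39);
        System.out.println(sortLimitThenReverse(ages, 3));     // [33, 32, 28]
        System.out.println(flipSortInPlaceThenLimit(ages, 3)); // [39, 36, 33]
    }
}
```

The two results contain different rows, which is why replacing the original sort would change the meaning of the query while stacking a second sort does not.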

However, I cannot conclusively verify this claim without access to test cases or the actual query execution flow due to repository access issues. The logical analysis suggests the implementation is sound, but there could be edge cases or runtime considerations not apparent from the code alone.


769-776: Based on my web search findings, I can now provide a verification of the review comment's concern.

The review concern about edge cases is valid and well-founded.

The web search confirms that Calcite's MetadataQuery.collations() can indeed return collations from multiple sources:

  1. TableScan/RelOptTable declared collations (index or table-level ordering)
  2. Upstream input collations (from Sort, Project, Calc, Window, Join operators)
  3. Provider-specific logic (enumerable implementations, planner-specific handlers)

This means your concern is correct: a collation found via MetadataQuery.collations() may come from:

  • An index's natural ordering (no explicit Sort node)
  • A table's declared collation
  • Upstream operators that preserve collations

When the code calls relBuilder.sort(reversedCollation), it does add a new sort operation if the collation source isn't an actual Sort node. However, the semantics appear intentionally correct—reversing a known ordering (whether from index, table, or upstream) and applying it via sort is still semantically valid.

The issue is verification coverage, not the logic itself. Without access to the repository to check test files, I cannot definitively verify whether all these edge cases (especially index natural ordering without explicit Sort nodes) are covered by tests.


Consider documenting the edge case: collation from metadata query sources beyond Sort nodes.

The code correctly reverses collations from all metadata sources (index ordering, table collations, upstream operators), not just explicit Sort nodes. However, for clarity with the PR's "flip sort directions" documentation, add a comment explaining that MetadataQuery.collations() may return ordering information from index definitions or table metadata, and that the reversed collation is applied uniformly regardless of source. This would also clarify that new sort operations may be added for index-based orderings.

docs/user/ppl/cmd/reverse.rst (1)

14-43: Documentation clearly explains the three-tier behavior and optimization benefits.

The description, behavior section, and note effectively communicate:

  • The conditional logic based on sort/timestamp presence
  • Performance benefits of avoiding materialization
  • Memory optimization rationale

This aligns well with the PR objectives of enabling pushdown and avoiding expensive operations on large datasets.

integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java (1)

91-91: Include reverse tests in no-pushdown suite – looks good

Adding CalciteReverseCommandIT.class here keeps reverse coverage consistent for both pushdown and no-pushdown modes. No further changes needed.

integ-test/src/test/resources/expectedOutput/calcite/explain_double_reverse_ignored.yaml (1)

1-7: Double-reverse ignored plan matches no-op semantics

Logical/physical plans show no sort introduced for the double reverse case; only limit/project are applied, which matches the described no-op behavior. Fixture looks consistent.

integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_double_reverse_ignored.yaml (1)

1-9: No-pushdown double-reverse ignored fixture is consistent

Plan omits any sort and only reflects system limit and projection, which aligns with the intended “double reverse is a no-op” behavior under no-pushdown.

integ-test/src/test/resources/expectedOutput/calcite/explain_reverse_pushdown_single.yaml (1)

1-10: Single-field reverse pushdown plan matches optimization intent

Logical plan shows original + reversed sorts; physical plan collapses to a single pushed-down age ASC sort with limit, consistent with the described reverse optimization.

integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_reverse_with_timestamp.yaml (1)

1-12: Reverse with @timestamp (no-pushdown) plan matches documented behavior

Plan adds a DESC sort on @timestamp plus the head limit, implemented via EnumerableSort/Limit rather than pushdown, which is exactly what the no-pushdown path is expected to do.

integ-test/src/test/resources/expectedOutput/calcite/explain_double_reverse_pushdown_multiple.yaml (1)

1-20: Double-reverse multi-field pushdown fixture reflects canceled reverses

Logical sorts show the flip–flip pattern, while the physical plan pushes down the original age DESC / firstname.keyword ASC sort with limit, which aligns with the double-reverse semantics.

integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_reverse_pushdown_multiple.yaml (1)

1-12: No-pushdown multi-field reverse plan looks correct

The logical plan shows original + reversed sorts; the physical plan retains only the final reversed collation as an EnumerableSort (no pushdown), which matches the intended behavior for this configuration despite the filename.

integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_double_reverse_pushdown_single.yaml (1)

1-13: Double-reverse no-pushdown logical/physical shape looks consistent

Logical and physical plans correctly reflect two reverses on a single sort (triple LogicalSort wrapper, single physical DESC sort under a LIMIT), matching the intended “double reverse = original ordering” behavior in the no-pushdown path.

integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_reverse_pushdown_single.yaml (1)

1-12: Single reverse no-pushdown plan matches flipped sort semantics

The expected logical/physical plans correctly implement sort - age | reverse as an ASC sort on age at the physical layer, while preserving the intermediate DESC/ASC logical structure.

integ-test/src/test/resources/expectedOutput/calcite/explain_double_reverse_pushdown_single.yaml (1)

1-15: Double-reverse pushdown correctly preserves original DESC sort at index

The explain output shows triple sorts in the logical plan but a single DESC sort in PushDownContext, so the pushed sort matches the original sort - age after two reverses, as desired.

integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_double_reverse_pushdown_multiple.yaml (1)

1-13: Multi-key double-reverse no-pushdown plan preserves original collation

Final physical sort on (age DESC, firstname ASC) after two reverses matches the expected “last sort wins, double reverse = original” semantics for multi-field sorts.

integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_reverse_ignored.yaml (1)

1-11: Reverse-no-op (no sort, no @timestamp) is represented correctly

The plan omits any additional sort for reverse and only shows the fetch=[5] limit/sort for head 5, matching the “reverse ignored” behavior in this context.

integ-test/src/test/resources/expectedOutput/calcite/explain_reverse_ignored.yaml (1)

1-8: Pushdown reverse-no-op keeps only LIMITs in context

The pushed plan correctly omits any reverse sort and carries both the explicit head 5 and global QUERY_SIZE_LIMIT as LIMIT entries, with size=5 in the request builder.

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (1)

417-473: Reverse explain coverage for sort/no-sort/timestamp looks solid

The new ITs exercise the key reverse scenarios (ignored w/o sort/@timestamp, single & multi-field pushdown, double reverse, and @timestamp-driven sort) against explain output, which lines up well with the documented behavior matrix for this PR.

ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLReverseTest.java (5)

12-23: Reverse behavior Javadoc is clear and matches the implemented test matrix

The class-level documentation concisely captures the three reverse modes (flip existing collation, use @timestamp, no-op) and points to integration tests for the non-collation cases, which aligns well with the scenarios exercised below.


159-240: Planner tests for multiple/multi-field sorts with reverse are well-structured

The tests from testMultipleSortsWithReverseParserSuccess through testReverseWithFieldsAndSortParserSuccess accurately encode the expected logical plans and Spark SQL for:

  • multiple sequential sorts where reverse targets the last one,
  • multi-field collations with direction flipping, and
  • interaction with fields projections.

These should give good safety nets for future refactors of the reverse/backtracking logic.


242-290: Head-then-sort-then-reverse no-opt test correctly guards semantics

testHeadThenSortReverseNoOpt and testSortFieldsReverse explicitly assert the presence and ordering of the three LogicalSort nodes (fetch, sort, reverse) and the backtracking case where the sort key is projected away, which is important to prevent “helpful” optimizations that would silently change PPL semantics.


297-360: Reverse-no-op tests after aggregation/join match the “collation destroyed” rule

The tests asserting that reverse is ignored after aggregation and joins, and after sort followed by aggregation, correctly expect no additional LogicalSort node and no ORDER BY in the generated SQL, demonstrating that reverse doesn’t try to infer collation past blocking operators.


362-458: Reverse through filters/eval/join+sort is comprehensively covered

The remaining tests (reverse after where/eval/multiple filters, sort–join–sort–reverse, and reverse on a post-aggregation sort) collectively validate the backtracking strategy and ensure reverse only inverts collations where they’re still semantically valid, which is exactly what this optimization needs.

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java (8)

8-32: LGTM!

The import additions and setup modifications properly support the new test cases. Loading TIME_TEST_DATA for @timestamp field tests and STATE_COUNTRY for streamstats tests is appropriate.


35-110: LGTM!

The existing tests are correctly updated to include explicit sort commands before reverse, aligning with the new pushdown-friendly implementation. The expected results remain consistent, validating the optimization preserves correctness.


193-216: LGTM!

This test correctly validates that when no explicit sort exists but @timestamp is present, reverse adds a @timestamp DESC sort. The expected values align with the time-series test data.

Minor note: The comment on line 208 mentions "IDs 100, 99, 98, 97, 96" which may be outdated or confusing since the test is ordered by timestamp, not IDs.


218-229: LGTM!

This test correctly validates that explicit sort takes precedence over @timestamp fallback. The reverse operation flips the sort value direction rather than applying @timestamp DESC.


231-338: LGTM - addresses past review concerns about streamstats.

These tests comprehensively cover the streamstats | reverse scenarios discussed in previous reviews:

  • No-op when __stream_seq__ is projected out and not detectable
  • Working reverse when backtracking finds collation (partition case)
  • Working reverse with explicit sort after streamstats

This aligns with the backtracking implementation mentioned in the past review discussion.


358-372: Inconsistency between expected behavior and verification method.

The comment on lines 367-368 states that, with an explicit sort followed by reverse, data should be in descending gender order (M, F). However, line 371 uses verifyDataRows (unordered) rather than verifyDataRowsInOrder.

If reverse is working correctly after sort gender, the output order should be deterministic and you should use ordered verification:

-    // Note: Due to column reordering after stats (c, gender), the result order
-    // may differ from expected. Using unordered verification for robustness.
-    verifyDataRows(result, rows(4, "M"), rows(3, "F"));
+    // With explicit sort and reverse, data is in descending gender order: M, F
+    verifyDataRowsInOrder(result, rows(4, "M"), rows(3, "F"));

If there's a known issue preventing ordered output after aggregation+sort+reverse, please clarify in the comment.


389-440: LGTM!

These tests properly validate that the backtracking logic traverses through non-blocking operators (filters, eval/project) to find the underlying sort collation. The expected results correctly reflect the filtered data in reversed sort order.


442-456: LGTM!

This test correctly validates that the @timestamp fallback only applies when @timestamp is present in the output schema. Since aggregation (stats count() as c by category) doesn't include @timestamp, reverse correctly becomes a no-op.
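
The three-way decision these tests exercise can be sketched as a small function. This is an illustration only: `ReverseTierSketch`, `Action`, and `decide` are hypothetical names, and the real visitor works on Calcite plan nodes rather than strings.

```java
import java.util.List;
import java.util.Optional;

final class ReverseTierSketch {
  enum Action { FLIP_EXISTING_SORT, SORT_BY_TIMESTAMP_DESC, NO_OP }

  /**
   * Picks the reverse behavior.
   *
   * @param existingCollation collation found for the current plan, if any (hypothetical string form)
   * @param outputFields field names visible in the current output schema
   */
  static Action decide(Optional<String> existingCollation, List<String> outputFields) {
    if (existingCollation.isPresent()) {
      return Action.FLIP_EXISTING_SORT; // an explicit sort takes precedence
    }
    if (outputFields.contains("@timestamp")) {
      return Action.SORT_BY_TIMESTAMP_DESC; // fallback only if the field is still in the output
    }
    return Action.NO_OP; // nothing to reverse
  }

  public static void main(String[] args) {
    // After "stats count() as c by category", @timestamp is no longer in the output.
    System.out.println(decide(Optional.empty(), List.of("c", "category")));
    System.out.println(decide(Optional.empty(), List.of("@timestamp", "value")));
    System.out.println(decide(Optional.of("value ASC"), List.of("@timestamp", "value")));
  }
}
```

The point of checking the output schema rather than the source index is visible in the first call: the index may have @timestamp, but once aggregation drops it there is nothing left to sort on.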

Comment on lines +113 to +128
   public void testReverseWithDescendingSort() throws IOException {
     // Test reverse with descending sort (- age)
     JSONObject result =
         executeQuery(
             String.format(
-                "source=%s | sort account_number | fields account_number | reverse | head 3",
+                "source=%s | sort - account_number | fields account_number | reverse",
                 TEST_INDEX_BANK));
     verifySchema(result, schema("account_number", "bigint"));
-    verifyDataRowsInOrder(result, rows(32), rows(25), rows(20));
+    verifyDataRowsInOrder(
+        result, rows(1), rows(6), rows(13), rows(18), rows(20), rows(25), rows(32));
   }

   @Test
   public void testReverseWithMixedSortDirections() throws IOException {
     // Test reverse with mixed sort directions (- age, + firstname)

⚠️ Potential issue | 🟡 Minor

Fix comments to match the actual sort fields.

The comments reference age but the code sorts by account_number:

  • Line 114: "Test reverse with descending sort (- age)"
  • Line 127: "Test reverse with mixed sort directions (- age, + firstname)"
   @Test
   public void testReverseWithDescendingSort() throws IOException {
-    // Test reverse with descending sort (- age)
+    // Test reverse with descending sort (- account_number)
     JSONObject result =
   @Test
   public void testReverseWithMixedSortDirections() throws IOException {
-    // Test reverse with mixed sort directions (- age, + firstname)
+    // Test reverse with mixed sort directions (- account_number, + firstname)
     JSONObject result =
🤖 Prompt for AI Agents
In
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java
around lines 113 to 127, update the misleading inline comments that mention
"age" to reflect the actual sort field(s) used in the tests: change the comment
on line 114 to indicate a descending sort on account_number (e.g., "Test reverse
with descending sort (- account_number)") and change the comment on line 127 to
match the mixed sort fields used in that test (replace "- age, + firstname" with
the actual sort expression present in the test, e.g., "- account_number, +
firstname" or the correct fields), keeping comment style consistent with
surrounding tests.

Comment on lines +146 to +162
   @Test
   public void testDoubleReverseWithDescendingSort() throws IOException {
     // Test double reverse with descending sort (- age)
     JSONObject result =
         executeQuery(
             String.format(
                 "source=%s | sort - account_number | fields account_number | reverse | reverse",
                 TEST_INDEX_BANK));
     verifySchema(result, schema("account_number", "bigint"));
     verifyDataRowsInOrder(
         result, rows(32), rows(25), rows(20), rows(18), rows(13), rows(6), rows(1));
   }

   @Test
   public void testDoubleReverseWithMixedSortDirections() throws IOException {
     // Test double reverse with mixed sort directions (- age, + firstname)

⚠️ Potential issue | 🟡 Minor

Same comment inconsistency: "age" should be "account_number".

Lines 148 and 161 also reference "- age" but the code uses "- account_number".

   @Test
   public void testDoubleReverseWithDescendingSort() throws IOException {
-    // Test double reverse with descending sort (- age)
+    // Test double reverse with descending sort (- account_number)
     JSONObject result =
   @Test
   public void testDoubleReverseWithMixedSortDirections() throws IOException {
-    // Test double reverse with mixed sort directions (- age, + firstname)
+    // Test double reverse with mixed sort directions (- account_number, + firstname)
     JSONObject result =

Suggested change
   @Test
   public void testDoubleReverseWithDescendingSort() throws IOException {
-    // Test double reverse with descending sort (- age)
+    // Test double reverse with descending sort (- account_number)
     JSONObject result =
         executeQuery(
             String.format(
                 "source=%s | sort - account_number | fields account_number | reverse | reverse",
                 TEST_INDEX_BANK));
     verifySchema(result, schema("account_number", "bigint"));
     verifyDataRowsInOrder(
         result, rows(32), rows(25), rows(20), rows(18), rows(13), rows(6), rows(1));
   }

   @Test
   public void testDoubleReverseWithMixedSortDirections() throws IOException {
-    // Test double reverse with mixed sort directions (- age, + firstname)
+    // Test double reverse with mixed sort directions (- account_number, + firstname)
🤖 Prompt for AI Agents
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java
around lines 146 to 161: the inline test comments incorrectly refer to "- age"
while the test actually sorts by account_number; update the two comments at
lines 148 and 161 to reference "- account_number" (or a neutral description like
"descending account_number") so the comment matches the code under test.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java (1)

114-116: Inline comments still refer to “age” while tests sort by account_number

The four comments in the descending/mixed-sort tests still mention “age”, but the queries all sort on account_number:

  • Line 115: // Test reverse with descending sort (- age)
  • Line 128: // Test reverse with mixed sort directions (- age, + firstname)
  • Line 149: // Test double reverse with descending sort (- age)
  • Line 162: // Test double reverse with mixed sort directions (- age, + firstname)

This can mislead readers when debugging test failures.

Consider updating them along these lines:

-    // Test reverse with descending sort (- age)
+    // Test reverse with descending sort (- account_number)
...
-    // Test reverse with mixed sort directions (- age, + firstname)
+    // Test reverse with mixed sort directions (- account_number, + firstname)
...
-    // Test double reverse with descending sort (- age)
+    // Test double reverse with descending sort (- account_number)
...
-    // Test double reverse with mixed sort directions (- age, + firstname)
+    // Test double reverse with mixed sort directions (- account_number, + firstname)

Also applies to: 128-129, 149-150, 162-163

🧹 Nitpick comments (4)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java (4)

289-307: Comment in testStreamstatsByWithReverse contradicts the expected reversed ordering

The leading comment says:

// Test that reverse is ignored after streamstats with partitioning (by clause)

but the assertion below uses verifyDataRowsInOrder(...) with the rows clearly reversed by __stream_seq__ within each country:

// With backtracking, reverse now works and reverses the __stream_seq__ order
verifyDataRowsInOrder(
    result,
    rows("Jane", "Canada", ..., 20, 2, 22.5),
    rows("John", "Canada", ..., 25, 1, 25),
    rows("Hello", "USA",  ..., 30, 2, 50),
    rows("Jake", "USA",   ..., 70, 1, 70));

To avoid confusion, the first comment should reflect that reverse does work here, for example:

-    // Test that reverse is ignored after streamstats with partitioning (by clause)
+    // Test that reverse backtracks through streamstats-by and reverses __stream_seq__ order

445-456: Clarify intent in testReverseWithTimestampAfterAggregation comment

The header says:

// Test that reverse uses @timestamp when aggregation destroys collation
// TIME_TEST_DATA has @timestamp field

but the body and expectations assert the opposite:

// Even though aggregation destroys collation, there's no @timestamp in the
// aggregated result, so reverse is a no-op
verifyDataRows(result, rows(26, "A"), rows(25, "B"), rows(25, "C"), rows(24, "D"));

Since the query stats count() as c by category | reverse drops @timestamp from the result, reverse is intentionally a no-op here.

Consider rephrasing the header to match the actual behavior, e.g.:

-    // Test that reverse uses @timestamp when aggregation destroys collation
-    // TIME_TEST_DATA has @timestamp field
+    // Test that reverse is a no-op after aggregation, even if the source index has @timestamp
+    // TIME_TEST_DATA has @timestamp, but the aggregated result no longer exposes it

359-373: testReverseAfterAggregationWithSort doesn’t actually validate the reversed ordering

The comments state that reverse should produce descending gender order:

// With explicit sort and reverse, data should be in descending gender order
// Sort by gender ASC: F, M -> Reverse: M, F

but the assertion uses verifyDataRows(...), which ignores row order and only checks the multiset of rows:

verifyDataRows(result, rows(4, "M"), rows(3, "F"));

If the goal is to verify that reverse correctly flips the post-aggregation sort, consider switching to verifyDataRowsInOrder(...) and tightening the comment; otherwise, adjust the comment to say you’re only validating the result set, not its ordering.

This is test quality, not functionality, so can be deferred but would strengthen coverage of the reverse behavior.
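
To illustrate why an order-insensitive assertion cannot catch a broken reverse, here is a small sketch with hypothetical stand-ins for the two verification styles. `sameRowsAnyOrder` and `sameRowsInOrder` are illustrative names, not the project's `verifyDataRows` / `verifyDataRowsInOrder` helpers, whose internals are not shown in this review.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RowOrderCheckSketch {
  /** Builds one result row from its column values. */
  static List<Object> row(Object... values) {
    return Arrays.asList(values);
  }

  /** Order-insensitive: true when both lists hold the same rows with the same multiplicities. */
  static boolean sameRowsAnyOrder(List<List<Object>> actual, List<List<Object>> expected) {
    List<List<Object>> remaining = new ArrayList<>(actual);
    for (List<Object> expectedRow : expected) {
      if (!remaining.remove(expectedRow)) {
        return false; // an expected row is missing
      }
    }
    return remaining.isEmpty(); // no unexpected extra rows
  }

  /** Order-sensitive: rows must match position by position. */
  static boolean sameRowsInOrder(List<List<Object>> actual, List<List<Object>> expected) {
    return actual.equals(expected);
  }

  public static void main(String[] args) {
    List<List<Object>> expected = List.of(row(4, "M"), row(3, "F")); // sort gender, then reverse
    List<List<Object>> notReversed = List.of(row(3, "F"), row(4, "M")); // reverse silently ignored

    System.out.println(sameRowsAnyOrder(notReversed, expected)); // passes despite wrong order
    System.out.println(sameRowsInOrder(notReversed, expected)); // catches the wrong order
  }
}
```

With the unordered check, a result where reverse was silently ignored still passes, which is why the ordered variant is the stronger assertion whenever the final ordering is deterministic.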


498-515: testTimechartWithGroupByAndReverse comment claims reversed order, but the assertion is unordered

The test description:

// Test timechart with group by and reverse
// The sort is on both @timestamp and the group by field
...
// All events are in the same hour, so only one time bucket
// Hosts are grouped and results are reversed

suggests you expect a specific row ordering after reverse, but the assertion uses verifyDataRows(...), which doesn’t check order:

verifyDataRows(
    result,
    rows("2024-07-01 00:00:00", "db-01", 1),
    rows("2024-07-01 00:00:00", "web-01", 2),
    rows("2024-07-01 00:00:00", "web-02", 2));

Given timechart appends an explicit ORDER BY and reverse adds another, the final ordering should be deterministic here. To actually exercise that, you could:

  • Use verifyDataRowsInOrder(...) with the expected reversed order, or
  • Relax the comment to just say you’re validating the grouped result content.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8977a8f and 22426ba.

📒 Files selected for processing (3)
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java (6 hunks)
  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLStreamstatsTest.java (1 hunks)
  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLTimechartTest.java (3 hunks)
🔇 Additional comments (2)
ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLStreamstatsTest.java (1)

222-250: New testStreamstatsWithReverse expectations look consistent with non-null-bucket semantics

The logical and Spark SQL plans correctly model reverse as an added DESC sort on __stream_seq__ on top of the existing ASC sort from streamstats, and the aggregation uses the same non-null-bucket MAX window shape as testStreamstatsBy. Looks good.

ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLTimechartTest.java (1)

53-78: Events table schema and timechart+reverse tests are coherent

  • The added created_at TIMESTAMP column is wired consistently: row literals, EventsTable protoRowType, and the new timefield=created_at test all agree on column ordering and types.
  • testTimechartWithReverse and testTimechartWithCustomTimefieldAndReverse correctly expect:
    • Inner ASC sort on the appropriate bucket (SPAN(@timestamp, ...) or SPAN(created_at, ...)) from timechart.
    • An outer DESC sort from reverse, with NULLS flipped to NULLS FIRST.
      This aligns with the PR’s “tier 1” behavior for reversing existing collations.

Also applies to: 370-425, 437-445

@opensearch-trigger-bot

This PR is stalled because it has been open for 2 weeks with no activity.

ahkcs added 15 commits January 5, 2026 11:46
This commit optimizes the `reverse` command in the Calcite planner by
intelligently reversing existing sort collations instead of always using
the ROW_NUMBER() approach.

Key changes:
- Added PlanUtils.reverseCollation() method to flip sort directions and
  null directions
- Updated CalciteRelNodeVisitor.visitReverse() to:
  - Check for existing sort collations
  - Reverse them if present (more efficient)
  - Fall back to ROW_NUMBER() when no sort exists
- Added comprehensive integration test expected outputs for:
  - Single field reverse pushdown
  - Multiple field reverse pushdown
  - Reverse fallback cases
  - Double reverse no-op optimizations

This optimization significantly improves performance when reversing
already-sorted data by leveraging database-native sort reversal.

Based on PR opensearch-project#4056 by @selsong

Signed-off-by: Kai Huang <ahkcs@amazon.com>

# Conflicts:
#	core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>

# Conflicts:
#	core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
@ahkcs ahkcs force-pushed the feat/reverse_optimization branch from 22426ba to 2b969fd Compare January 5, 2026 20:29
ahkcs added 2 commits January 5, 2026 12:29
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Fix all issues with AI Agents 🤖
In @docs/user/ppl/cmd/reverse.md:
- Around line 27-29: The fenced code block that currently contains just
"reverse" in the Syntax section lacks a language identifier; update that fence
to include a language tag (preferably "ppl" or "text") so it follows markdown
style guidelines for PPL commands—for example change the starting fence from ```
to ```ppl (or ```text) while leaving the inner content "reverse" unchanged.
♻️ Duplicate comments (1)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java (1)

113-162: Fix reverse sort test comments to match account_number fields.

The four comments in this block still mention age, but all the queries sort on account_number:

  • Line 115: // Test reverse with descending sort (- age)
  • Line 128: // Test reverse with mixed sort directions (- age, + firstname)
  • Line 149: // Test double reverse with descending sort (- age)
  • Line 162: // Test double reverse with mixed sort directions (- age, + firstname)

Updating them to reference account_number will avoid confusion when reading or grepping tests.

Suggested comment updates
   @Test
   public void testReverseWithDescendingSort() throws IOException {
-    // Test reverse with descending sort (- age)
+    // Test reverse with descending sort (- account_number)
@@
   @Test
   public void testReverseWithMixedSortDirections() throws IOException {
-    // Test reverse with mixed sort directions (- age, + firstname)
+    // Test reverse with mixed sort directions (- account_number, + firstname)
@@
   @Test
   public void testDoubleReverseWithDescendingSort() throws IOException {
-    // Test double reverse with descending sort (- age)
+    // Test double reverse with descending sort (- account_number)
@@
   @Test
   public void testDoubleReverseWithMixedSortDirections() throws IOException {
-    // Test double reverse with mixed sort directions (- age, + firstname)
+    // Test double reverse with mixed sort directions (- account_number, + firstname)
🧹 Nitpick comments (1)
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (1)

683-809: Reverse backtracking logic looks correct; consider using existing helper to replace top of stack.

The new backtrackForCollation / insertReversedSortInTree plus the visitReverse rewrite correctly:

  • Stop at blocking nodes (Aggregate, BiRel, SetOp, Uncollect, windowed LogicalProject), so you don’t reverse implicit or destroyed order.
  • Walk back through projections/filters to find the first real Sort and insert a reversed LogicalSort just above it.
  • Fall back to @timestamp (IMPLICIT_FIELD_TIMESTAMP) DESC and treat reverse as a no-op when neither collation nor timestamp exists, which matches the documented semantics.

One small maintainability tweak: instead of

RelNode rebuiltTree = insertReversedSortInTree(currentNode, reversedCollation, context);
// Replace the current node in the builder with the rebuilt tree
context.relBuilder.build();
context.relBuilder.push(rebuiltTree);

you could use the existing utility that already encapsulates “replace top of relBuilder stack” behavior:

PlanUtils.replaceTop(context.relBuilder, rebuiltTree);

This keeps stack manipulation consistent with the rest of the visitor and makes the intent clearer.
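
For readers unfamiliar with the backtracking idea summarized above, a simplified sketch follows. It models the plan as a single-input chain of hypothetical `PlanNode` objects with a string collation; the actual code walks Calcite RelNodes and also handles cases not modeled here (limit-only sorts, windowed projections, multi-input nodes).

```java
import java.util.Optional;

// Hypothetical, simplified plan node; not a Calcite RelNode.
final class PlanNode {
  enum Kind { SCAN, SORT, FILTER, PROJECT, AGGREGATE, JOIN }

  final Kind kind;
  final String collation; // only meaningful for SORT nodes
  final PlanNode input; // null for SCAN

  PlanNode(Kind kind, String collation, PlanNode input) {
    this.kind = kind;
    this.collation = collation;
    this.input = input;
  }

  static PlanNode of(Kind kind, PlanNode input) {
    return new PlanNode(kind, null, input);
  }

  static PlanNode sort(String collation, PlanNode input) {
    return new PlanNode(Kind.SORT, collation, input);
  }
}

final class BacktrackSketch {
  /** Walks from the top of the plan toward the scan, looking for the nearest usable sort. */
  static Optional<String> findCollation(PlanNode top) {
    PlanNode current = top;
    while (current != null) {
      switch (current.kind) {
        case SORT:
          return Optional.of(current.collation); // nearest sort wins
        case AGGREGATE:
        case JOIN:
          return Optional.empty(); // blocking operator: any ordering below is not preserved
        default:
          current = current.input; // filters and projections keep row order; keep walking
      }
    }
    return Optional.empty(); // reached the scan without finding a sort
  }

  public static void main(String[] args) {
    PlanNode scan = PlanNode.of(PlanNode.Kind.SCAN, null);
    PlanNode sorted = PlanNode.sort("age DESC", scan);
    PlanNode filtered = PlanNode.of(PlanNode.Kind.FILTER, sorted);
    PlanNode projected = PlanNode.of(PlanNode.Kind.PROJECT, filtered);
    PlanNode aggregated = PlanNode.of(PlanNode.Kind.AGGREGATE, sorted);

    System.out.println(findCollation(projected)); // sort found through project and filter
    System.out.println(findCollation(aggregated)); // empty: aggregation blocks the search
  }
}
```

When the search returns empty, the caller would move on to the @timestamp fallback or treat reverse as a no-op, as described in the bullets above.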

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 22426ba and e1bb16e.

📒 Files selected for processing (6)
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
  • core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java
  • docs/user/ppl/cmd/reverse.md
  • integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java
🚧 Files skipped from review as they are similar to previous changes (1)
  • core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java
🧰 Additional context used
📓 Path-based instructions (7)
**/*.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*.java: Use PascalCase for class names (e.g., QueryExecutor)
Use camelCase for method and variable names (e.g., executeQuery)
Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer Optional<T> for nullable returns in Java
Avoid unnecessary object creation in loops
Use StringBuilder for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java

⚙️ CodeRabbit configuration file

**/*.java: - Flag methods >50 lines as potentially too complex - suggest refactoring

  • Flag classes >500 lines as needing organization review
  • Check for dead code, unused imports, and unused variables
  • Identify code reuse opportunities across similar implementations
  • Assess holistic maintainability - is code easy to understand and modify?
  • Flag code that appears AI-generated without sufficient human review
  • Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
  • Check for proper JavaDoc on public classes and methods
  • Flag redundant comments that restate obvious code
  • Ensure proper error handling with specific exception types
  • Check for Optional usage instead of null returns
  • Validate proper use of try-with-resources for resource management

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
integ-test/**/*IT.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

End-to-end scenarios need integration tests in integ-test/ module

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java

⚙️ CodeRabbit configuration file

integ-test/**/*IT.java: - Integration tests MUST use valid test data from resources

  • Verify test data files exist in integ-test/src/test/resources/
  • Check test assertions are meaningful and specific
  • Validate tests clean up resources after execution
  • Ensure tests are independent and can run in any order
  • Flag tests that reference non-existent indices (e.g., EMP)
  • Verify integration tests are in correct module (integ-test/)
  • Check tests can be run with ./gradlew :integ-test:integTest
  • Ensure proper test data setup and teardown
  • Validate end-to-end scenario coverage

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
**/*IT.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

Name integration tests with *IT.java suffix in OpenSearch SQL

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
**/test/**/*.java

⚙️ CodeRabbit configuration file

**/test/**/*.java: - Verify NULL input tests for all new functions

  • Check boundary condition tests (min/max values, empty inputs)
  • Validate error condition tests (invalid inputs, exceptions)
  • Ensure multi-document tests for per-document operations
  • Flag smoke tests without meaningful assertions
  • Check test naming follows pattern: test
  • Verify test data is realistic and covers edge cases
  • Verify test coverage for new business logic
  • Ensure tests are independent and don't rely on execution order
  • Validate meaningful test data that reflects real-world scenarios
  • Check for proper cleanup of test resources

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
**/calcite/**/*.java

⚙️ CodeRabbit configuration file

**/calcite/**/*.java: - Follow existing Calcite integration patterns

  • Verify RelNode visitor implementations are complete
  • Check RexNode handling follows project conventions
  • Validate SQL generation is correct and optimized
  • Ensure Calcite version compatibility
  • Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor
  • Document any Calcite-specific workarounds
  • Test compatibility with Calcite version constraints

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
core/src/main/java/**/*.java

⚙️ CodeRabbit configuration file

core/src/main/java/**/*.java: - New functions MUST have unit tests in the same commit

  • Public methods MUST have JavaDoc with @param, @return, and @throws
  • Follow existing function implementation patterns in the same package
  • New expression functions should follow ExpressionFunction interface patterns
  • Validate function naming follows project conventions (camelCase)

Files:

  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java

⚙️ CodeRabbit configuration file

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java: - Flag methods >50 lines - this file is known to be hard to read

  • Suggest extracting complex logic into helper methods
  • Check for code organization and logical grouping
  • Validate all RelNode types are handled

Files:

  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
🧠 Learnings (8)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*IT.java : Name integration tests with `*IT.java` suffix in OpenSearch SQL

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Document Calcite-specific workarounds in code

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java
📚 Learning: 2025-12-29T05:32:03.491Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 4993
File: opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java:20-20
Timestamp: 2025-12-29T05:32:03.491Z
Learning: For any custom Calcite RelNode class (e.g., ones that extend EnumerableLimitSort or other Calcite RelNode types), always override the copy method. If copy is not overridden, cloning/copy operations may downgrade the instance to the parent class type, losing the custom behavior. In your implementation, ensure copy returns a new instance of the concrete class with all relevant fields and traits preserved, mirroring the current instance state.

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
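
The downgrade described in this learning can be reproduced without Calcite at all. The sketch below is a plain-Java illustration, not Calcite's API: `LimitSortNode`, `TopKWithoutCopy`, and `TopKWithCopy` are hypothetical stand-ins, and the one-argument `copy` is a simplification (real Calcite `copy` methods also take trait sets and inputs).

```java
/** Hypothetical stand-ins for a parent plan node and a custom subclass; not Calcite's API. */
class CopyOverrideDemo {

  /** Plays the role of the parent class (for example, a limit-sort node). */
  static class LimitSortNode {
    final int fetch;

    LimitSortNode(int fetch) {
      this.fetch = fetch;
    }

    /** The parent's copy always builds the parent type. */
    LimitSortNode copy(int newFetch) {
      return new LimitSortNode(newFetch);
    }

    String describe() {
      return "LimitSortNode(fetch=" + fetch + ")";
    }
  }

  /** Custom node that forgets to override copy: copies silently become LimitSortNode. */
  static class TopKWithoutCopy extends LimitSortNode {
    TopKWithoutCopy(int fetch) {
      super(fetch);
    }

    @Override
    String describe() {
      return "TopK(fetch=" + fetch + ")";
    }
  }

  /** Custom node that overrides copy and therefore keeps its concrete type. */
  static class TopKWithCopy extends LimitSortNode {
    TopKWithCopy(int fetch) {
      super(fetch);
    }

    @Override
    TopKWithCopy copy(int newFetch) {
      return new TopKWithCopy(newFetch);
    }

    @Override
    String describe() {
      return "TopK(fetch=" + fetch + ")";
    }
  }

  public static void main(String[] args) {
    // The copy of the first node reports itself as the parent type.
    System.out.println(new TopKWithoutCopy(10).copy(5).describe());
    // The copy of the second node is still the custom type.
    System.out.println(new TopKWithCopy(10).copy(5).describe());
  }
}
```

In a planner that copies nodes during rule application, the first variant would lose its custom behavior after the first rewrite, which is the failure mode the learning warns about.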
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration

Applied to files:

  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
📚 Learning: 2025-12-11T05:27:39.856Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 0
File: :0-0
Timestamp: 2025-12-11T05:27:39.856Z
Learning: In opensearch-project/sql, for SEMI and ANTI join types in CalciteRelNodeVisitor.java, the `max` option has no effect because these join types only use the left side to filter records based on the existence of matches in the right side. The join results are identical regardless of max value (max=1, max=2, or max=∞). The early return for SEMI/ANTI joins before processing the `max` option is intentional and correct behavior.

Applied to files:

  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
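
A small in-memory sketch may make the SEMI/ANTI reasoning concrete. It assumes that `max` caps the number of right-side matches kept per left row; the class and method names are hypothetical and this is not the code in CalciteRelNodeVisitor. The inner join output changes with `max`, while the semi and anti joins have no place to use it because they only test whether a match exists.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory joins over integer keys; illustrative only. */
class SemiAntiJoinDemo {

  /** Inner join on equality, keeping at most max right-side matches per left row. */
  static List<String> innerJoin(List<Integer> left, List<Integer> right, int max) {
    List<String> out = new ArrayList<>();
    for (int l : left) {
      int matches = 0;
      for (int r : right) {
        if (l == r && matches < max) {
          out.add(l + "=" + r);
          matches++;
        }
      }
    }
    return out;
  }

  /** Semi join: keep a left row if at least one right row matches. */
  static List<Integer> semiJoin(List<Integer> left, List<Integer> right) {
    List<Integer> out = new ArrayList<>();
    for (int l : left) {
      if (right.contains(l)) {
        out.add(l);
      }
    }
    return out;
  }

  /** Anti join: keep a left row if no right row matches. */
  static List<Integer> antiJoin(List<Integer> left, List<Integer> right) {
    List<Integer> out = new ArrayList<>();
    for (int l : left) {
      if (!right.contains(l)) {
        out.add(l);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Integer> left = List.of(1, 2, 3);
    List<Integer> right = List.of(1, 1, 3, 3, 3);
    System.out.println(innerJoin(left, right, 1)); // output size depends on max
    System.out.println(innerJoin(left, right, 2));
    System.out.println(semiJoin(left, right));     // no max parameter to vary
    System.out.println(antiJoin(left, right));
  }
}
```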
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*Test.java : Name unit tests with `*Test.java` suffix in OpenSearch SQL

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java
🪛 markdownlint-cli2 (0.18.1)
docs/user/ppl/cmd/reverse.md

27-27: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
  • GitHub Check: build-linux (25, unit)
  • GitHub Check: build-linux (21, doc)
  • GitHub Check: build-linux (25, doc)
  • GitHub Check: build-linux (21, integration)
  • GitHub Check: bwc-tests-full-restart (25)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: build-linux (21, unit)
  • GitHub Check: bwc-tests-rolling-upgrade (25)
  • GitHub Check: bwc-tests-rolling-upgrade (21)
  • GitHub Check: bwc-tests-full-restart (21)
  • GitHub Check: security-it-linux (21)
  • GitHub Check: security-it-linux (25)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, doc)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (macos-14, 25, doc)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (macos-14, 21, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
  • GitHub Check: security-it-windows-macos (macos-14, 25)
  • GitHub Check: test-sql-cli-integration (21)
  • GitHub Check: CodeQL-Scan (java)
🔇 Additional comments (3)
docs/user/ppl/cmd/reverse.md (1)

1-186: Documentation comprehensively and accurately reflects the new three-tier reverse optimization.

The content correctly describes all three tiers (explicit sort flip, @timestamp fallback, no-op) with appropriate examples covering single sorts, multiple sorts, @timestamp cases, double reversal, and interactions with head. Expected outputs are logically consistent with the described behavior. The optimization rationale and performance benefits are well-articulated. Excellent progressive structure from simple to complex scenarios.
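
The three tiers named in this comment can be sketched as one decision function. This is a minimal illustration in plain Java, not the implementation in CalciteRelNodeVisitor: `ReverseTiers`, `SortKey`, and `planReverse` are hypothetical names, and the index schema is reduced to a list of field names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of the three-tier reverse decision; not the project's code. */
class ReverseTiers {

  /** A sort key: field name plus direction. Illustrative only. */
  static final class SortKey {
    final String field;
    final boolean ascending;

    SortKey(String field, boolean ascending) {
      this.field = field;
      this.ascending = ascending;
    }

    @Override
    public String toString() {
      return (ascending ? "+" : "-") + field;
    }
  }

  /** Returns the sort keys reverse should apply, or empty when reverse is a no-op. */
  static Optional<List<SortKey>> planReverse(List<SortKey> existingSort, List<String> indexFields) {
    if (!existingSort.isEmpty()) {
      // Tier 1: an explicit sort exists, so flip every direction.
      List<SortKey> flipped = new ArrayList<>();
      for (SortKey key : existingSort) {
        flipped.add(new SortKey(key.field, !key.ascending));
      }
      return Optional.of(flipped);
    }
    if (indexFields.contains("@timestamp")) {
      // Tier 2: no sort, but the index has @timestamp, so sort by it descending.
      return Optional.of(List.of(new SortKey("@timestamp", false)));
    }
    // Tier 3: nothing to reverse.
    return Optional.empty();
  }

  public static void main(String[] args) {
    List<SortKey> sort = List.of(new SortKey("balance", true), new SortKey("firstname", false));
    System.out.println(planReverse(sort, List.of("balance", "firstname")));
    System.out.println(planReverse(List.of(), List.of("@timestamp", "message")));
    System.out.println(planReverse(List.of(), List.of("message")));
  }
}
```

Returning `Optional.empty()` for the no-op tier keeps the "ignore the command" case explicit rather than encoding it as an empty sort list.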

integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java (1)

93-93: Include reverse IT in no-pushdown suite – looks correct.

Adding CalciteReverseCommandIT.class to the suite keeps reverse behavior covered when pushdown is disabled; no issues spotted.

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (1)

415-473: Reverse explain-plan coverage matches new semantics.

The new explain tests cleanly cover: no-op reverse, single/multi-field reverse pushdown, double reverse (both ignored and cancelling out), and the @timestamp-based fallback. They follow existing patterns (explainQueryYaml + loadExpectedPlan) and look consistent with the CalciteRelNodeVisitor implementation.

Comment on lines +27 to +29
```
reverse
* No parameters: The reverse command takes no arguments or options.

## Note
```

⚠️ Potential issue | 🟡 Minor

Add language specification to code fence.

The code block in the Syntax section is missing a language identifier, which violates markdown style guidelines. Since this documents a PPL command, use ```ppl or ```text instead of an empty fence. Based on learnings, this aligns with repository markdown standards.

🔎 Proposed fix
 ## Syntax
 
-```
+```ppl
 reverse
</details>

<details>
<summary>🧰 Tools</summary>

<details>
<summary>🪛 markdownlint-cli2 (0.18.1)</summary>

27-27: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

</details>

</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

In @docs/user/ppl/cmd/reverse.md around lines 27-29, the fenced code block that
currently contains just "reverse" in the Syntax section lacks a language
identifier; update that fence to include a language tag (preferably "ppl" or
"text") so it follows markdown style guidelines for PPL commands. For example,
change the starting fence from ``` to ```ppl (or ```text) while leaving the
inner content "reverse" unchanged.


</details>


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (4)
ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLReverseTest.java (4)

94-112: Consider adding result verification for double reverse cancellation.

This test validates that double reverse produces the original ASC order in the logical plan and SQL, but doesn't verify actual result data. Adding verifyResult() would confirm the cancellation semantics work correctly end-to-end.

🔎 Suggested addition
     verifyLogical(root, expectedLogical);

+    // Verify result data matches original ascending order
+    String expectedResult =
+        "EMPNO=7369; ENAME=SMITH; JOB=CLERK; MGR=7902; HIREDATE=1980-12-17; SAL=800.00;"
+            + " COMM=null; DEPTNO=20\n"
+            + "EMPNO=7499; ENAME=ALLEN; JOB=SALESMAN; MGR=7698; HIREDATE=1981-02-20; SAL=1600.00;"
+            + " COMM=300.00; DEPTNO=30\n"
+            // ... remaining rows in ascending EMPNO order
+            ;
+    verifyResult(root, expectedResult);
+
     String expectedSparkSql =
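
The cancellation property this suggestion wants to verify on result data can be illustrated outside the test harness. The sketch below uses only `java.util.Comparator`; the class name and the sample EMPNO values are illustrative, and it does not call the project's `verifyResult()` helper.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative check that reversing an ordering twice restores the original order. */
class DoubleReverseDemo {

  /** Returns a sorted copy so the input list is never mutated. */
  static List<Integer> sorted(List<Integer> rows, Comparator<Integer> order) {
    List<Integer> copy = new ArrayList<>(rows);
    copy.sort(order);
    return copy;
  }

  public static void main(String[] args) {
    List<Integer> empnos = List.of(7499, 7369, 7521); // sample values, unsorted
    Comparator<Integer> ascending = Comparator.naturalOrder();
    Comparator<Integer> reversedOnce = ascending.reversed();
    Comparator<Integer> reversedTwice = reversedOnce.reversed();

    System.out.println(sorted(empnos, ascending));
    System.out.println(sorted(empnos, reversedOnce));
    // Same rows, same order as the ascending sort.
    System.out.println(sorted(empnos, reversedTwice));
  }
}
```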

135-157: Consider using specific exception types for negative tests.

Using Exception.class is broad. If the parser throws a specific exception (e.g., SyntaxCheckException, IllegalArgumentException), using that type would make tests more precise and prevent false positives from unrelated failures.
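
To show why the broad type can hide problems, here is a self-contained sketch that avoids any test framework. `DemoSyntaxException` and `throwsType` are hypothetical helpers written for this illustration; the exception the real parser throws is not confirmed here.

```java
/** Illustrative only: compares a broad and a specific expected-exception check. */
class SpecificExceptionDemo {

  /** Hypothetical stand-in for a parser's syntax exception. */
  static class DemoSyntaxException extends RuntimeException {
    DemoSyntaxException(String message) {
      super(message);
    }
  }

  /** Returns true only if body throws an instance of the expected type. */
  static boolean throwsType(Class<? extends Throwable> expected, Runnable body) {
    try {
      body.run();
    } catch (Throwable t) {
      return expected.isInstance(t);
    }
    return false;
  }

  public static void main(String[] args) {
    // Simulates an unrelated bug: a NullPointerException instead of a syntax error.
    Runnable unrelatedFailure = () -> {
      String query = null;
      query.length();
    };
    Runnable syntaxFailure = () -> {
      throw new DemoSyntaxException("unexpected argument after reverse");
    };

    // The broad check accepts the unrelated bug, so the test would pass wrongly.
    System.out.println(throwsType(Exception.class, unrelatedFailure));
    // The specific check rejects it and accepts only the intended failure.
    System.out.println(throwsType(DemoSyntaxException.class, unrelatedFailure));
    System.out.println(throwsType(DemoSyntaxException.class, syntaxFailure));
  }
}
```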


386-413: Consider adding Spark SQL verification for consistency.

testReverseAfterEvalWithSort and testReverseAfterMultipleFiltersWithSort only verify the logical plan but skip Spark SQL generation verification. For consistency with other tests and to validate the full optimization path, consider adding verifyPPLToSparkSQL() calls.


292-296: Consider adding window function blocking test.

The comment mentions window functions destroy collation, but there's no test for this case. Adding a test like testReverseAfterWindowFunctionIsNoOp would complete the blocking operator coverage.

@Test
public void testReverseAfterWindowFunctionIsNoOp() {
    // Window functions destroy input ordering
    String ppl = "source=EMP | eval row_num = row_number() over(order by SAL) | reverse";
    RelNode root = getRelNode(ppl);
    // Verify no additional sort node for reverse
    // ... assertions
}
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e1bb16e and dd9f4d6.

📒 Files selected for processing (1)
  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLReverseTest.java
🧰 Additional context used
📓 Path-based instructions (5)
**/*.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*.java: Use PascalCase for class names (e.g., QueryExecutor)
Use camelCase for method and variable names (e.g., executeQuery)
Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer Optional<T> for nullable returns in Java
Avoid unnecessary object creation in loops
Use StringBuilder for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code

Files:

  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLReverseTest.java

⚙️ CodeRabbit configuration file

**/*.java: - Flag methods >50 lines as potentially too complex - suggest refactoring

  • Flag classes >500 lines as needing organization review
  • Check for dead code, unused imports, and unused variables
  • Identify code reuse opportunities across similar implementations
  • Assess holistic maintainability - is code easy to understand and modify?
  • Flag code that appears AI-generated without sufficient human review
  • Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
  • Check for proper JavaDoc on public classes and methods
  • Flag redundant comments that restate obvious code
  • Ensure proper error handling with specific exception types
  • Check for Optional usage instead of null returns
  • Validate proper use of try-with-resources for resource management

Files:

  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLReverseTest.java
**/*Test.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*Test.java: All new business logic requires unit tests
Name unit tests with *Test.java suffix in OpenSearch SQL

Files:

  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLReverseTest.java
**/test/**/*.java

⚙️ CodeRabbit configuration file

**/test/**/*.java: - Verify NULL input tests for all new functions

  • Check boundary condition tests (min/max values, empty inputs)
  • Validate error condition tests (invalid inputs, exceptions)
  • Ensure multi-document tests for per-document operations
  • Flag smoke tests without meaningful assertions
  • Check test naming follows pattern: test
  • Verify test data is realistic and covers edge cases
  • Verify test coverage for new business logic
  • Ensure tests are independent and don't rely on execution order
  • Validate meaningful test data that reflects real-world scenarios
  • Check for proper cleanup of test resources

Files:

  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLReverseTest.java
**/ppl/**/*.java

⚙️ CodeRabbit configuration file

**/ppl/**/*.java: - For PPL parser changes, verify grammar tests with positive/negative cases

  • Check AST generation for new syntax
  • Ensure corresponding AST builder classes are updated
  • Validate edge cases and boundary conditions

Files:

  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLReverseTest.java
**/calcite/**/*.java

⚙️ CodeRabbit configuration file

**/calcite/**/*.java: - Follow existing Calcite integration patterns

  • Verify RelNode visitor implementations are complete
  • Check RexNode handling follows project conventions
  • Validate SQL generation is correct and optimized
  • Ensure Calcite version compatibility
  • Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor
  • Document any Calcite-specific workarounds
  • Test compatibility with Calcite version constraints

Files:

  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLReverseTest.java
🧠 Learnings (4)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 4993
File: opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java:20-20
Timestamp: 2025-12-29T05:32:11.893Z
Learning: In opensearch-project/sql, when creating custom Calcite RelNode classes that extend EnumerableLimitSort or other Calcite RelNode types, always override the `copy` method. Without overriding copy, the class will downgrade to its parent class type during copy operations, losing the custom implementation.
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes

Applied to files:

  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLReverseTest.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Document Calcite-specific workarounds in code

Applied to files:

  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLReverseTest.java
📚 Learning: 2025-12-29T05:32:03.491Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 4993
File: opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java:20-20
Timestamp: 2025-12-29T05:32:03.491Z
Learning: For any custom Calcite RelNode class (e.g., ones that extend EnumerableLimitSort or other Calcite RelNode types), always override the copy method. If copy is not overridden, cloning/copy operations may downgrade the instance to the parent class type, losing the custom behavior. In your implementation, ensure copy returns a new instance of the concrete class with all relevant fields and traits preserved, mirroring the current instance state.

Applied to files:

  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLReverseTest.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
  • GitHub Check: security-it-linux (21)
  • GitHub Check: security-it-linux (25)
  • GitHub Check: build-linux (21, doc)
  • GitHub Check: build-linux (25, doc)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: build-linux (21, unit)
  • GitHub Check: build-linux (25, unit)
  • GitHub Check: build-linux (21, integration)
  • GitHub Check: bwc-tests-full-restart (21)
  • GitHub Check: bwc-tests-rolling-upgrade (21)
  • GitHub Check: bwc-tests-full-restart (25)
  • GitHub Check: bwc-tests-rolling-upgrade (25)
  • GitHub Check: build-windows-macos (macos-14, 25, doc)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: build-windows-macos (macos-14, 21, doc)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
  • GitHub Check: test-sql-cli-integration (21)
  • GitHub Check: security-it-windows-macos (macos-14, 25)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: CodeQL-Scan (java)
🔇 Additional comments (7)
ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLReverseTest.java (7)

12-27: Well-documented test class.

The Javadoc clearly explains the three-tier reverse behavior and correctly references integration tests for scenarios not covered here. This helps maintainers understand test scope.


29-72: Good comprehensive test with logical, result, and SQL verification.

The test effectively validates the reverse optimization by checking the logical plan structure, actual result ordering, and generated Spark SQL.


159-179: Good coverage of "last sort wins" semantics.

This test correctly validates that reverse applies only to the most recent sort (- ENAME), aligning with the PPL semantics mentioned in the PR objectives.


242-267: Good test for fetch limit semantics preservation.

The comment clearly documents why this should NOT be optimized—preserving "take first 5, then sort" semantics. Consider adding result verification to confirm exactly 5 rows are returned.
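
The "take first N, then sort" semantics can be illustrated with Java streams. This is a sketch with made-up data and hypothetical method names, not the query from the test: it shows that limiting before sorting and sorting before limiting select different rows, which is why a reversed sort cannot simply be pushed beneath an existing fetch limit.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** Illustrative only: limit-then-sort and sort-then-limit are not interchangeable. */
class LimitThenSortDemo {

  /** Keeps the first n rows in arrival order, then sorts them descending. */
  static List<Integer> limitThenSortDesc(List<Integer> rows, int n) {
    return rows.stream()
        .limit(n)
        .sorted(Comparator.reverseOrder())
        .collect(Collectors.toList());
  }

  /** Sorts all rows descending, then keeps the first n. */
  static List<Integer> sortDescThenLimit(List<Integer> rows, int n) {
    return rows.stream()
        .sorted(Comparator.reverseOrder())
        .limit(n)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Integer> rows = List.of(3, 9, 1, 7, 5, 8, 2);
    // Only the first three arrivals (3, 9, 1) are eligible here.
    System.out.println(limitThenSortDesc(rows, 3));
    // Here the three largest values overall are chosen instead.
    System.out.println(sortDescThenLimit(rows, 3));
  }
}
```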


292-360: Thorough blocking operator tests with clear documentation.

These tests effectively validate that reverse becomes a no-op after operators that destroy collation (aggregate, join). The inline comments explain the rationale well, and testReverseAfterSortAndAggregationIsNoOp includes result verification.


415-432: Good test for sort-join-sort-reverse interaction.

This test validates an important edge case: the sort before the join is preserved in the plan but its collation is destroyed, while the sort after the join can be reversed. This aligns with the PR's backtracking logic.


434-457: Good complementary test to aggregation no-op case.

This test effectively demonstrates that while reverse after aggregation alone is a no-op, adding a sort after aggregation restores the ability to reverse. This pair of tests (testReverseAfterAggregationIsNoOp and testReverseAfterAggregationWithSort) provides clear coverage of the aggregation boundary behavior.


Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] Support reverse pushdown with Calcite

6 participants