@sandeshkr419 sandeshkr419 commented Dec 3, 2025

Description

Version forced to 3.3.0.

Testing:

./gradlew :integ-test:integTest --tests "org.opensearch.sql.calcite.clickbench.CalcitePPLClickBenchIT" -Dtests.method="testDataFusion" -Dtests.cluster=localhost:9200 -Dtests.rest.cluster=localhost:9200 -DignorePrometheus=true -Dtests.clustername=opensearch -Dtests.output=true

    Total 42 queries succeed. Average duration: 139 ms

    Total: 42 | Passed: 25 | Failed (200): 8 | Failed (non-200): 9
Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

opensearch-trigger-bot bot and others added 30 commits October 1, 2025 11:09
)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
---------

Signed-off-by: Kai Huang <ahkcs@amazon.com>
* PPL fillnull command enhancement

Signed-off-by: Kai Huang <ahkcs@amazon.com>

# Conflicts:
#	integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java

* add to searchableKeyWord

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* fixes

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* fix CI

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* update error message handling

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* formatting

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* put file back

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* removal

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* update doc

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* update doc

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* add IT

Signed-off-by: Kai Huang <ahkcs@amazon.com>

---------

Signed-off-by: Kai Huang <ahkcs@amazon.com>
…ct#4442)

* Add ignorePrometheus flag

Signed-off-by: Peng Huo <penghuo@gmail.com>

* support -DignorePrometheus in integTest and docTest

Signed-off-by: Peng Huo <penghuo@gmail.com>

* Update

Signed-off-by: Peng Huo <penghuo@gmail.com>

* Update

Signed-off-by: Peng Huo <penghuo@gmail.com>

---------

Signed-off-by: Peng Huo <penghuo@gmail.com>
Fixed typo: evenstats -->  eventstats

Signed-off-by: Alexey Temnikov <alexey.temnikov@improving.com>
…ches (opensearch-project#4025)

* Update delete_backport_branch workflow to include release-chores branches

Signed-off-by: Riley Jerger <rjerger@amazon.com>

* Update delete_backport_branch workflow to use github-script with proper permissions

Signed-off-by: Riley Jerger <rjerger@amazon.com>

---------

Signed-off-by: Riley Jerger <rjerger@amazon.com>
…nsearch-project#4454)

* Resolve concurrency issue

Signed-off-by: Louis Chu <lingzhichu.clz@gmail.com>

* Update fix

Signed-off-by: Louis Chu <lingzhichu.clz@gmail.com>

* Add UT

Signed-off-by: Louis Chu <lingzhichu.clz@gmail.com>

* Revise comments

Signed-off-by: Louis Chu <lingzhichu.clz@gmail.com>

* Fix style

Signed-off-by: Louis Chu <lingzhichu.clz@gmail.com>

---------

Signed-off-by: Louis Chu <lingzhichu.clz@gmail.com>
* Refactor qualified name resolution in PPL

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Add tests

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* fix naming

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Minor fix

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Fix test failure

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

---------

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
…e join criteria (opensearch-project#4474)

Signed-off-by: Lantao Jin <ltjin@amazon.com>
…ITs (opensearch-project#4462)

* Use  Guice.createInjector

Signed-off-by: Peng Huo <penghuo@gmail.com>

* Update

Signed-off-by: Peng Huo <penghuo@gmail.com>

---------

Signed-off-by: Peng Huo <penghuo@gmail.com>
Signed-off-by: Peng Huo <penghuo@gmail.com>
* Add mvappend function for Calcite PPL

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Fix annonymizer test

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Fix IT

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Minor fix

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Fix type coercion issue

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Fix test

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

---------

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
* Fix missing keywordsCanBeId

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* revert partially

Signed-off-by: Lantao Jin <ltjin@amazon.com>

---------

Signed-off-by: Lantao Jin <ltjin@amazon.com>
…pensearch-project#4413)

* Fallback to sub-aggregation if composite aggregation doesn't support

Signed-off-by: Heng Qian <qianheng@amazon.com>

* merging main

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Address comments

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Address comments

Signed-off-by: Heng Qian <qianheng@amazon.com>

---------

Signed-off-by: Heng Qian <qianheng@amazon.com>
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
* Fix mapping after aggregation push down

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Fix IT and UT

Signed-off-by: Heng Qian <qianheng@amazon.com>

* address comments

Signed-off-by: Heng Qian <qianheng@amazon.com>

---------

Signed-off-by: Heng Qian <qianheng@amazon.com>
* Add MAP_CONCAT internal function

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Minor fix

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

---------

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
Signed-off-by: Tomoyuki MORITA <moritato@amazon.com>
…-project#4464)

Add per_second() support to the timechart command by implementing Option 3 (Eval Transformation).

---------

Signed-off-by: Chen Dai <daichen@amazon.com>
…opensearch-project#4501)

* Add configurable sytem limitations for subsearch and join command

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Fix IT

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* typo

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* fix IT

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* remove rollback in doc

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* address comments

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* fix typo

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Fix IT

Signed-off-by: Lantao Jin <ltjin@amazon.com>

---------

Signed-off-by: Lantao Jin <ltjin@amazon.com>
…pensearch-project#4534)

* [FollowUp] Set 0 and negative value of subsearch.maxout as unlimited

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* fix doctest

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Fix conflicts

Signed-off-by: Lantao Jin <ltjin@amazon.com>

---------

Signed-off-by: Lantao Jin <ltjin@amazon.com>
* fix percentile bug

Signed-off-by: xinyual <xinyual@amazon.com>

* add IT

Signed-off-by: xinyual <xinyual@amazon.com>

* optimize it

Signed-off-by: xinyual <xinyual@amazon.com>

---------

Signed-off-by: xinyual <xinyual@amazon.com>
…opensearch-project#4522)

* Including metadata fields type when doing agg/filter script push down

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Fix IT

Signed-off-by: Heng Qian <qianheng@amazon.com>

---------

Signed-off-by: Heng Qian <qianheng@amazon.com>
expani and others added 23 commits November 7, 2025 10:41
Signed-off-by: expani <anijainc@amazon.com>
Signed-off-by: expani <anijainc@amazon.com>
Signed-off-by: expani <anijainc@amazon.com>
Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com>
Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
Signed-off-by: Marc Handalian <marc.handalian@gmail.com>
…lan Adding e2e test workflow

add e2e test workflow to sql
* Added support for Timestamp fields in filter

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>

* Updated the condition in TypeConverted to handle timestamp udt

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>

* Modifed the updateTimeStampFunction to handle recursively and updated checker method for timestamp udt

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>

---------

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
* Add assertions on all 43 queries against expected results from 3.4.
Updated test to categorize into passing/failing (non200) and failing with 200.

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* add CalcitePPLClickBenchIT.testDataFusion to e2e test workflow

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* Update dataset to have hits on all queries, update expected fixtures.

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

---------

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>
… UDTs (opensearch-project#12)

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
Signed-off-by: Marc Handalian <marc.handalian@gmail.com>
…plan

update gh workflow to display pass/fail output and update supported list
…ing on.

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>
…eries

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>
…plan

update index to multi shard and update ignored set to ignore fetch queries
Signed-off-by: Marc Handalian <marc.handalian@gmail.com>
…unctions (opensearch-project#16)

* Added common visitor to update timestamp and extract functions in relnode

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>

* Added mapping for regexp_replace

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>

* Removed some unnecessary lines

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>

* Updated the mapStringToTimeUnitRange method with all supported cases

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>

* Refactors for SPAN and LIKE function preprocess

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>

* Removed the resource dir which added in earlier commit

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>

* Added UDF in substrait for date_part and date_format

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>

* Removed isExtractFunction

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>

---------

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
…pensearch-project#15)

* avg fix field mapping

Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com>

* fix dependency resolution

Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com>

---------

Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com>
@gemini-code-assist

Summary of Changes

Hello @sandeshkr419, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly expands the PPL query engine's capabilities by introducing a new Unified Query API and several powerful commands and functions. The changes focus on improving data fusion, query planning, and optimization, while also addressing type handling, error reporting, and resource management. These updates aim to provide a more robust and versatile analytical experience for users.

Highlights

  • Unified Query API: Introduced a new api module with a UnifiedQueryPlanner class, providing a high-level, declarative API for parsing and analyzing PPL queries into Calcite RelNode logical plans. This simplifies integration for external systems like Apache Spark.
  • New PPL Commands: Added support for several new PPL commands: multisearch to combine results from multiple subsearches, replace for text replacement with wildcard and regex support, and streamstats for calculating cumulative or rolling statistics over event streams.
  • New PPL Functions: Implemented new collection and JSON functions: mvappend to append elements to an array, map_append and map_remove for map manipulation, and json_extract_all to extract all fields from JSON into a map.
  • Improved Query Planning and Optimization: Enhanced the Calcite query planner with a new PPLAggGroupMergeRule for optimizing aggregate group fields. Significant improvements to fillnull, bin, rex, and timechart commands, including better type compatibility checks, named capture group support for rex, and dynamic rate calculation for timechart's per_second functions.
  • Subquery and Join Limit Settings: Introduced new settings (plugins.ppl.subsearch.maxout and plugins.ppl.join.subsearch_maxout) to control the maximum number of rows returned by subsearches and subsearches involved in join operations, preventing excessive resource consumption.
  • Enhanced Type Coercion and Error Handling: Refactored type coercion logic for improved accuracy and added stricter validation for fillnull type compatibility and rex named capture group syntax, providing clearer error messages.
  • Build and Dependency Updates: Updated Apache Calcite to version 1.41.0, integrated Substrait dependencies, and consolidated Maven snapshot repository URLs. Also included dependency version bumps for Jackson and Protobuf, and added a dependency substitution for commons-lang to address a CVE.
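
As an illustration of the subsearch limit described above, here is a minimal sketch of how a row cap might behave. It assumes, following the follow-up commit message in this PR ("Set 0 and negative value of subsearch.maxout as unlimited"), that a value of 0 or below disables the cap. The class and method names are hypothetical and do not come from the plugin source.

```java
import java.util.List;

class SubsearchLimit {
  // Assumption: maxout <= 0 means "unlimited"; otherwise keep at most maxout rows.
  static <T> List<T> applyMaxout(List<T> rows, int maxout) {
    if (maxout <= 0 || rows.size() <= maxout) {
      return rows;
    }
    return rows.subList(0, maxout);
  }

  public static void main(String[] args) {
    List<Integer> rows = List.of(1, 2, 3);
    System.out.println(applyMaxout(rows, 2));
    System.out.println(applyMaxout(rows, 0));
  }
}
```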
Ignored Files
  • Ignored by pattern: .github/workflows/** (8)
    • .github/workflows/datafusion-e2e-test.yml
    • .github/workflows/delete_backport_branch.yml
    • .github/workflows/maven-publish-modules.yml
    • .github/workflows/maven-publish.yml
    • .github/workflows/publish-async-query-core.yml
    • .github/workflows/publish-grammar-files.yml
    • .github/workflows/sql-test-and-build-workflow.yml
    • .github/workflows/stalled.yml

Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com>
@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a significant set of changes, primarily focused on a new "Data Fusion" or "Unified Query" API, along with a major Calcite version upgrade and the introduction of Substrait dependencies. It also adds several new PPL commands (multisearch, replace, streamstats) and functions, and includes numerous refactorings and documentation updates.

My review has identified a few critical and high-severity issues that should be addressed before merging:

  • A hardcoded change in QueryService.java enables the Calcite engine for all query types, including SQL, which is not yet fully supported and could cause breakages.
  • The mapping of IP and binary data types to BIGINT in OpenSearchTypeFactory.java is a breaking change that will affect functions relying on these types.
  • There is a minor duplication in the root build.gradle file that should be cleaned up for maintainability.

Overall, the introduction of the Unified Query API and the refactoring efforts (like QualifiedNameResolver and SchemaUnifier) are positive steps towards a more robust and extensible query engine. The documentation updates are also very thorough and helpful. Once the identified issues are resolved, this will be a strong contribution.

Comment on lines 310 to 312
  private boolean shouldUseCalcite(QueryType queryType) {
- return isCalciteEnabled(settings) && queryType == QueryType.PPL;
+ return true;//isCalciteEnabled(settings) && queryType == QueryType.PPL;
  }

critical

The shouldUseCalcite method is hardcoded to return true. Based on the commented-out code and the comment // Calcite is not available for SQL query now., it seems this change is intended for development or testing purposes. Merging this will enable the Calcite engine for all query types, including SQL, which could lead to unexpected behavior or breakages for SQL queries. This should be reverted to the original logic before merging.

Suggested change
  private boolean shouldUseCalcite(QueryType queryType) {
- return true;//isCalciteEnabled(settings) && queryType == QueryType.PPL;
+ return isCalciteEnabled(settings) && queryType == QueryType.PPL;
  }
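
If the hardcoded `true` is needed for local DataFusion testing, one alternative to editing the condition is an explicit opt-in override that leaves the default behavior untouched. The sketch below is only an illustration of that idea: `forceCalcite` is a hypothetical flag, not an existing plugin setting, and the `QueryType` enum here is a stand-in for the real one.

```java
class CalciteGate {
  // Stand-in for the plugin's QueryType enum.
  enum QueryType { PPL, SQL }

  // forceCalcite is a hypothetical test-only override. When it is false,
  // the result is exactly the original condition from the suggested change.
  static boolean shouldUseCalcite(
      boolean calciteEnabled, QueryType queryType, boolean forceCalcite) {
    if (forceCalcite) {
      return true;
    }
    return calciteEnabled && queryType == QueryType.PPL;
  }

  public static void main(String[] args) {
    System.out.println(shouldUseCalcite(true, QueryType.SQL, false));
    System.out.println(shouldUseCalcite(true, QueryType.PPL, false));
  }
}
```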

Comment on lines +167 to +168
return TYPE_FACTORY.createSqlType(SqlTypeName.BIGINT, nullable);
// return TYPE_FACTORY.createUDT(ExprUDT.EXPR_IP, nullable);

high

The IP data type is now being mapped to BIGINT instead of the user-defined EXPR_IP type. The comment suggests this is for Substrait compatibility. However, this is a breaking change that will likely cause functions that operate on IP addresses, such as CIDRMATCH, to fail because they will receive a BIGINT instead of the expected IP type. This change should be made conditional or handled in a way that doesn't break existing functionality for non-Substrait use cases.

Suggested change
- return TYPE_FACTORY.createSqlType(SqlTypeName.BIGINT, nullable);
- // return TYPE_FACTORY.createUDT(ExprUDT.EXPR_IP, nullable);
+ return TYPE_FACTORY.createUDT(ExprUDT.EXPR_IP, nullable);

Comment on lines +199 to 200
// return TYPE_FACTORY.createUDT(ExprUDT.EXPR_BINARY, nullable);
} else if (fieldType.legacyTypeName().equalsIgnoreCase("timestamp")) {

high

The legacy binary data type is now being mapped to BIGINT instead of the user-defined EXPR_BINARY type. Similar to the change for the IP type, this is a breaking change motivated by Substrait compatibility. This could cause issues for any functionality that relies on the binary type. This change should be conditional to avoid breaking existing behavior.

Suggested change
- // return TYPE_FACTORY.createUDT(ExprUDT.EXPR_BINARY, nullable);
- } else if (fieldType.legacyTypeName().equalsIgnoreCase("timestamp")) {
+ return TYPE_FACTORY.createUDT(ExprUDT.EXPR_BINARY, nullable);

Comment on lines +214 to +215
return TYPE_FACTORY.createSqlType(SqlTypeName.BIGINT, nullable);
// return TYPE_FACTORY.createUDT(ExprUDT.EXPR_IP, nullable);

high

The legacy ip data type is also being mapped to BIGINT. This is a breaking change for the same reasons as the IP enum case. This should be reverted to use the EXPR_IP UDT to maintain compatibility.

Suggested change
- return TYPE_FACTORY.createSqlType(SqlTypeName.BIGINT, nullable);
- // return TYPE_FACTORY.createUDT(ExprUDT.EXPR_IP, nullable);
+ return TYPE_FACTORY.createUDT(ExprUDT.EXPR_IP, nullable);
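
The three type-mapping comments above ask for the BIGINT mapping to be conditional rather than unconditional. A minimal sketch of that shape follows, under stated assumptions: `substraitMode` is a hypothetical switch, and the enums are stand-ins for Calcite's `SqlTypeName.BIGINT` and the plugin's `ExprUDT` values, so this is not the actual `OpenSearchTypeFactory` code.

```java
class ConditionalTypeMapping {
  // Stand-ins for the field types discussed in the review.
  enum FieldKind { IP, BINARY }

  // Stand-ins for SqlTypeName.BIGINT and the EXPR_IP / EXPR_BINARY UDTs.
  enum MappedType { BIGINT, EXPR_IP, EXPR_BINARY }

  // substraitMode is a hypothetical flag: only when it is set are IP and
  // binary fields mapped to BIGINT; otherwise the existing UDTs are kept.
  static MappedType map(FieldKind kind, boolean substraitMode) {
    if (substraitMode) {
      return MappedType.BIGINT;
    }
    if (kind == FieldKind.IP) {
      return MappedType.EXPR_IP;
    }
    return MappedType.EXPR_BINARY;
  }

  public static void main(String[] args) {
    System.out.println(map(FieldKind.IP, false));
    System.out.println(map(FieldKind.IP, true));
  }
}
```

With a gate like this, functions that expect the IP type (the review names CIDRMATCH) would keep receiving EXPR_IP on the non-Substrait path.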

Comment on lines +163 to +164
resolutionStrategy.force 'org.apache.calcite.avatica:avatica-core:1.26.0'
resolutionStrategy.force 'org.slf4j:slf4j-api:2.0.13'

medium

These force declarations for avatica-core and slf4j-api are duplicates of lines 158-159. The redundant lines can be removed for better maintainability.

Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com>
@sandeshkr419 sandeshkr419 merged commit 6f7fd68 into 3.3-df Dec 3, 2025
6 of 44 checks passed
@sandeshkr419 sandeshkr419 deleted the feature/substrait-plan branch December 3, 2025 06:53
@sandeshkr419 sandeshkr419 restored the feature/substrait-plan branch December 3, 2025 06:53