-
Notifications
You must be signed in to change notification settings - Fork 0
Data fusion changes #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
--------- Signed-off-by: Kai Huang <ahkcs@amazon.com>
* PPL fillnull command enhancement Signed-off-by: Kai Huang <ahkcs@amazon.com> # Conflicts: # integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java * add to searchableKeyWord Signed-off-by: Kai Huang <ahkcs@amazon.com> * fixes Signed-off-by: Kai Huang <ahkcs@amazon.com> * fix CI Signed-off-by: Kai Huang <ahkcs@amazon.com> * update error message handling Signed-off-by: Kai Huang <ahkcs@amazon.com> * formatting Signed-off-by: Kai Huang <ahkcs@amazon.com> * put file back Signed-off-by: Kai Huang <ahkcs@amazon.com> * removal Signed-off-by: Kai Huang <ahkcs@amazon.com> * update doc Signed-off-by: Kai Huang <ahkcs@amazon.com> * update doc Signed-off-by: Kai Huang <ahkcs@amazon.com> * add IT Signed-off-by: Kai Huang <ahkcs@amazon.com> --------- Signed-off-by: Kai Huang <ahkcs@amazon.com>
…ct#4442) * Add ignorePrometheus flag Signed-off-by: Peng Huo <penghuo@gmail.com> * support -DignorePrometheus in integTest and docTest Signed-off-by: Peng Huo <penghuo@gmail.com> * Update Signed-off-by: Peng Huo <penghuo@gmail.com> * Update Signed-off-by: Peng Huo <penghuo@gmail.com> --------- Signed-off-by: Peng Huo <penghuo@gmail.com>
Fixed typo: evenstats --> eventstats Signed-off-by: Alexey Temnikov <alexey.temnikov@improving.com>
…ches (opensearch-project#4025) * Update delete_backport_branch workflow to include release-chores branches Signed-off-by: Riley Jerger <rjerger@amazon.com> * Update delete_backport_branch workflow to use github-script with proper permissions Signed-off-by: Riley Jerger <rjerger@amazon.com> --------- Signed-off-by: Riley Jerger <rjerger@amazon.com>
…nsearch-project#4454) * Resolve concurrency issue Signed-off-by: Louis Chu <lingzhichu.clz@gmail.com> * Update fix Signed-off-by: Louis Chu <lingzhichu.clz@gmail.com> * Add UT Signed-off-by: Louis Chu <lingzhichu.clz@gmail.com> * Revise comments Signed-off-by: Louis Chu <lingzhichu.clz@gmail.com> * Fix style Signed-off-by: Louis Chu <lingzhichu.clz@gmail.com> --------- Signed-off-by: Louis Chu <lingzhichu.clz@gmail.com>
* Refactor qualified name resolution in PPL Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Add tests Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * fix naming Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Minor fix Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Fix test failure Signed-off-by: Tomoyuki Morita <moritato@amazon.com> --------- Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
…e join criteria (opensearch-project#4474) Signed-off-by: Lantao Jin <ltjin@amazon.com>
…ITs (opensearch-project#4462) * Use Guice.createInjector Signed-off-by: Peng Huo <penghuo@gmail.com> * Update Signed-off-by: Peng Huo <penghuo@gmail.com> --------- Signed-off-by: Peng Huo <penghuo@gmail.com>
Signed-off-by: Peng Huo <penghuo@gmail.com>
* Add mvappend function for Calcite PPL Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Fix annonymizer test Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Fix IT Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Minor fix Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Fix type coercion issue Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Fix test Signed-off-by: Tomoyuki Morita <moritato@amazon.com> --------- Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
* Fix missing keywordsCanBeId Signed-off-by: Lantao Jin <ltjin@amazon.com> * revert partially Signed-off-by: Lantao Jin <ltjin@amazon.com> --------- Signed-off-by: Lantao Jin <ltjin@amazon.com>
…oject#4475) Signed-off-by: Songkan Tang <songkant@amazon.com>
…pensearch-project#4413) * Fallback to sub-aggregation if composite aggregation doesn't support Signed-off-by: Heng Qian <qianheng@amazon.com> * merging main Signed-off-by: Heng Qian <qianheng@amazon.com> * Address comments Signed-off-by: Heng Qian <qianheng@amazon.com> * Address comments Signed-off-by: Heng Qian <qianheng@amazon.com> --------- Signed-off-by: Heng Qian <qianheng@amazon.com>
…search-project#4440) --------- Signed-off-by: Peng Huo <penghuo@gmail.com>
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
…pensearch-project#4520) Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
* Fix mapping after aggregation push down Signed-off-by: Heng Qian <qianheng@amazon.com> * Fix IT and UT Signed-off-by: Heng Qian <qianheng@amazon.com> * address comments Signed-off-by: Heng Qian <qianheng@amazon.com> --------- Signed-off-by: Heng Qian <qianheng@amazon.com>
* Add MAP_CONCAT internal function Signed-off-by: Tomoyuki Morita <moritato@amazon.com> * Minor fix Signed-off-by: Tomoyuki Morita <moritato@amazon.com> --------- Signed-off-by: Tomoyuki Morita <moritato@amazon.com> Signed-off-by: Tomoyuki MORITA <moritato@amazon.com>
…-project#4464) Add per_second() support to the timechart command by implementing Option 3 (Eval Transformation). --------- Signed-off-by: Chen Dai <daichen@amazon.com>
…opensearch-project#4501) * Add configurable sytem limitations for subsearch and join command Signed-off-by: Lantao Jin <ltjin@amazon.com> * Fix IT Signed-off-by: Lantao Jin <ltjin@amazon.com> * typo Signed-off-by: Lantao Jin <ltjin@amazon.com> * fix IT Signed-off-by: Lantao Jin <ltjin@amazon.com> * remove rollback in doc Signed-off-by: Lantao Jin <ltjin@amazon.com> * address comments Signed-off-by: Lantao Jin <ltjin@amazon.com> * fix typo Signed-off-by: Lantao Jin <ltjin@amazon.com> * Fix IT Signed-off-by: Lantao Jin <ltjin@amazon.com> --------- Signed-off-by: Lantao Jin <ltjin@amazon.com>
…pensearch-project#4534) * [FollowUp] Set 0 and negative value of subsearch.maxout as unlimited Signed-off-by: Lantao Jin <ltjin@amazon.com> * fix doctest Signed-off-by: Lantao Jin <ltjin@amazon.com> * Fix conflicts Signed-off-by: Lantao Jin <ltjin@amazon.com> --------- Signed-off-by: Lantao Jin <ltjin@amazon.com>
* fix percentile bug Signed-off-by: xinyual <xinyual@amazon.com> * add IT Signed-off-by: xinyual <xinyual@amazon.com> * optimize it Signed-off-by: xinyual <xinyual@amazon.com> --------- Signed-off-by: xinyual <xinyual@amazon.com>
…opensearch-project#4522) * Including metadata fields type when doing agg/filter script push down Signed-off-by: Heng Qian <qianheng@amazon.com> * Fix IT Signed-off-by: Heng Qian <qianheng@amazon.com> --------- Signed-off-by: Heng Qian <qianheng@amazon.com>
…ch-project#4541) Signed-off-by: Lantao Jin <ltjin@amazon.com>
Updates CalcitePPLClickbenchITs
Signed-off-by: expani <anijainc@amazon.com>
Signed-off-by: expani <anijainc@amazon.com>
Signed-off-by: expani <anijainc@amazon.com>
Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com>
Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
This reverts commit eaab86f.
Signed-off-by: Marc Handalian <marc.handalian@gmail.com>
…lan Adding e2e test workflow add e2e test workflow to sql
* Added support for Timestamp fields in filter Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com> * Updated the condition in TypeConverted to handle timestamp udt Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com> * Modifed the updateTimeStampFunction to handle recursively and updated checker method for timestamp udt Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com> --------- Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
* Add assertions on all 43 queries against expected results from 3.4. Updated test to categorize into passing/failing (non200) and failing with 200. Signed-off-by: Marc Handalian <marc.handalian@gmail.com> * add CalcitePPLClickBenchIT.testDataFusion to e2e test workflow Signed-off-by: Marc Handalian <marc.handalian@gmail.com> * Update dataset to have hits on all queries, update expected fixtures. Signed-off-by: Marc Handalian <marc.handalian@gmail.com> --------- Signed-off-by: Marc Handalian <marc.handalian@gmail.com>
… UDTs (opensearch-project#12) Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
count changes
Signed-off-by: Marc Handalian <marc.handalian@gmail.com>
…plan update gh workflow to display pass/fail output and update supported list
…ing on. Signed-off-by: Marc Handalian <marc.handalian@gmail.com>
…eries Signed-off-by: Marc Handalian <marc.handalian@gmail.com>
…plan update index to multi shard and update ignored set to ignore fetch queries
Signed-off-by: Marc Handalian <marc.handalian@gmail.com>
…plan ignore query 4
…unctions (opensearch-project#16) * Added common visitor to update timestamp and extract functions in relnode Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com> * Added mapping for regexp_replace Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com> * Removed some unnecessary lines Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com> * Updated the mapStringToTimeUnitRange method with all supported cases Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com> * Refactors for SPAN and LIKE function preprocess Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com> * Removed the resource dir which added in earlier commit Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com> * Added UDF in substrait for date_part and date_format Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com> * Removed isExtractFunction Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com> --------- Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
…pensearch-project#15) * avg fix field mapping Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com> * fix dependency resolution Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com> --------- Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com>
Summary of ChangesHello @sandeshkr419, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly expands the PPL query engine's capabilities by introducing a new Unified Query API and several powerful commands and functions. The changes focus on improving data fusion, query planning, and optimization, while also addressing type handling, error reporting, and resource management. These updates aim to provide a more robust and versatile analytical experience for users. Highlights
Ignored Files
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a significant set of changes, primarily focused on a new "Data Fusion" or "Unified Query" API, along with a major Calcite version upgrade and the introduction of Substrait dependencies. It also adds several new PPL commands (multisearch, replace, streamstats) and functions, and includes numerous refactorings and documentation updates.
My review has identified a few critical and high-severity issues that should be addressed before merging:
- A hardcoded change in
QueryService.javaenables the Calcite engine for all query types, including SQL, which is not yet fully supported and could cause breakages. - The mapping of
IPandbinarydata types toBIGINTinOpenSearchTypeFactory.javais a breaking change that will affect functions relying on these types. - There is a minor duplication in the root
build.gradlefile that should be cleaned up for maintainability.
Overall, the introduction of the Unified Query API and the refactoring efforts (like QualifiedNameResolver and SchemaUnifier) are positive steps towards a more robust and extensible query engine. The documentation updates are also very thorough and helpful. Once the identified issues are resolved, this will be a strong contribution.
| private boolean shouldUseCalcite(QueryType queryType) { | ||
| return isCalciteEnabled(settings) && queryType == QueryType.PPL; | ||
| return true;//isCalciteEnabled(settings) && queryType == QueryType.PPL; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The shouldUseCalcite method is hardcoded to return true. Based on the commented-out code and the comment // Calcite is not available for SQL query now., it seems this change is intended for development or testing purposes. Merging this will enable the Calcite engine for all query types, including SQL, which could lead to unexpected behavior or breakages for SQL queries. This should be reverted to the original logic before merging.
| private boolean shouldUseCalcite(QueryType queryType) { | |
| return isCalciteEnabled(settings) && queryType == QueryType.PPL; | |
| return true;//isCalciteEnabled(settings) && queryType == QueryType.PPL; | |
| } | |
| private boolean shouldUseCalcite(QueryType queryType) { | |
| return isCalciteEnabled(settings) && queryType == QueryType.PPL; | |
| } |
| return TYPE_FACTORY.createSqlType(SqlTypeName.BIGINT, nullable); | ||
| // return TYPE_FACTORY.createUDT(ExprUDT.EXPR_IP, nullable); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The IP data type is now being mapped to BIGINT instead of the user-defined EXPR_IP type. The comment suggests this is for Substrait compatibility. However, this is a breaking change that will likely cause functions that operate on IP addresses, such as CIDRMATCH, to fail because they will receive a BIGINT instead of the expected IP type. This change should be made conditional or handled in a way that doesn't break existing functionality for non-Substrait use cases.
| return TYPE_FACTORY.createSqlType(SqlTypeName.BIGINT, nullable); | |
| // return TYPE_FACTORY.createUDT(ExprUDT.EXPR_IP, nullable); | |
| return TYPE_FACTORY.createUDT(ExprUDT.EXPR_IP, nullable); |
| // return TYPE_FACTORY.createUDT(ExprUDT.EXPR_BINARY, nullable); | ||
| } else if (fieldType.legacyTypeName().equalsIgnoreCase("timestamp")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The legacy binary data type is now being mapped to BIGINT instead of the user-defined EXPR_BINARY type. Similar to the change for the IP type, this is a breaking change motivated by Substrait compatibility. This could cause issues for any functionality that relies on the binary type. This change should be conditional to avoid breaking existing behavior.
| // return TYPE_FACTORY.createUDT(ExprUDT.EXPR_BINARY, nullable); | |
| } else if (fieldType.legacyTypeName().equalsIgnoreCase("timestamp")) { | |
| return TYPE_FACTORY.createUDT(ExprUDT.EXPR_BINARY, nullable); |
| return TYPE_FACTORY.createSqlType(SqlTypeName.BIGINT, nullable); | ||
| // return TYPE_FACTORY.createUDT(ExprUDT.EXPR_IP, nullable); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The legacy ip data type is also being mapped to BIGINT. This is a breaking change for the same reasons as the IP enum case. This should be reverted to use the EXPR_IP UDT to maintain compatibility.
| return TYPE_FACTORY.createSqlType(SqlTypeName.BIGINT, nullable); | |
| // return TYPE_FACTORY.createUDT(ExprUDT.EXPR_IP, nullable); | |
| return TYPE_FACTORY.createUDT(ExprUDT.EXPR_IP, nullable); |
| resolutionStrategy.force 'org.apache.calcite.avatica:avatica-core:1.26.0' | ||
| resolutionStrategy.force 'org.slf4j:slf4j-api:2.0.13' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com>
Description
Version forced to 3.3.0.
Testing:
Related Issues
Resolves #[Issue number to be closed when this PR is merged]
Check List
--signoffor-s.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.