ATLAS-5032: Fix basic search when querying by long attribute values #356
base: master
for (AtlasStructType structType : structTypes) {
    String qualifiedName = structType.getVertexPropertyName(criteria.getAttributeName());
    if (isIndexSearchable(criteria, structType)) {
How many hive_columns per hive_table, and across how many hive_tables with such long qualifiedName and name values, was this fix tested against using basic search?
aditya-gupta36
left a comment
The changes look good to me: the search falls back to graph queries when index-based search isn’t safe because the qualifiedName exceeds the length limit.
public static final String INDEX_SEARCH_MAX_RESULT_SET_SIZE = "atlas.graph.index.search.max-result-set-size";
public static final String INDEX_SEARCH_TYPES_MAX_QUERY_STR_LENGTH = "atlas.graph.index.search.types.max-query-str-length";
public static final String INDEX_SEARCH_TAGS_MAX_QUERY_STR_LENGTH = "atlas.graph.index.search.tags.max-query-str-length";
public static final String INDEX_SEARCH_SOLR_MAX_TOKEN_LENGTH = "atlas.graph.solr.index.search.max-token-length";
Adds a new configurable constant tied to Solr’s default maxTokenLength (255).
Useful for operators (STARTS_WITH, ENDS_WITH, CONTAINS) that can’t safely be processed via the index when values exceed the token length.
Why this helps: avoids Solr truncation and allows search to fall back to the graph safely on long values.
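A minimal sketch of how such a limit could be read with a fallback to 255. This is not the code from the PR: `java.util.Properties` stands in for Atlas's own configuration loading, and the class and method names (`MaxTokenLengthConfigSketch`, `maxTokenLength`) are hypothetical. Only the property key and the default of 255 come from the diff and comment above.

```java
import java.util.Properties;

// Hypothetical helper, not part of Atlas: resolves the Solr max token length
// from a Properties object, falling back to 255 when unset or invalid.
final class MaxTokenLengthConfigSketch {
    // Property key copied from the diff above.
    static final String INDEX_SEARCH_SOLR_MAX_TOKEN_LENGTH = "atlas.graph.solr.index.search.max-token-length";

    // Default named in the review comment (Solr's default token length limit).
    static final int DEFAULT_MAX_TOKEN_LENGTH = 255;

    static int maxTokenLength(Properties props) {
        String raw = props.getProperty(INDEX_SEARCH_SOLR_MAX_TOKEN_LENGTH);
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_MAX_TOKEN_LENGTH;
        }
        try {
            int parsed = Integer.parseInt(raw.trim());
            // Treat non-positive values as misconfiguration (an assumption of this sketch).
            return parsed > 0 ? parsed : DEFAULT_MAX_TOKEN_LENGTH;
        } catch (NumberFormatException e) {
            return DEFAULT_MAX_TOKEN_LENGTH;
        }
    }
}
```

If the Solr schema is changed to allow longer tokens, the configured value would presumably need to be raised to match; the PR itself does not spell this out.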
    LOG.debug("{} operator found for string (TEXT) attribute {} and special characters found in filter value {}, deferring to in-memory or graph query (might cause poor performance)", operator, qualifiedName, attributeValue);

    ret = false;
} else if ((operator == SearchParameters.Operator.STARTS_WITH || operator == SearchParameters.Operator.ENDS_WITH || operator == SearchParameters.Operator.CONTAINS) && attributeValue.length() > SOLR_MAX_TOKEN_STR_LENGTH) {
Bypasses index for these operators when value length exceeds the limit.
Falls back to a graph traversal!
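The length check in the diff can be isolated as a small standalone sketch. The `Operator` enum here is a trimmed stand-in for `SearchParameters.Operator`, and `canUseIndex` is a hypothetical name. It models only this one condition, not the other checks that the real `isIndexSearchable` performs.

```java
// Hypothetical, self-contained model of the new length check; not Atlas code.
final class LongValueSearchSketch {
    // Trimmed stand-in for SearchParameters.Operator.
    enum Operator { EQ, STARTS_WITH, ENDS_WITH, CONTAINS }

    /**
     * Returns false when the filter should bypass the index and be
     * evaluated by a graph query instead, based solely on value length.
     */
    static boolean canUseIndex(Operator operator, String attributeValue, int maxTokenLength) {
        boolean isPartialMatch = operator == Operator.STARTS_WITH
                || operator == Operator.ENDS_WITH
                || operator == Operator.CONTAINS;

        // Values longer than the tokenizer limit are not safe to match via the index.
        return !(isPartialMatch && attributeValue.length() > maxTokenLength);
    }
}
```

With a limit of 255, a `STARTS_WITH` filter on a 256-character value returns false (graph fallback), while the same filter on a 255-character value still uses the index.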
What changes were proposed in this pull request?
JIRA : https://issues.apache.org/jira/browse/ATLAS-5032
Background:
When a basic search filters on an attribute such as qualifiedName with the StartsWith operator and a long value, the expected results are not returned.
Root Cause:
The qualifiedName attribute is an indexed key. However, Solr's default standard tokenizer has a maximum token length of 255 characters. When entity names exceed this length, the tokenizer fails to parse the value correctly, leading to search failures.
Changes Proposed:
Approach 1: Update the Solr configuration to increase the max token length
-- Requires a full reindex of all existing data to apply the new schema.
[Existing PR] Approach 2: Handle long-value searches by querying JanusGraph
-- This affects performance because the query is executed in JanusGraph rather than through the index.
How was this patch tested?