@fsi-yuvraj fsi-yuvraj commented May 22, 2025

What changes were proposed in this pull request?

JIRA : https://issues.apache.org/jira/browse/ATLAS-5032

Background:

A basic search that filters on an attribute such as qualifiedName with the StartsWith operator does not return the expected results when the filter value is a long entity name.

Root Cause:

The qualifiedName attribute is an indexed key, but Solr's standard tokenizer has a default maximum token length of 255 characters. When an entity name exceeds this length, the tokenizer does not index the value as a single token, so index-based searches on the full value fail.

Changes Proposed:

Approach 1 : Update Solr Configuration to Increase Max Token Length

  • Modify the Solr schema to increase the maxTokenLength option.
  • This allows Solr to tokenize field values up to the configured length.
  • Impact:
    -- Requires full reindexing of all existing data to apply the new schema.

[Existing PR] Approach 2 : Handle Long Value Search by Querying JanusGraph

  • For long value searches on indexed keys, query JanusGraph instead of Solr.
  • Impact:
    -- This will affect performance, because the query is executed in JanusGraph.
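
To illustrate why the fallback in Approach 2 returns correct results at a performance cost, here is a minimal sketch. It is a toy stand-in, not Atlas or JanusGraph code: the class `LongValueFallbackSketch` and the method `startsWithScan` are hypothetical names. The scan compares against the full stored value, so the value length does not matter, but every candidate has to be visited.

```java
import java.util.List;
import java.util.stream.Collectors;

// Toy stand-in for the graph-side fallback; not Atlas or JanusGraph API.
final class LongValueFallbackSketch {

    // Compares the prefix against each full stored value.
    // There is no token length limit here, but the cost grows with the
    // number of stored values because each one is checked.
    static List<String> startsWithScan(List<String> storedValues, String prefix) {
        return storedValues.stream()
                .filter(value -> value.startsWith(prefix))
                .collect(Collectors.toList());
    }
}
```

A prefix longer than 255 characters still matches here, because no tokenization is involved.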

How was this patch tested?

@fsi-yuvraj fsi-yuvraj changed the title ATLAS-5032: Fix basic search when querying by long qualifiedName attr… ATLAS-5032: Fix basic search when querying by long attribute values May 22, 2025

for (AtlasStructType structType : structTypes) {
String qualifiedName = structType.getVertexPropertyName(criteria.getAttributeName());
if (isIndexSearchable(criteria, structType)) {

@chaitalicod chaitalicod May 22, 2025

How many hive_columns per hive_table, and across how many hive_tables, was this fix tested with such long qualifiedName and name values using basic search?

@aditya-gupta36 aditya-gupta36 left a comment

The changes look good to me. The search falls back to graph queries when index-based search isn’t safe due to the length limit on qualifiedName values.

public static final String INDEX_SEARCH_MAX_RESULT_SET_SIZE = "atlas.graph.index.search.max-result-set-size";
public static final String INDEX_SEARCH_TYPES_MAX_QUERY_STR_LENGTH = "atlas.graph.index.search.types.max-query-str-length";
public static final String INDEX_SEARCH_TAGS_MAX_QUERY_STR_LENGTH = "atlas.graph.index.search.tags.max-query-str-length";
public static final String INDEX_SEARCH_SOLR_MAX_TOKEN_LENGTH = "atlas.graph.solr.index.search.max-token-length";

Adds a new configurable constant tied to Solr’s default maxTokenLength (255).

Useful for operators (STARTS_WITH, ENDS_WITH, CONTAINS) that can’t safely be processed via the index when values exceed the token length.

Why this helps: avoids Solr truncation and allows search to fall back to the graph safely on long values.
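
A rough sketch of how such a setting might be read with a default of 255. This is an assumption for illustration only: Atlas reads its settings through its own configuration classes, and `java.util.Properties` is used here just to keep the example self-contained. The class name, the `readMaxTokenLength` helper, and the choice to fall back to the default on missing or invalid values are all hypothetical.

```java
import java.util.Properties;

// Hypothetical helper; not the code from the patch.
final class MaxTokenLengthConfigSketch {
    // Property key as it appears in the diff above.
    static final String INDEX_SEARCH_SOLR_MAX_TOKEN_LENGTH =
            "atlas.graph.solr.index.search.max-token-length";

    // Solr standard tokenizer default, as cited in the PR description.
    static final int DEFAULT_MAX_TOKEN_LENGTH = 255;

    // Returns the configured limit, or the default when the property is
    // missing, blank, not a number, or not positive (assumed behavior).
    static int readMaxTokenLength(Properties props) {
        String raw = props.getProperty(INDEX_SEARCH_SOLR_MAX_TOKEN_LENGTH);
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_MAX_TOKEN_LENGTH;
        }
        try {
            int value = Integer.parseInt(raw.trim());
            return value > 0 ? value : DEFAULT_MAX_TOKEN_LENGTH;
        } catch (NumberFormatException e) {
            return DEFAULT_MAX_TOKEN_LENGTH;
        }
    }
}
```

If Approach 1 were also applied and the Solr schema limit raised, this value would need to be raised to match.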

LOG.debug("{} operator found for string (TEXT) attribute {} and special characters found in filter value {}, deferring to in-memory or graph query (might cause poor performance)", operator, qualifiedName, attributeValue);

ret = false;
} else if ((operator == SearchParameters.Operator.STARTS_WITH || operator == SearchParameters.Operator.ENDS_WITH || operator == SearchParameters.Operator.CONTAINS) && attributeValue.length() > SOLR_MAX_TOKEN_STR_LENGTH) {

Bypasses index for these operators when value length exceeds the limit.

Falls back to a graph traversal!
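
The condition in the branch above can be sketched as a standalone predicate. This is an illustration under assumptions, not the patch itself: the local `Operator` enum stands in for `SearchParameters.Operator`, and `IndexBypassSketch` and `shouldBypassIndex` are hypothetical names.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical extraction of the length check; not the code from the patch.
final class IndexBypassSketch {
    // Local stand-in for SearchParameters.Operator (only a few values shown).
    enum Operator { EQ, STARTS_WITH, ENDS_WITH, CONTAINS }

    // Operators named in the diff as unsafe for the index on long values.
    static final Set<Operator> LENGTH_SENSITIVE =
            EnumSet.of(Operator.STARTS_WITH, Operator.ENDS_WITH, Operator.CONTAINS);

    // True when the filter should skip the index and go to the graph:
    // the operator is length-sensitive and the value is strictly longer
    // than the token limit, mirroring the '>' comparison in the diff.
    static boolean shouldBypassIndex(Operator operator, String attributeValue, int maxTokenLength) {
        return LENGTH_SENSITIVE.contains(operator) && attributeValue.length() > maxTokenLength;
    }
}
```

With a limit of 255, a 255-character value still uses the index and a 256-character value does not, since the comparison is strict.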
