Implement lazy loading for inline Arrow results #1029
Conversation
This PR introduces lazy loading support for inline Arrow results to improve memory efficiency when handling large result sets.
Previously, InlineChunkProvider would eagerly fetch all arrow batches upfront when results had hasMoreRows = true, which could lead to memory issues with large datasets. This change splits the handling into two separate paths:
1. Lazy path (new): For Thrift-based inline Arrow results (when ARROW_BASED_SET is returned), we now use LazyThriftInlineArrowResult which fetches arrow batches on-demand as the client iterates through rows. This is similar to how LazyThriftResult works for columnar data.
2. Remote path (existing): For URL-based Arrow results (URL_BASED_SET), we continue using ArrowStreamResult with RemoteChunkProvider which downloads chunks from cloud storage.
The InlineChunkProvider is now only used for SEA results with JSON_ARRAY format and INLINE disposition, which contain all data inline (no hasMoreRows flag set).
This should reduce memory consumption and improve performance when dealing with large inline Arrow result sets, similar to #975.
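For illustration, the lazy path boils down to the following fetch-on-demand shape. This is only a sketch: ThriftClient, FetchResult, ArrowBatch, and OperationHandle are hypothetical stand-ins for the driver's real Thrift types, not the actual LazyThriftInlineArrowResult code.

```java
import java.util.Collections;
import java.util.List;

class LazyBatchIterator {
  private final ThriftClient client;      // hypothetical wrapper around FetchResults
  private final OperationHandle handle;
  private List<ArrowBatch> batches = Collections.emptyList();
  private int nextIndex = 0;
  private boolean hasMoreRows = true;

  LazyBatchIterator(ThriftClient client, OperationHandle handle) {
    this.client = client;
    this.handle = handle;
  }

  /** Returns the next Arrow batch, issuing a FetchResults round trip only when needed. */
  ArrowBatch nextBatch() {
    while (nextIndex >= batches.size()) {
      if (!hasMoreRows) {
        return null; // result set fully consumed
      }
      FetchResult result = client.fetchResults(handle); // one server round trip
      batches = result.getArrowBatches();
      hasMoreRows = result.hasMoreRows();
      nextIndex = 0;
    }
    return batches.get(nextIndex++);
  }
}
```

The key difference from the eager path is that only one server response's worth of batches is held in memory at a time, instead of every batch for the whole result set.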
I need to make some changes related to the JDBC spec around row count, because we don't have that data point when lazily fetching results.
This PR has been marked as Stale because it has been open for 30 days with no activity. If you would like the PR to remain open, please remove the stale label or comment on the PR.
This PR was closed because it has been inactive for 7 days since being marked as stale.
```java
// Check if we've reached the maxRows limit
boolean hasRowLimit = maxRows > 0;
if (hasRowLimit && globalRowIndex + 1 >= maxRows) {
  return null;
}
```
So globalRowIndex 0 means the 1st row, and maxRows - 1 would be the last row.
At the last row, globalRowIndex + 1 == maxRows, which means no more rows. We can just check == instead of >=; any reason you are doing that?
You are right, we don't strictly need >= (I was being defensive).
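For reference, a worked walk-through of the boundary (not part of the diff), assuming globalRowIndex is the 0-based index of the row just returned:

```java
// With maxRows = 3:
//   after row 0: 0 + 1 = 1 -> 1 < 3, more rows allowed
//   after row 1: 1 + 1 = 2 -> 2 < 3, more rows allowed
//   after row 2: 2 + 1 = 3 -> equals maxRows, limit reached, stop
// Because globalRowIndex advances one row at a time, equality is the first
// case the check can hit, so == and >= behave the same; >= merely guards
// against an unexpected jump past the limit.
boolean hasRowLimit = maxRows > 0;
if (hasRowLimit && globalRowIndex + 1 >= maxRows) {
  return null;
}
```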
```java
private Schema hiveSchemaToArrowSchema(TTableSchema hiveSchema)
```
Are we not getting the arrowSchema for inline Arrow results?
I haven't looked into this in detail. Arrow parsing is ported as-is in this PR (no changes from existing code; the methods were moved from InlineChunkProvider, the class that previously handled this). The PR only changes the data download. Let me check this separately.
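For context, deriving an Arrow schema from the Thrift table schema typically looks like the condensed sketch below. The type mapping here is abbreviated and assumed; the ported method covers the full TTypeId range and its exact mapping may differ.

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.arrow.vector.types.FloatingPointPrecision;
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.Schema;

// TTableSchema, TColumnDesc, and TTypeId come from the driver's Thrift bindings.
private Schema hiveSchemaToArrowSchema(TTableSchema hiveSchema) {
  List<Field> fields = new ArrayList<>();
  for (TColumnDesc col : hiveSchema.getColumns()) {
    TTypeId typeId =
        col.getTypeDesc().getTypes().get(0).getPrimitiveEntry().getType();
    fields.add(Field.nullable(col.getColumnName(), mapType(typeId)));
  }
  return new Schema(fields);
}

// Abbreviated type mapping for the sketch; the real code handles many more types.
private ArrowType mapType(TTypeId typeId) {
  switch (typeId) {
    case INT_TYPE:
      return new ArrowType.Int(32, true);
    case BIGINT_TYPE:
      return new ArrowType.Int(64, true);
    case DOUBLE_TYPE:
      return new ArrowType.FloatingPoint(FloatingPointPrecision.DOUBLE);
    case BOOLEAN_TYPE:
      return ArrowType.Bool.INSTANCE;
    default:
      return ArrowType.Utf8.INSTANCE; // fall back to string
  }
}
```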
```java
if (result == null) {
  return null;
}
ComplexDataTypeParser parser = new ComplexDataTypeParser();
```
Do we need to create this for every getObject call?
Actually, this is just existing code from the getObject(int columnIndex) method in the same class. I moved it into a separate method because getObject(int columnIndex) was unreadable.
To answer the question: no, we shouldn't create such objects in hot paths like getObject, but I don't want to change the scope of this PR. I will create a separate change.
This PR has been marked as Stale because it has been open for 30 days with no activity. If you would like the PR to remain open, please remove the stale label or comment on the PR.
nikhilsuri-db left a comment:
Do we have artefacts showing the improvement in memory usage with LazyInline Fetch?
The improvements are identical and follow the same patterns as the images in #966 (comment).
```java
// Check if we need to convert geospatial types to string when geospatial support is disabled
// This check must come before the general complex type check
if (!isGeoSpatialSupportEnabled && isGeospatialType(requiredType)) {
```
Sreekanth is making changes in similar code; make sure you don't override his changes for geospatial.