Add native asyncio S3FS and Spark cursors with boilerplate deduplication (Phase 3) #668
Merged
laughingman7743 merged 5 commits into master on Feb 21, 2026
Phase 3 of native asyncio cursor implementation:

Part 1 - Boilerplate deduplication:
- Extract shared properties, lifecycle methods, sync fetch, and async protocol into AioCursorBase (aio/base.py)
- Refactor AioCursor, AioPandasCursor, AioArrowCursor, AioPolarsCursor to extend AioCursorBase, reducing ~520 lines of duplicated code

Part 2 - AioS3FSCursor:
- Lightweight async CSV cursor using S3FileSystem
- Async fetch methods (via asyncio.to_thread) since S3FS uses lazy streaming from S3

Part 3 - AioSparkCursor:
- AioSparkBaseCursor overrides post-init I/O with async equivalents (poll, cancel, terminate_session, read_s3_file)
- AioSparkCursor for executing PySpark code asynchronously
- Session init stays sync (wrapped in asyncio.to_thread at creation)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
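The async fetch approach described in Part 2 of this commit (blocking, lazily streamed reads pushed off the event loop with asyncio.to_thread) can be illustrated with a minimal sketch. LazyCsvResultSet and AioLazyCursor are hypothetical stand-ins that read from an in-memory string; they are not PyAthena's AthenaS3FSResultSet or AioS3FSCursor, and the real classes may differ in structure.

```python
import asyncio
import csv
import io
from typing import Iterator, List, Optional


class LazyCsvResultSet:
    """Hypothetical stand-in for a result set that streams rows lazily."""

    def __init__(self, text: str) -> None:
        # A real result set would open a file object backed by S3 here.
        self._reader: Iterator[List[str]] = csv.reader(io.StringIO(text))

    def fetchone(self) -> Optional[List[str]]:
        # Each call pulls the next row from the underlying stream, so it
        # may block on I/O and should not run on the event loop thread.
        return next(self._reader, None)


class AioLazyCursor:
    """Hypothetical async wrapper around a lazily streaming result set."""

    def __init__(self, result_set: LazyCsvResultSet) -> None:
        self._result_set = result_set

    @classmethod
    async def create(cls, text: str) -> "AioLazyCursor":
        # Construction may also perform I/O, so it runs in a worker thread.
        result_set = await asyncio.to_thread(LazyCsvResultSet, text)
        return cls(result_set)

    async def fetchone(self) -> Optional[List[str]]:
        # Delegate the blocking fetch to a worker thread on every call.
        return await asyncio.to_thread(self._result_set.fetchone)

    def __aiter__(self) -> "AioLazyCursor":
        return self

    async def __anext__(self) -> List[str]:
        row = await self.fetchone()
        if row is None:
            raise StopAsyncIteration
        return row


async def main() -> List[List[str]]:
    cursor = await AioLazyCursor.create("a,b\n1,2\n3,4\n")
    return [row async for row in cursor]


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Because every fetch goes through a worker thread, the event loop stays free while a row is read, at the cost of one thread hop per call.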
Move the shared SQL cursor mixin from aio/base.py into aio/common.py and rename from AioCursorBase to WithAsyncFetch to follow the existing WithXXX naming convention (WithResultSet, WithCalculationExecution) where XXX describes the functionality provided.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Unlike sync Spark (SparkCursor + AsyncSparkCursor sharing SparkBaseCursor), the aio side has only AioSparkCursor, so a separate base class is unnecessary.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Native asyncio cursors provide direct control flow via await, making the on_start_query_execution callback unnecessary. This aligns with AsyncCursor, which also omits this callback.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Change _calculate exception from OperationalError to DatabaseError to match sync SparkBaseCursor behavior
- Add retry logic to _read_s3_file_as_text via async_retry_api_call to match sync version's retry_api_call usage
- Remove incorrect # type: ignore[override] on name-mangled __poll
- Add test_async_iterator for AioS3FSCursor
- Add test_executemany and test_context_manager for AioSparkCursor
- Move runtime import to top-level in S3FS test file

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
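The retry change in this commit wraps a blocking S3 read in an async retry helper. The general idea can be sketched as below. retry_in_thread and FlakyReader are hypothetical names invented for illustration; the signature and retry policy of PyAthena's async_retry_api_call are not shown in this PR description and may well differ.

```python
import asyncio
from typing import Any, Callable, Tuple, Type


async def retry_in_thread(
    func: Callable[..., Any],
    *args: Any,
    attempts: int = 3,
    base_delay: float = 0.01,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    **kwargs: Any,
) -> Any:
    """Hypothetical helper: run a blocking call in a thread, with retries."""
    for attempt in range(1, attempts + 1):
        try:
            # The blocking call runs in a worker thread, not on the loop.
            return await asyncio.to_thread(func, *args, **kwargs)
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Exponential backoff that yields to the event loop.
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))


class FlakyReader:
    """Hypothetical reader that fails a fixed number of times."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.calls = 0

    def read_text(self, key: str) -> str:
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("transient failure")
        return f"contents of {key}"


async def demo() -> str:
    reader = FlakyReader(failures=2)
    return await retry_in_thread(reader.read_text, "out.txt", base_delay=0)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Sleeping with asyncio.sleep rather than time.sleep keeps the backoff from blocking other coroutines between attempts.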
laughingman7743 added a commit that referenced this pull request on Feb 21, 2026
Add comprehensive documentation for the native asyncio cursor implementations added in PRs #666, #667, #668. This includes a new docs/aio.md overview page, AioCursor sections in each specialized cursor page, API reference for the aio module, and an async example in the README. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
Phase 3 of native asyncio cursor implementation (#662), building on Phase 2 (PR #667).
Part 1: Boilerplate Deduplication - WithAsyncFetch mixin

Extract shared boilerplate from 4 existing aio SQL cursors into a WithAsyncFetch mixin in pyathena/aio/common.py:
- Properties: arraysize (getter/setter), result_set (getter/setter), query_id (getter/setter), rownumber, rowcount
- Lifecycle methods: close(), executemany(), cancel()
- Sync fetch: fetchone(), fetchmany(), fetchall(), for cursors that load data eagerly in __init__
- Async protocol: __aiter__, __anext__, __aenter__, __aexit__

Net reduction of ~520 lines of duplicated code across AioCursor, AioPandasCursor, AioArrowCursor, AioPolarsCursor.

Part 2: AioS3FSCursor

New pyathena/aio/s3fs/cursor.py: async CSV cursor using S3FileSystem.

Unlike other aio cursors (Arrow/Pandas/Polars) that load data eagerly, AthenaS3FSResultSet lazily streams rows from S3 via a CSV reader. Therefore:
- Result set creation runs in a worker thread: asyncio.to_thread(AthenaS3FSResultSet, ...)
- Fetch methods are async: asyncio.to_thread(result_set.fetch*, ...), which reads from S3 on each call
- ... S3FSCursor)

Part 3: AioSparkCursor

New pyathena/aio/spark/cursor.py: async PySpark code execution.
- Extends SparkBaseCursor and WithCalculationExecution directly (no intermediate base class, since aio only has one Spark cursor variant)
- Post-init I/O (_poll, _cancel, _terminate_session, _read_s3_file_as_text, _calculate) overridden with async equivalents
- Session init stays sync in __init__ (runs inside asyncio.to_thread at cursor creation)
- _read_s3_file_as_text uses async_retry_api_call for retry consistency with sync version

Additional changes

- Remove the on_start_query_execution callback from all aio cursors: it is unnecessary in an async context, where await provides direct control flow. Consistent with AsyncCursor, which also omits this callback.

Class hierarchy
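
As a rough illustration of how a mixin in the style of WithAsyncFetch can supply shared properties, sync fetch methods, and the async protocol to a cursor that holds its rows in memory, consider the sketch below. WithAsyncFetchSketch and EagerCursorStub are hypothetical classes written for this example; they do not reproduce the PR's actual class hierarchy or the real mixin's code.

```python
import asyncio
from typing import Any, Iterable, List, Optional, Tuple

Row = Tuple[Any, ...]


class WithAsyncFetchSketch:
    """Hypothetical mixin: shared properties, sync fetch, async protocol."""

    _rows: List[Row]
    _pos: int
    _arraysize: int
    closed: bool

    @property
    def arraysize(self) -> int:
        return self._arraysize

    @arraysize.setter
    def arraysize(self, value: int) -> None:
        if value <= 0:
            raise ValueError("arraysize must be positive")
        self._arraysize = value

    @property
    def rownumber(self) -> int:
        return self._pos

    # Fetch methods stay synchronous: rows are already in memory.
    def fetchone(self) -> Optional[Row]:
        if self._pos >= len(self._rows):
            return None
        row = self._rows[self._pos]
        self._pos += 1
        return row

    def fetchmany(self, size: Optional[int] = None) -> List[Row]:
        count = size or self._arraysize
        rows = self._rows[self._pos:self._pos + count]
        self._pos += len(rows)
        return rows

    def close(self) -> None:
        self.closed = True

    # Async iterator and async context manager protocol.
    def __aiter__(self) -> "WithAsyncFetchSketch":
        return self

    async def __anext__(self) -> Row:
        row = self.fetchone()
        if row is None:
            raise StopAsyncIteration
        return row

    async def __aenter__(self) -> "WithAsyncFetchSketch":
        return self

    async def __aexit__(self, *exc_info: Any) -> None:
        self.close()


class EagerCursorStub(WithAsyncFetchSketch):
    """Hypothetical cursor that loads all rows when execute() is awaited."""

    def __init__(self) -> None:
        self._rows, self._pos, self._arraysize = [], 0, 2
        self.closed = False

    async def execute(self, rows: Iterable[Row]) -> "EagerCursorStub":
        await asyncio.sleep(0)  # stand-in for awaiting the query
        self._rows, self._pos = list(rows), 0
        return self


async def main() -> Tuple[List[Row], List[Row]]:
    async with EagerCursorStub() as cursor:
        await cursor.execute([(1,), (2,), (3,)])
        first = cursor.fetchmany()             # uses arraysize == 2
        rest = [row async for row in cursor]   # remaining rows
    return first, rest


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Only execute() is specific to the concrete cursor here; everything else comes from the mixin, which is the kind of duplication the PR reports removing from the four SQL cursors.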
Files changed

- pyathena/aio/common.py: WithAsyncFetch mixin
- pyathena/aio/cursor.py: extend WithAsyncFetch, remove boilerplate
- pyathena/aio/pandas/cursor.py: extend WithAsyncFetch, remove boilerplate
- pyathena/aio/arrow/cursor.py: extend WithAsyncFetch, remove boilerplate
- pyathena/aio/polars/cursor.py: extend WithAsyncFetch, remove boilerplate
- pyathena/aio/s3fs/__init__.py
- pyathena/aio/s3fs/cursor.py: AioS3FSCursor
- pyathena/aio/spark/__init__.py
- pyathena/aio/spark/cursor.py: AioSparkCursor
- tests/pyathena/aio/conftest.py
- tests/pyathena/aio/s3fs/
- tests/pyathena/aio/spark/
- tests/pyathena/aio/test_cursor.py

Test plan

- make fmt: formatting clean
- make chk: lint + mypy clean
- AioS3FSCursor tests: fetchone/many/all, async iterator, description, cancel, executemany, arraysize, context manager
- AioSparkCursor tests: spark_dataframe, spark_sql, failed, cancel, executemany, context manager

Related issues
🤖 Generated with Claude Code