

Preserve pandas nullable types in dtype= arguments #21499

Merged
rapids-bot[bot] merged 20 commits into rapidsai:main from mroeschke:ref/cudf/pandas_nullable_in_dtype
Feb 21, 2026


@mroeschke
Contributor

@mroeschke mroeschke commented Feb 19, 2026

Description

Towards #21229

This is one of the two large changes needed to natively support pandas extension types in cuDF, and it is now possible because we consistently use the ColumnBase.create API to preserve pandas extension types: dtype arguments now pass extension types through instead of coercing them to NumPy types in the def dtype function.
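The behavioral difference described above can be illustrated with pandas alone. This is a minimal sketch contrasting the two policies, not cuDF's actual def dtype function; the helper names resolve_dtype_coercing and resolve_dtype_preserving are hypothetical, and only pandas_dtype, ExtensionDtype, and the numpy_dtype attribute of pandas' masked (nullable) dtypes are real pandas APIs.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionDtype
from pandas.api.types import pandas_dtype


def resolve_dtype_coercing(dtype):
    """Hypothetical old policy: collapse nullable extension dtypes to NumPy."""
    resolved = pandas_dtype(dtype)
    # Masked dtypes such as Int64Dtype/BooleanDtype expose their NumPy counterpart.
    if isinstance(resolved, ExtensionDtype) and hasattr(resolved, "numpy_dtype"):
        return resolved.numpy_dtype
    return resolved


def resolve_dtype_preserving(dtype):
    """Hypothetical new policy: pass extension dtypes through unchanged."""
    return pandas_dtype(dtype)


for spec in ["int64", "Int64", pd.BooleanDtype()]:
    print(repr(spec), resolve_dtype_coercing(spec), resolve_dtype_preserving(spec))
```

With the coercing policy, "Int64" silently becomes NumPy int64 and the nullable semantics are lost; with the preserving policy, the caller gets back the Int64Dtype it asked for.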

  • Some changes were needed in DatetimeTZColumn to accommodate pandas.ArrowDtype so that the cudf test suite passes.
  • IIRC, Dask by default tries to use the pandas.StringDtype(storage="pyarrow") type when pyarrow is installed, even with pandas < 2. I turned off this behavior in some tests, as is already done in other tests, and I expect we can remove this once we support pandas 3, where that string type is the default.
  • The tests added to conftest-patch.py appear to be largely due to column APIs that still do not entirely resolve the resulting dtype correctly (e.g. DatetimeColumn.strftime). Those improvements can be made in a follow-up.
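The Dask bullet mentions turning off the automatic pyarrow string conversion in some tests. A minimal sketch of one way to do that, assuming the behavior is governed by Dask's "dataframe.convert-string" config key (the PR does not say which mechanism its tests use); the helper name pyarrow_strings_disabled is hypothetical.

```python
import contextlib

import dask


@contextlib.contextmanager
def pyarrow_strings_disabled():
    """Hypothetical helper: temporarily disable Dask's pyarrow string conversion."""
    # Assumption: "dataframe.convert-string" controls whether Dask converts
    # object-dtype string data to string[pyarrow]; the setting is scoped to
    # this block and restored on exit.
    with dask.config.set({"dataframe.convert-string": False}):
        yield


with pyarrow_strings_disabled():
    print(dask.config.get("dataframe.convert-string"))
```

Scoping the setting with a context manager keeps the override local to the tests that need it, rather than changing global config for the whole suite.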

The next change will be to preserve input data that are pandas objects with extension types.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@mroeschke mroeschke self-assigned this Feb 19, 2026
@mroeschke mroeschke added the Python Affects Python cuDF API. label Feb 19, 2026
@mroeschke mroeschke requested a review from a team as a code owner February 19, 2026 23:31
@mroeschke mroeschke added improvement Improvement / enhancement to an existing function breaking Breaking change labels Feb 19, 2026
@GPUtester GPUtester moved this to In Progress in cuDF Python Feb 19, 2026
@mroeschke mroeschke requested a review from a team as a code owner February 20, 2026 01:51
@github-actions github-actions bot added the cudf.pandas Issues specific to cudf.pandas label Feb 20, 2026
Contributor

@galipremsagar galipremsagar left a comment


Nice!!

@mroeschke
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 26153ca into rapidsai:main Feb 21, 2026
108 checks passed
@github-project-automation github-project-automation bot moved this from In Progress to Done in cuDF Python Feb 21, 2026
@mroeschke mroeschke deleted the ref/cudf/pandas_nullable_in_dtype branch February 21, 2026 05:44

Labels

  • breaking: Breaking change
  • cudf.pandas: Issues specific to cudf.pandas
  • improvement: Improvement / enhancement to an existing function
  • Python: Affects Python cuDF API.

Projects

Status: Done
