Forward-merge main into pandas3 #21509

Open
AyodeAwe wants to merge 8 commits into pandas3 from main

@AyodeAwe
Contributor

Forward-merge triggered by automated cron job to keep pandas3 up-to-date with main.

If this PR has conflicts, it will remain open for manual resolution.

See forward-merger docs for more info.

Follow-up to #21464 

Removes some testing configuration left behind in that PR.

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)

URL: #21494
@AyodeAwe AyodeAwe requested a review from a team as a code owner February 20, 2026 18:00
@AyodeAwe AyodeAwe requested review from bdice and removed request for a team February 20, 2026 18:00
@AyodeAwe
Contributor Author

FAILURE - Unable to forward-merge automatically, manual merge is necessary.

cc @Matt711 @galipremsagar @mroeschke

Do not use the Resolve conflicts option in this PR. Follow these instructions: https://docs.rapids.ai/maintainers/forward-merger/

IMPORTANT: When merging this PR, do not use the auto-merger (i.e. the /merge comment). Instead, an admin must manually merge by changing the merging strategy to Create a Merge Commit. Otherwise, history will be lost and the branches become incompatible.

Follows up #21469 to handle the stricter dtype validation at array creation time introduced in cupy 14.

Ops-Bot-Merge-Barrier: true

Authors:
  - Matthew Murray (https://github.com/Matt711)

Approvers:
  - Tom Augspurger (https://github.com/TomAugspurger)
  - Bradley Dice (https://github.com/bdice)
  - James Lamb (https://github.com/jameslamb)

URL: #21504
@rapids-bot rapids-bot bot requested a review from a team as a code owner February 20, 2026 18:27
@rapids-bot rapids-bot bot requested review from brandon-b-miller and removed request for a team February 20, 2026 18:27
@github-actions github-actions bot added the Python and cudf.pandas labels Feb 20, 2026
@GPUtester GPUtester moved this to In Progress in cuDF Python Feb 20, 2026
…argument. (#21503)

Handle the breaking change introduced in rapidsai/rapidsmpf#871

Authors:
  - Mads R. B. Kristensen (https://github.com/madsbk)
  - Tom Augspurger (https://github.com/TomAugspurger)

Approvers:
  - Matthew Murray (https://github.com/Matt711)

URL: #21503
@github-actions github-actions bot added the cudf-polars label Feb 20, 2026
This has been used almost nowhere.

Authors:
  - Michael Schellenberger Costa (https://github.com/miscco)
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - David Wendt (https://github.com/davidwendt)
  - Shruti Shivakumar (https://github.com/shrshi)

URL: #21477
@rapids-bot rapids-bot bot requested a review from a team as a code owner February 21, 2026 00:22
@rapids-bot rapids-bot bot requested review from kingcrimsontianyu and removed request for a team February 21, 2026 00:22
@github-actions github-actions bot added the libcudf label Feb 21, 2026
PointKernel and others added 3 commits February 21, 2026 00:31
Close #21512

This PR fixes a misaligned memory access bug. Previously, we aligned data to 8-byte boundaries, which caused issues for 16-byte types such as `decimal128` that require 16-byte alignment. The fix updates the alignment to 16 bytes.

Note that this change may introduce additional padding, but the overall padding overhead is negligible compared to the usable data.
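The alignment arithmetic described above can be sketched in a few lines. This is only an illustration, not the libcudf code: `align_up`, `layout`, and the buffer sizes are hypothetical names and values chosen for the example.

```python
def align_up(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment (a power of two)."""
    return (offset + alignment - 1) & ~(alignment - 1)


def layout(sizes, alignment):
    """Pack buffers back to back, aligning each start offset.

    Returns the start offset of every buffer and the total bytes used.
    """
    offsets, cursor = [], 0
    for size in sizes:
        cursor = align_up(cursor, alignment)
        offsets.append(cursor)
        cursor += size
    return offsets, cursor


# Hypothetical buffer sizes in bytes; the second one stands in for a
# buffer holding a single 16-byte decimal128 value.
sizes = [4, 16, 100]

offsets8, total8 = layout(sizes, 8)     # old behavior: 8-byte alignment
offsets16, total16 = layout(sizes, 16)  # new behavior: 16-byte alignment
```

With 8-byte alignment the second buffer starts at offset 8, which is not a multiple of 16, so a 16-byte load from it would be misaligned. With 16-byte alignment every start offset is a multiple of 16, at the cost of a few extra padding bytes in this toy layout.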

Authors:
  - Yunsong Wang (https://github.com/PointKernel)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Bradley Dice (https://github.com/bdice)
  - David Wendt (https://github.com/davidwendt)

URL: #21513
This PR attempts to collect the common bits of logic used by the various ColumnBase subclasses' `find_and_replace` implementations. I noticed some of this duplicated code while working on other refactorings and collected the changes together here.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Matthew Roeschke (https://github.com/mroeschke)

URL: #21500
…21410)

### Summary
- Refactors join benchmark input table generation to produce deterministic results across runs
- Moves `generate_input_tables` implementation from header to separate `.cu` file
- Replaces random sampling with deterministic `thrust::tabulate` + `thrust::shuffle` approach

### Changes

**`cpp/benchmarks/common/generate_input.cu`**
- Rewrote `create_distinct_rows_column` for numeric types to use `cudf::sequence` followed by `thrust::shuffle` instead of random sampling
- This ensures unique values are generated deterministically given the same seed

**`cpp/benchmarks/join/generate_input_tables.cu`** (new file)
- Moved implementation from header file
- Build table gather map: uses `thrust::tabulate` with modulo to cycle through unique keys, then shuffles with fixed seed (12345)
- Probe table gather map: uses `thrust::tabulate` to assign matching keys (cycling through unique build keys) for first `selectivity * probe_rows` entries, non-matching keys for the rest, then shuffles with fixed seed (67890)

**`cpp/benchmarks/join/generate_input_tables.cuh`**
- Reduced to declarations only (moved CUDA kernels and implementation to `.cu` file)

**`cpp/benchmarks/CMakeLists.txt`**
- Added `join/generate_input_tables.cu` to the build
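As a rough host-side illustration of the gather-map scheme described above, the sketch below mirrors the tabulate-then-shuffle idea in plain Python. It is not the benchmark code: the function names are hypothetical, `random.Random(seed).shuffle` stands in for `thrust::shuffle`, and the choice of `num_unique + i` as a non-matching key is an assumption made for the example.

```python
import random


def build_gather_map(build_rows, num_unique, seed=12345):
    """Cycle through the unique keys with a modulo, then shuffle with a fixed seed."""
    gather = [i % num_unique for i in range(build_rows)]
    random.Random(seed).shuffle(gather)
    return gather


def probe_gather_map(probe_rows, num_unique, selectivity, seed=67890):
    """First selectivity * probe_rows entries match a build key; the rest do not."""
    num_matching = int(selectivity * probe_rows)
    gather = [
        # Matching entries cycle through the unique build keys.
        # Non-matching entries use an index >= num_unique (an assumption of
        # this sketch), so they can never equal a build key index.
        i % num_unique if i < num_matching else num_unique + i
        for i in range(probe_rows)
    ]
    random.Random(seed).shuffle(gather)
    return gather


build_map = build_gather_map(build_rows=20, num_unique=5)
probe_map = probe_gather_map(probe_rows=40, num_unique=5, selectivity=0.25)
```

Because both the generated values and the shuffle seed are fixed, calling either function twice with the same arguments returns the same map, which is the property the refactor is after.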

Authors:
  - Shruti Shivakumar (https://github.com/shrshi)

Approvers:
  - Yunsong Wang (https://github.com/PointKernel)
  - David Wendt (https://github.com/davidwendt)
  - Bradley Dice (https://github.com/bdice)

URL: #21410
@rapids-bot rapids-bot bot requested a review from a team as a code owner February 21, 2026 02:09
@github-actions github-actions bot added the CMake label Feb 21, 2026
Towards #21229

This is one of the two large changes needed to natively support pandas extension types in cuDF. It is now possible because we consistently use the `ColumnBase.create` API to preserve pandas extension types: `dtype` arguments will now pass through extension types instead of being coerced to numpy types in the `def dtype` function.

* Some changes were needed in `DatetimeTZColumn` to make it more accommodating of `pandas.ArrowDtype` so that the cudf test suite passes.
* IIRC Dask, by default, will try to use the `pandas.StringDtype(storage="pyarrow")` type if pyarrow is installed, even with pandas < 2. I turned off this feature in some tests, as is already done in other tests, and I expect we should be able to remove this with pandas 3 support, when that string type is the default.
* The tests added to `conftest-patch.py` appear to be largely due to column APIs still not entirely resolving the resulting dtype correctly (like `DatetimeColumn.strftime`). Those improvements can be made in a follow-up.

The next change will be to preserve input data that are pandas objects with extension types.

Authors:
  - Matthew Roeschke (https://github.com/mroeschke)

Approvers:
  - Lawrence Mitchell (https://github.com/wence-)
  - GALI PREM SAGAR (https://github.com/galipremsagar)

URL: #21499
@rapids-bot rapids-bot bot requested a review from a team as a code owner February 21, 2026 05:44

Labels

- CMake: CMake build issue
- cudf.pandas: Issues specific to cudf.pandas
- cudf-polars: Issues specific to cudf-polars
- libcudf: Affects libcudf (C++/CUDA) code.
- Python: Affects Python cuDF API.

Projects

Status: In Progress

9 participants