Follow-up to #21464. Removes some testing configuration left behind in that PR. Authors: - James Lamb (https://github.com/jameslamb) Approvers: - Kyle Edwards (https://github.com/KyleFromNVIDIA) URL: #21494
FAILURE - Unable to forward-merge automatically, manual merge is necessary. cc @Matt711 @galipremsagar @mroeschke

IMPORTANT: When merging this PR, do not use the auto-merger (i.e. the …
Follows up on #21469 to handle the stricter dtype validation at array creation time introduced in CuPy 14. Ops-Bot-Merge-Barrier: true Authors: - Matthew Murray (https://github.com/Matt711) Approvers: - Tom Augspurger (https://github.com/TomAugspurger) - Bradley Dice (https://github.com/bdice) - James Lamb (https://github.com/jameslamb) URL: #21504
…argument. (#21503) Handle the breaking change introduced in rapidsai/rapidsmpf#871 Authors: - Mads R. B. Kristensen (https://github.com/madsbk) - Tom Augspurger (https://github.com/TomAugspurger) Approvers: - Matthew Murray (https://github.com/Matt711) URL: #21503
This has barely been used anywhere. Authors: - Michael Schellenberger Costa (https://github.com/miscco) - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - David Wendt (https://github.com/davidwendt) - Shruti Shivakumar (https://github.com/shrshi) URL: #21477
Close #21512. This PR fixes a misaligned memory access bug. Previously, we aligned data to 8-byte boundaries, which caused issues for 16-byte types such as `decimal128` that require 16-byte alignment. The fix updates the alignment to 16 bytes. Note that this change may introduce additional padding, but the overall padding overhead is negligible compared to the usable data. Authors: - Yunsong Wang (https://github.com/PointKernel) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) - Bradley Dice (https://github.com/bdice) - David Wendt (https://github.com/davidwendt) URL: #21513
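The alignment change can be illustrated with the usual round-up-to-a-power-of-two idiom. This is a minimal host-side C++ sketch, not the code changed in #21513: `align_up` and `padding_for` are hypothetical helper names, and the 24-byte offset is an arbitrary example value.

```cpp
#include <cstddef>

// Hypothetical helper: round `offset` up to the next multiple of `alignment`.
// Assumes `alignment` is a non-zero power of two, which makes the mask trick valid.
constexpr std::size_t align_up(std::size_t offset, std::size_t alignment)
{
  return (offset + alignment - 1) & ~(alignment - 1);
}

// Hypothetical helper: padding bytes inserted before data placed at `offset`.
constexpr std::size_t padding_for(std::size_t offset, std::size_t alignment)
{
  return align_up(offset, alignment) - offset;
}

// A 16-byte type such as decimal128 must start at a multiple of 16.
constexpr std::size_t decimal128_alignment = 16;

// 8-byte alignment: data following a 24-byte region starts at offset 24,
// which is not a multiple of 16, so a 16-byte access there is misaligned.
static_assert(align_up(24, 8) == 24, "8-byte rounding leaves 24 unchanged");
static_assert(align_up(24, 8) % decimal128_alignment != 0, "24 is not 16-byte aligned");

// 16-byte alignment: the same data starts at offset 32, costing 8 padding bytes.
static_assert(align_up(24, decimal128_alignment) == 32, "rounded up to 32");
static_assert(padding_for(24, decimal128_alignment) == 8, "8 bytes of padding");
```

With this idiom the padding at each aligned boundary is at most `alignment - 1` bytes (15 bytes for 16-byte alignment), which is why the extra overhead stays small relative to the data itself.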
This PR attempts to collect the common bits of logic used by the various ColumnBase subclasses' `find_and_replace` implementations. I noticed some of this duplicated code while working on other refactorings and collected the changes together here. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Matthew Roeschke (https://github.com/mroeschke) URL: #21500
…21410)

### Summary
- Refactors join benchmark input table generation to produce deterministic results across runs
- Moves `generate_input_tables` implementation from header to separate `.cu` file
- Replaces random sampling with deterministic `thrust::tabulate` + `thrust::shuffle` approach

### Changes

**`cpp/benchmarks/common/generate_input.cu`**
- Rewrote `create_distinct_rows_column` for numeric types to use `cudf::sequence` followed by `thrust::shuffle` instead of random sampling
- This ensures unique values are generated deterministically given the same seed

**`cpp/benchmarks/join/generate_input_tables.cu`** (new file)
- Moved implementation from header file
- Build table gather map: uses `thrust::tabulate` with modulo to cycle through unique keys, then shuffles with fixed seed (12345)
- Probe table gather map: uses `thrust::tabulate` to assign matching keys (cycling through unique build keys) for first `selectivity * probe_rows` entries, non-matching keys for the rest, then shuffles with fixed seed (67890)

**`cpp/benchmarks/join/generate_input_tables.cuh`**
- Reduced to declarations only (moved CUDA kernels and implementation to `.cu` file)

**`cpp/benchmarks/CMakeLists.txt`**
- Added `join/generate_input_tables.cu` to the build

Authors: - Shruti Shivakumar (https://github.com/shrshi) Approvers: - Yunsong Wang (https://github.com/PointKernel) - David Wendt (https://github.com/davidwendt) - Bradley Dice (https://github.com/bdice) URL: #21410
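The gather-map construction described for #21410 can be approximated on the host. This is a CPU sketch that swaps `thrust::tabulate`/`thrust::shuffle` for a plain loop plus `std::shuffle` driven by `std::mt19937`; the function names are hypothetical, and encoding non-matching probe keys as indices at or above `num_unique` is an assumption, not necessarily how the benchmark represents them. Only the seeds (12345 and 67890) come from the description.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical CPU analogue of the build-table gather map:
// entry i cycles through the unique keys (tabulate with modulo),
// then the whole map is shuffled with a fixed seed.
std::vector<std::int64_t> build_gather_map(std::size_t build_rows,
                                           std::size_t num_unique,
                                           std::uint32_t seed = 12345)
{
  assert(num_unique > 0);
  std::vector<std::int64_t> map(build_rows);
  for (std::size_t i = 0; i < build_rows; ++i) {
    map[i] = static_cast<std::int64_t>(i % num_unique);
  }
  std::mt19937 rng(seed);
  std::shuffle(map.begin(), map.end(), rng);
  return map;
}

// Hypothetical CPU analogue of the probe-table gather map: the first
// selectivity * probe_rows entries cycle through the build keys (matches);
// the rest get indices >= num_unique, assumed here to select keys that never
// appear in the build table. The map is then shuffled with a fixed seed.
std::vector<std::int64_t> probe_gather_map(std::size_t probe_rows,
                                           std::size_t num_unique,
                                           double selectivity,
                                           std::uint32_t seed = 67890)
{
  assert(num_unique > 0);
  assert(selectivity >= 0.0 && selectivity <= 1.0);
  auto const num_matching = static_cast<std::size_t>(selectivity * probe_rows);
  std::vector<std::int64_t> map(probe_rows);
  for (std::size_t i = 0; i < probe_rows; ++i) {
    std::size_t const key =
      (i < num_matching) ? i % num_unique : num_unique + (i - num_matching);
    map[i] = static_cast<std::int64_t>(key);
  }
  std::mt19937 rng(seed);
  std::shuffle(map.begin(), map.end(), rng);
  return map;
}
```

Assigning matching and non-matching keys in index space before shuffling makes the selectivity exact by construction rather than approximate, as it would be with random sampling. Note that `std::mt19937` is fully specified, but the way `std::shuffle` consumes it is implementation-defined, so this sketch yields the same permutation run to run with a given standard library, not necessarily across toolchains.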
Towards #21229

This is one of the two large changes needed to natively support pandas extension types in cuDF. Now that we consistently use the `ColumnBase.create` API, pandas extension types can be preserved: `dtype` arguments now pass through extension types instead of coercing them to NumPy types in the `def dtype` function.

* Some changes were needed in `DatetimeTZColumn` to better accommodate `pandas.ArrowDtype` and pass the cudf test suite.
* IIRC, Dask by default will try to use the `pandas.StringDtype(storage="pyarrow")` type if pyarrow is installed, even with pandas < 2. I turned off this feature in some tests, as is already done in other tests, and I expect we should be able to remove this with pandas 3 support, when that string type is the default.
* The tests added to `conftest-patch.py` appear to be largely due to column APIs still not entirely resolving the resulting dtype correctly (like `DatetimeColumn.strftime`). Those improvements can come in a follow-up.

The next change will be to preserve input data that are pandas objects with extension types.

Authors: - Matthew Roeschke (https://github.com/mroeschke) Approvers: - Lawrence Mitchell (https://github.com/wence-) - GALI PREM SAGAR (https://github.com/galipremsagar) URL: #21499
Forward-merge triggered by automated cron job to keep `pandas3` up-to-date with `main`.

If this PR has conflicts, it will remain open for manual resolution.
See forward-merger docs for more info.