@400Ping 400Ping commented Dec 22, 2025

Purpose of PR

  • Add a pinned host buffer pool and wire it into the dual-stream pipeline so that each chunk is staged through double-buffered pinned memory before its H2D copy (fewer malloc/free calls and less GPU idle time).
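
To illustrate the idea, here is a minimal sketch of a reusable staging-buffer pool with two buffers in rotation. It is not the code in this PR: `PinnedBufferPool`, `PooledBuffer`, and `stage_chunks` are hypothetical names, a plain `Vec<f64>` stands in for page-locked memory (a real pool would presumably allocate with `cudaHostAlloc` and free with `cudaFreeHost`), and the H2D copy plus kernel launch is replaced by a simple sum so the example runs without a GPU.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical pool of fixed-capacity staging buffers.
/// Assumption: Vec<f64> stands in for pinned (page-locked) host memory.
struct PinnedBufferPool {
    free: Mutex<Vec<Vec<f64>>>,
    capacity: usize, // maximum elements per buffer
}

impl PinnedBufferPool {
    /// Allocate all buffers once, up front, so the hot loop never allocates.
    fn new(buffers: usize, capacity: usize) -> Arc<Self> {
        let free: Vec<Vec<f64>> =
            (0..buffers).map(|_| Vec::with_capacity(capacity)).collect();
        Arc::new(Self { free: Mutex::new(free), capacity })
    }

    /// Take a free buffer, or None if every buffer is still in flight.
    fn try_acquire(self: &Arc<Self>) -> Option<PooledBuffer> {
        let data = self.free.lock().unwrap().pop()?;
        Some(PooledBuffer { data: Some(data), pool: Arc::clone(self) })
    }

    /// Number of buffers currently available for reuse.
    fn available(&self) -> usize {
        self.free.lock().unwrap().len()
    }
}

/// RAII handle: the buffer goes back to the pool when the handle is dropped.
struct PooledBuffer {
    data: Option<Vec<f64>>,
    pool: Arc<PinnedBufferPool>,
}

impl PooledBuffer {
    /// Copy one chunk into the staging buffer without reallocating.
    fn fill(&mut self, chunk: &[f64]) {
        assert!(chunk.len() <= self.pool.capacity, "chunk larger than staging buffer");
        let buf = self.data.as_mut().expect("buffer present until drop");
        buf.clear();
        buf.extend_from_slice(chunk);
    }

    fn as_slice(&self) -> &[f64] {
        self.data.as_ref().expect("buffer present until drop")
    }
}

impl Drop for PooledBuffer {
    fn drop(&mut self) {
        if let Some(mut buf) = self.data.take() {
            buf.clear();
            self.pool.free.lock().unwrap().push(buf);
        }
    }
}

/// Double-buffered staging loop: fill the next buffer while the previous one
/// is still "in flight", then retire the previous one.
/// The sum is a stand-in for the async H2D copy and the kernel that consumes it.
fn stage_chunks(pool: &Arc<PinnedBufferPool>, input: &[f64], chunk_len: usize) -> f64 {
    let mut in_flight: Option<PooledBuffer> = None;
    let mut total = 0.0;
    for chunk in input.chunks(chunk_len) {
        let mut next = pool
            .try_acquire()
            .expect("two buffers are enough for one in-flight chunk");
        next.fill(chunk);
        // A real pipeline would wait on the copy-done event here before
        // releasing the previous buffer back to the pool.
        if let Some(prev) = in_flight.take() {
            total += prev.as_slice().iter().sum::<f64>();
        }
        in_flight = Some(next);
    }
    if let Some(last) = in_flight.take() {
        total += last.as_slice().iter().sum::<f64>();
    }
    total
}
```

With a pool created by `PinnedBufferPool::new(2, chunk_len)`, at most two buffers are ever in use, and both are back in the pool once `stage_chunks` returns. Returning buffers through `Drop` is a design choice in this sketch; it keeps an early return or error path from leaking a staging buffer.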

Related Issues or PRs

Closes #703

Changes Made

  • Bug fix
  • New feature
  • Refactoring
  • Documentation
  • Test
  • CI/CD pipeline
  • Other

Breaking Changes

  • Yes
  • No

Checklist

  • Added or updated unit tests for all changes
  • Added or updated documentation for all changes
  • Successfully built and ran all unit tests or manual tests locally
  • PR title follows "MAHOUT-XXX: Brief Description" format (if related to an issue)
  • Code follows ASF guidelines

@400Ping 400Ping marked this pull request as draft December 22, 2025 13:39
@400Ping 400Ping marked this pull request as ready for review December 22, 2025 13:51
Author

400Ping commented Dec 22, 2025

@400Ping 400Ping changed the title [QDP] Double-buffered async I/O for read_parquet_batch [QDP] Pinned host buffer + dual-stream event pipeline to overlap copy and compute Dec 22, 2025
Contributor

rich7420 commented Dec 23, 2025

Thanks @400Ping for the patch!

  1. What's the reason you define an FFI function again?
  2. Some tests failed locally, either because of a tensor shape problem or because the tests' expected output needs updating.

Author

400Ping commented Dec 23, 2025

> Thanks @400Ping for the patch!
>
>   1. What's the reason you define an FFI function again?
>   2. Some tests failed locally, either because of a tensor shape problem or because the tests' expected output needs updating.

My bad, just fixed it.

@400Ping 400Ping marked this pull request as draft December 24, 2025 07:54
@400Ping 400Ping marked this pull request as ready for review December 25, 2025 10:18
Contributor

@rich7420 rich7420 left a comment

@400Ping thanks for the patch! I left some comments.

@rich7420 rich7420 marked this pull request as draft December 25, 2025 13:24
@400Ping 400Ping changed the title [QDP] Pinned host buffer + dual-stream event pipeline to overlap copy and compute [QDP] Double-buffered pinned I/O pipeline and faster Parquet decode Dec 25, 2025
@400Ping 400Ping marked this pull request as ready for review December 25, 2025 22:51
@rich7420
Contributor

I think maybe we could add some unit tests for this.

@ryankert01
Contributor

We have 2 improvements in this PR. Based on the benchmark results, I suspect one of them is not contributing to the speedup. What's your experience?

Signed-off-by: 400Ping <fourhundredping@gmail.com>
This reverts commit 3556b5a.
Author

400Ping commented Dec 29, 2025

> We have 2 improvements in this PR. Based on the benchmark results, I suspect one of them is not contributing to the speedup. What's your experience?

I think both help. The second one is the change @rich7420 and @guan404ming suggested: switching to a different decompression technique to improve performance. Overall, though, I think most of the speedup comes from the first one.

Author

400Ping commented Dec 29, 2025

Just tested: the second one doesn't improve performance much, so I'm going to remove it.

@400Ping 400Ping requested a review from rich7420 December 29, 2025 14:25