-
Notifications
You must be signed in to change notification settings - Fork 11
PytatoPyOpenCLArrayContext: use SVM allocator if available, limit arg size for GPUs #189
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
156e369 to
ba7c205
Compare
inducer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good generally. Two style nits below.
Two questions:
- Does it work?
- How come it's marked as draft?
arraycontext/impl/pytato/compile.py
Outdated
|
|
||
| with ProcessLogger(logger, f"generate_loopy for '{prg_id}'"): | ||
| import pyopencl as cl | ||
| dev = self.actx.context.devices[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Safer to get it from the queue, which only has one device.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 1e54ce4
arraycontext/impl/pytato/compile.py
Outdated
| and cl.characterize.has_coarse_grain_buffer_svm(dev)): | ||
| limit = dev.max_parameter_size | ||
| # Leave some extra space since our sizes are estimates | ||
| target = lp.PyOpenCLTarget(limit_arg_size_nbytes=limit//2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, upon second thought: We can only pass this if we're sure that the memory allocated is actually SVM. So this has to get involved with memory pool creation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we do this here or as part of inducer/loopy#642 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I can tell, yes.
This still needs the other PRs to be merged first I think. |
40c3821 to
6e912a9
Compare
|
This is ready for another review @inducer |
|
Thanks! |
Needs:
LoopyPyOpenCLTarget: pass through loopy.PyOpenCLTarget pytato#359(maybe not needed)PyOpenCL target: Overflow large argument counts into SVM struct loopy#642