Closed
Description
🚀 The feature, motivation and pitch
Run the sampler (argmax, or softmax when temperature > 0) on CUDA, so that the LLM workflow does not have to memcpy the logits to the CPU before sampling.
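For reference, here is a minimal host-side sketch of the sampling logic in question, assuming temperature 0 means greedy argmax and temperature > 0 means sampling from the temperature-scaled softmax. The name `sample_token` and its parameters are hypothetical and not taken from this repository. A CUDA version would presumably compute the same result with reductions over the vocabulary dimension.

```python
import math
import random


def sample_token(logits, temperature=0.0, rng=None):
    """Pick a token index from a list of logits (hypothetical helper).

    temperature == 0 -> greedy argmax
    temperature > 0  -> sample from softmax(logits / temperature)
    """
    if not logits:
        raise ValueError("logits must be non-empty")
    if temperature < 0:
        raise ValueError("temperature must be >= 0")

    if temperature == 0:
        # Greedy decoding: index of the largest logit (first one on ties).
        return max(range(len(logits)), key=lambda i: logits[i])

    rng = rng or random.Random()

    # Scale by temperature, then subtract the max for numerical stability.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)

    # Inverse-CDF sampling: walk the cumulative weights until they pass
    # a uniform draw in [0, total).
    threshold = rng.random() * total
    cumulative = 0.0
    for index, weight in enumerate(weights):
        cumulative += weight
        if threshold < cumulative:
            return index
    return len(logits) - 1  # guard against floating-point round-off
```

If this ran on the device, only the chosen token index would need to be copied back to the host, not the full logits tensor.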
Alternatives
No response
Additional context
No response
RFC (Optional)
No response