Labels: question (Categorizes issue or PR as a support question.)
Description
Hello,
I've been using MPS on bare metal to improve GPU utilization and performance, since we've seen that even for a single process, running an MPS server performs better than not running one. I wanted to replicate this on my Kubernetes cluster using the following configuration:
```yaml
driver:
  enabled: false
toolkit:
  enabled: true
cdi:
  enabled: false
nfd:
  enabled: true
gfd:
  enabled: true
migManager:
  enabled: false
devicePlugin:
  enabled: true
  config:
    name: device-plugin-config
    create: true
    default: default
    data:
      default: |-
        version: v1
        flags:
          migStrategy: none
          failOnInitError: true
      rtx-2080-ti: |-
        version: v1
        sharing:
          mps:
            resources:
              - name: nvidia.com/gpu
                replicas: 1
```
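For reference, this is how I select the per-node config (a sketch; `<node-name>` is a placeholder, and I'm assuming the GPU Operator's `nvidia.com/device-plugin.config` label key is the right selector here):

```shell
# Point the device plugin on this node at the "rtx-2080-ti" config section
kubectl label node <node-name> nvidia.com/device-plugin.config=rtx-2080-ti
```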
But after I label the node with the rtx-2080-ti configuration, the gpu-feature-discovery and nvidia-device-plugin-daemonset pods fail with this error:
```
I1203 12:02:32.316208 193 main.go:163] Starting OS watcher.
I1203 12:02:32.316516 193 main.go:168] Loading configuration.
I1203 12:02:32.317279 193 main.go:160] Exiting
E1203 12:02:32.317308 193 main.go:127] unable to load config: unable to finalize config: unable to parse config file: error parsing config file: unmarshal error: error unmarshaling JSON: while decoding JSON: number of replicas must be >= 2
```
Why is it not allowed to set MPS replicas to 1? Is there a way to enable MPS without splitting the GPU into multiple replicas?
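For what it's worth, the error goes away if I bump the replica count to at least 2, matching the `>= 2` check in the log above, but that advertises the GPU as multiple shared resources rather than the single-replica MPS setup I'm after:

```yaml
# Workaround sketch: satisfies the validation, but splits the GPU in two
rtx-2080-ti: |-
  version: v1
  sharing:
    mps:
      resources:
        - name: nvidia.com/gpu
          replicas: 2
```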