
Setting MPS replicas to 1 #1548

@santurini


Hello,
I've been using MPS on bare metal to optimize GPU usage and performance, since we've observed that even with a single process, running an MPS server gives better performance than running without one.

I wanted to replicate the same setup on my Kubernetes cluster using this configuration:

```yaml
driver:
  enabled: false
toolkit:
  enabled: true
cdi:
  enabled: false
nfd:
  enabled: true
gfd:
  enabled: true
migManager:
  enabled: false
devicePlugin:
  enabled: true
  config:
    name: device-plugin-config
    create: true
    default: default
    data:
      default: |-
        version: v1
        flags:
          migStrategy: none
          failOnInitError: true
      rtx-2080-ti: |-
        version: v1
        sharing:
          mps:
            resources:
            - name: nvidia.com/gpu
              replicas: 1
```

However, after I label the node to use the rtx-2080-ti configuration, the gpu-feature-discovery and nvidia-device-plugin-daemonset pods fail with this error:

```
I1203 12:02:32.316208     193 main.go:163] Starting OS watcher.
I1203 12:02:32.316516     193 main.go:168] Loading configuration.
I1203 12:02:32.317279     193 main.go:160] Exiting
E1203 12:02:32.317308     193 main.go:127] unable to load config: unable to finalize config: unable to parse config file: error parsing config file: unmarshal error: error unmarshaling JSON: while decoding JSON: number of replicas must be >= 2
```
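The wording of the log ("error unmarshaling JSON: while decoding JSON") suggests the config is converted to JSON and each `sharing.mps.resources` entry is validated while it is being decoded, so the config is rejected before MPS is ever started. The Go sketch below illustrates that kind of decode-time check. It is an assumption-based illustration, not the device plugin's actual code: `ReplicatedResource`, `Config` and `parseConfig` are hypothetical, simplified names, and only the `>= 2` rule and its message are taken from the log.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ReplicatedResource is a HYPOTHETICAL, simplified stand-in for one entry
// under sharing.mps.resources; it is not the device plugin's real type.
type ReplicatedResource struct {
	Name     string `json:"name"`
	Replicas int    `json:"replicas"`
}

// UnmarshalJSON decodes an entry, then rejects replica counts below 2,
// reproducing the error message seen in the pod logs.
func (r *ReplicatedResource) UnmarshalJSON(data []byte) error {
	// Alias type without methods, so json.Unmarshal does not recurse here.
	type plain ReplicatedResource
	var p plain
	if err := json.Unmarshal(data, &p); err != nil {
		return err
	}
	if p.Replicas < 2 {
		return fmt.Errorf("number of replicas must be >= 2")
	}
	*r = ReplicatedResource(p)
	return nil
}

// Config models only the fields of the rtx-2080-ti entry relevant here
// (HYPOTHETICAL shape, mirroring the YAML above).
type Config struct {
	Version string `json:"version"`
	Sharing struct {
		MPS struct {
			Resources []ReplicatedResource `json:"resources"`
		} `json:"mps"`
	} `json:"sharing"`
}

// parseConfig decodes the JSON form of a config entry (HYPOTHETICAL helper).
func parseConfig(raw string) (*Config, error) {
	var c Config
	if err := json.Unmarshal([]byte(raw), &c); err != nil {
		return nil, err
	}
	return &c, nil
}

func main() {
	// JSON equivalent of the rtx-2080-ti entry with replicas: 1.
	one := `{"version":"v1","sharing":{"mps":{"resources":[{"name":"nvidia.com/gpu","replicas":1}]}}}`
	// The same entry with replicas: 2.
	two := `{"version":"v1","sharing":{"mps":{"resources":[{"name":"nvidia.com/gpu","replicas":2}]}}}`

	if _, err := parseConfig(one); err != nil {
		fmt.Println("replicas=1 rejected:", err)
	}
	if c, err := parseConfig(two); err == nil {
		fmt.Println("replicas=2 accepted:", c.Sharing.MPS.Resources[0].Replicas)
	}
}
```

Because the check lives in the decoder, a value of 1 fails at config-load time, which matches the pods exiting right after "Loading configuration." in the log.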

Why is it not allowed to set MPS replicas to 1? Is there a way to deploy MPS without splitting the GPU into multiple replicas?
