Nutanix CSI controller uses wrong UUID for VM identification #175

@wfan-epic

Nutanix CSI driver v3.3.8

I'm attempting to run the Nutanix CSI driver in a k8s cluster whose nodes are provisioned from a PrismCentral instance, but the nutanix-csi-controller pod is crash-looping. Logs from the nutanix-csi-plugin container:

W1121 17:10:49.027105       8 client_config.go:614] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I1121 17:10:49.034684       8 main.go:347] Getting credentials from secret dir
I1121 17:10:49.034764       8 main.go:353] Getting V4 client params: endpoint="<PRISMCENTRAL_HOST>:9440", username="<PRISMCENTRAL_USERNAME>"
2025-11-21T17:10:49.034Z main.go:372: [INFO] Resolving node <K8S_NODE>
2025-11-21T17:10:49.037Z main.go:576: [INFO] Node: {
  "MachineId": "ddf2dbfc-f1c4-541d-0677-8b0bad522464",
  "Name": "<K8S_NODE>"
}
2025-11-21T17:10:49.037Z main.go:428: [INFO] Resolving storage topology for node using annotations: <K8S_NODE>
2025-11-21T17:10:49.039Z main.go:580: [ERROR] error getting storage topology annnotations for node: <K8S_NODE>
2025-11-21T17:10:49.039Z topology.go:133: [INFO] Get VM ddf2dbfc-f1c4-541d-0677-8b0bad522464 from Management Endpoint <PRISMCENTRAL_HOST>:9440
2025-11-21T17:10:49.039Z vm_management.go:36: [INFO] creating a new VM api client
2025-11-21 17:10:49.039 INFO - GET https://<PRISMCENTRAL_HOST>:9440/api/vmm/v4.0.b1/ahv/config/vms/ddf2dbfc-f1c4-541d-0677-8b0bad522464
2025-11-21 17:10:49.145 INFO - HTTP/1.1 404 NOT FOUND
I1121 17:10:49.145917       8 main.go:195] CSI Topology: Check for unsupported operation error, vmUuid: %!s(MISSING)
F1121 17:10:49.145937       8 main.go:150] CSI Topology: Get VM failed with error: {"data":{"error":[{"message":"Failed to perform the operation on the VM with UUID 'ddf2dbfc-f1c4-541d-0677-8b0bad522464', because it is not found.","severity":"ERROR","code":"VMM-30100","locale":"en-US","errorGroup":"VM_NOT_FOUND","argumentsMap":{"vm_uuid":"ddf2dbfc-f1c4-541d-0677-8b0bad522464"},"$objectType":"vmm.v4.error.AppMessage"}],"$errorItemDiscriminator":"List<vmm.v4.error.AppMessage>","$objectType":"vmm.v4.error.ErrorResponse"},"$dataItemDiscriminator":"vmm.v4.error.ErrorResponse"}, vmUuid: ddf2dbfc-f1c4-541d-0677-8b0bad522464

Node info:

$ kubectl get node "${K8S_NODE}" -o yaml | yq '.status.nodeInfo'
architecture: amd64
bootID: 84e13162-d8fe-4656-90cd-a123e48d4dc2
containerRuntimeVersion: containerd://2.1.5
kernelVersion: 6.12.57-talos
kubeProxyVersion: ""
kubeletVersion: v1.33.4
machineID: ddf2dbfcf1c4541d06778b0bad522464
operatingSystem: linux
osImage: Talos (v1.11.5)
systemUUID: 4797b9a5-4b93-461e-94fe-860f221f8070

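The UUID in the failing request is the node's .status.nodeInfo.machineID with dashes inserted, and PrismCentral has no VM with that UUID. Manually querying PrismCentral using .status.nodeInfo.systemUUID instead of .status.nodeInfo.machineID returns information about the VM as expected.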
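
For anyone who wants to reproduce the comparison, here is a minimal sketch. The helper names `machine_id_to_uuid` and `vm_url` are hypothetical and not part of the CSI driver, and the `PC_USER`/`PC_PASS` variables and the use of HTTP basic auth with `curl` are assumptions about the environment. The URL path is copied from the controller log above.

```shell
# Hypothetical helpers for comparing the two node identifiers.

# Insert dashes into a 32-character machine ID (8-4-4-4-12 grouping),
# which reproduces the "MachineId" value printed in the controller log.
machine_id_to_uuid() {
  printf '%s\n' "$1" | sed -E 's/^(.{8})(.{4})(.{4})(.{4})(.{12})$/\1-\2-\3-\4-\5/'
}

# Build the VM lookup URL that appears in the controller log.
# $1 = PrismCentral host, $2 = VM UUID
vm_url() {
  printf 'https://%s:9440/api/vmm/v4.0.b1/ahv/config/vms/%s\n' "$1" "$2"
}

# Example usage against a live cluster (assumes basic auth is accepted):
#   machine_id=$(kubectl get node "${K8S_NODE}" -o jsonpath='{.status.nodeInfo.machineID}')
#   system_uuid=$(kubectl get node "${K8S_NODE}" -o jsonpath='{.status.nodeInfo.systemUUID}')
#   machine_id_to_uuid "$machine_id"   # the UUID the controller requested
#   curl -sk -u "${PC_USER}:${PC_PASS}" "$(vm_url "${PRISMCENTRAL_HOST}" "$system_uuid")"
```

With the values from this node, `machine_id_to_uuid ddf2dbfcf1c4541d06778b0bad522464` prints `ddf2dbfc-f1c4-541d-0677-8b0bad522464`, the same UUID that received the 404.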

In case it's relevant, the Nutanix Cloud Controller Manager (CCM) is also running in this cluster, and this issue has persisted across CCM versions 0.5.2 and 0.6.0.

Please let me know if there's any other information I can provide.
