Skip to content

fix: support Google COS NVMe drive discovery in node-agent findDrives #1849

Open
kristina-solovyova wants to merge 1 commit into main from 11-12-fix_support_google_cos_nvme_drive_discovery_in_node-agent_finddrives

kristina-solovyova (Collaborator) commented Nov 12, 2025

Detect Google Container-Optimized OS and use wwid fallback for NVMe serial IDs when the standard serial path is unavailable.

  • Mount /etc/os-release for node-agent pods for COS detection
  • Add COS-specific wwid fallback for NVMe devices in node-agent's findDrives
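
A minimal sketch of the two steps above, not the actual node-agent code. It assumes COS is identified by `ID=cos` in the mounted `/etc/os-release`, and that the serial and wwid are read from `/sys/block/<dev>/device/serial` and `/sys/block/<dev>/wwid`. The function names (`isCOS`, `chooseDriveID`, `readTrimmed`) and the device path are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// parseOSRelease parses os-release style KEY=value content into a map.
func parseOSRelease(content string) map[string]string {
	out := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		key, val, ok := strings.Cut(line, "=")
		if !ok {
			continue
		}
		out[key] = strings.Trim(val, `"'`)
	}
	return out
}

// isCOS reports whether the os-release content identifies Container-Optimized OS.
// Assumption: COS sets ID=cos.
func isCOS(osRelease string) bool {
	return parseOSRelease(osRelease)["ID"] == "cos"
}

// chooseDriveID prefers the regular serial and falls back to the wwid
// only on COS, and only when the serial is unavailable (empty).
func chooseDriveID(serial, wwid string, onCOS bool) string {
	if serial != "" {
		return serial
	}
	if onCOS {
		return wwid
	}
	return ""
}

// readTrimmed returns the trimmed file content, or "" if the file cannot be read.
func readTrimmed(path string) string {
	b, err := os.ReadFile(path)
	if err != nil {
		return ""
	}
	return strings.TrimSpace(string(b))
}

func main() {
	content, err := os.ReadFile("/etc/os-release")
	if err != nil {
		fmt.Println("cannot read os-release:", err)
		return
	}
	onCOS := isCOS(string(content))

	dev := "/sys/block/nvme0n1" // hypothetical device, for illustration only
	id := chooseDriveID(
		readTrimmed(filepath.Join(dev, "device", "serial")),
		readTrimmed(filepath.Join(dev, "wwid")),
		onCOS,
	)
	fmt.Printf("cos=%v id=%q\n", onCOS, id)
}
```

Keeping the selection logic in a pure function (`chooseDriveID`) makes the fallback easy to unit-test without a real sysfs tree.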

@kristina-solovyova kristina-solovyova changed the base branch from 11-12-fix_sanitize_kernel_versions_in_enable-local-drivers-distribution to graphite-base/1849 November 12, 2025 15:18
@kristina-solovyova kristina-solovyova force-pushed the 11-12-fix_support_google_cos_nvme_drive_discovery_in_node-agent_finddrives branch from 5f00cbf to 67c75d8 Compare November 12, 2025 15:18
@kristina-solovyova kristina-solovyova changed the base branch from graphite-base/1849 to main November 12, 2025 15:18
@kristina-solovyova kristina-solovyova marked this pull request as ready for review November 12, 2025 15:19
@graphite-app graphite-app bot requested review from assafgi and tigrawap November 12, 2025 15:19
graphite-app bot commented Nov 12, 2025

Graphite Automations

"Add anton/matt/sergey/kristina as reviwers on operator PRs" took an action on this PR • (11/12/25)

2 reviewers were added to this PR based on Anton Bykov's automation.

@kristina-solovyova kristina-solovyova force-pushed the 11-12-fix_support_google_cos_nvme_drive_discovery_in_node-agent_finddrives branch from 67c75d8 to 0778700 Compare November 12, 2025 16:22
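Collaborator Author

@assafgi @tigrawap we have an issue here: weka is not handling drive serials correctly on Google COS. The sysfs serial is the generic string nvme_card, so every drive in the cluster reports the same serial number: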

[root@weka-operator-node-agent-vblw4 /]# cat /sys/devices/pci0000:00/0000:00:04.0/nvme/nvme0/serial
nvme_card

root@gke-kristina-gcp-converged-7f1fcae5-dq3l:/# weka cluster drive -o id,uuid,hostname,status,serial,path 
DISK ID  UUID                                  HOSTNAME                                  STATUS  SERIAL NUMBER  DEVICE PATH
0        da5e64c7-1738-444d-9a30-c6e329d2c843  gke-kristina-gcp-converged-7f1fcae5-dq3l  ACTIVE  nvme_card      0000:00:04.0
1        6a5634de-1c09-4158-b05c-b8bea81b2656  gke-kristina-gcp-converged-7f1fcae5-52gd  ACTIVE  nvme_card      0000:00:04.0
2        7fa67ed3-f09e-4ade-96a0-9db366ebb78c  gke-kristina-gcp-converged-7f1fcae5-ch72  ACTIVE  nvme_card      0000:00:04.0
3        20e7f596-1230-42d1-a645-f48e173f9d7c  gke-kristina-gcp-converged-7f1fcae5-23gd  ACTIVE  nvme_card      0000:00:04.0
4        356ee683-737b-49d6-afb7-d12e0c49c906  gke-kristina-gcp-converged-7f1fcae5-9z8p  ACTIVE  nvme_card      0000:00:04.0
5        47081f54-85e2-4c8d-8cf2-0ee46b0e2f0d  gke-kristina-gcp-converged-7f1fcae5-2f17  ACTIVE  nvme_card      0000:00:04.0

and this is reflected in our AddedDrives:

...
  addedDrives:
  - added_time: "2025-11-12T16:15:13.158865Z"
    device_path: "0000:00:04.0"
    serial_number: nvme_card
    status: ACTIVE
    uuid: 47081f54-85e2-4c8d-8cf2-0ee46b0e2f0d
  allocations:
    agentPort: 16304
    drives:
    - nvme.1ae0-6e766d655f63617264-6e766d655f63617264-00000001
    netDevices:
    - udp
    wekaPort: 16100
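
For reference, the allocated drive ID above looks like a kernel NVMe wwid whose middle fields are hex-encoded ASCII. The sketch below decodes it, assuming a `nvme.<vendor>-<serial hex>-<model hex>-<nsid>` layout. That layout is inferred from this single value and is not confirmed anywhere in this thread; `parseNVMeWWID` and `nvmeWWID` are hypothetical names.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// nvmeWWID holds the fields of an "nvme.<vid>-<serial>-<model>-<nsid>" wwid.
type nvmeWWID struct {
	VendorID string // hex PCI vendor ID, kept as text
	Serial   string // decoded from hex
	Model    string // decoded from hex
	NSID     string // hex namespace ID, kept as text
}

// parseNVMeWWID splits the wwid and hex-decodes the serial and model fields.
func parseNVMeWWID(wwid string) (nvmeWWID, error) {
	if !strings.HasPrefix(wwid, "nvme.") {
		return nvmeWWID{}, fmt.Errorf("unexpected wwid prefix: %q", wwid)
	}
	parts := strings.Split(strings.TrimPrefix(wwid, "nvme."), "-")
	if len(parts) != 4 {
		return nvmeWWID{}, fmt.Errorf("expected 4 fields, got %d", len(parts))
	}
	serial, err := hex.DecodeString(parts[1])
	if err != nil {
		return nvmeWWID{}, fmt.Errorf("decode serial: %w", err)
	}
	model, err := hex.DecodeString(parts[2])
	if err != nil {
		return nvmeWWID{}, fmt.Errorf("decode model: %w", err)
	}
	return nvmeWWID{
		VendorID: parts[0],
		Serial:   strings.TrimSpace(string(serial)),
		Model:    strings.TrimSpace(string(model)),
		NSID:     parts[3],
	}, nil
}

func main() {
	w, err := parseNVMeWWID("nvme.1ae0-6e766d655f63617264-6e766d655f63617264-00000001")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	// Serial and model both decode to "nvme_card"; only the last field
	// (the namespace ID under the assumed layout) can distinguish drives.
	fmt.Printf("%+v\n", w)
}
```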

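Collaborator Author

This breaks the AddedDrivesNotAligedWithAllocations check needed for replace-drives, which is based on serials: the added drive reports serial nvme_card, while the allocation holds the wwid-based ID nvme.1ae0-6e766d655f63617264-6e766d655f63617264-00000001, so the two never match.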
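
To illustrate the mismatch, here is a simplified sketch of a serial-based alignment check. It is not the operator's actual implementation of AddedDrivesNotAligedWithAllocations; `misalignedDrives` is a hypothetical helper that reports allocated drive IDs with no matching added-drive serial.

```go
package main

import "fmt"

// misalignedDrives returns every allocated drive ID that does not appear
// among the serial numbers of the drives actually added to the cluster.
func misalignedDrives(allocated, addedSerials []string) []string {
	seen := make(map[string]struct{}, len(addedSerials))
	for _, s := range addedSerials {
		seen[s] = struct{}{}
	}
	var missing []string
	for _, id := range allocated {
		if _, ok := seen[id]; !ok {
			missing = append(missing, id)
		}
	}
	return missing
}

func main() {
	// Values taken from the status shown in this thread.
	allocated := []string{"nvme.1ae0-6e766d655f63617264-6e766d655f63617264-00000001"}
	added := []string{"nvme_card"}

	// The wwid-based allocation never matches the generic serial,
	// so the drive is always reported as misaligned.
	fmt.Println(misalignedDrives(allocated, added))
}
```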

Contributor

We don't officially support Google COS for backends.
@rugggger spent some time working around serial IDs, but I don't remember whether that reached a full solution.

Contributor

I did not continue working on it; it's in my backlog. There was an issue to sign all the drives with a unique ID. We can sync on it, @kristina-solovyova.

@assafgi assafgi requested a review from a team as a code owner January 28, 2026 18:09