Skip to content

feat: Support K8s DRA Resources V1 APIs#596

Open
adityasingh0510 wants to merge 6 commits intoNVIDIA:mainfrom
adityasingh0510:feature/k8s-v1-resource-api-support
Open

feat: Support K8s DRA Resources V1 APIs#596
adityasingh0510 wants to merge 6 commits intoNVIDIA:mainfrom
adityasingh0510:feature/k8s-v1-resource-api-support

Conversation

@adityasingh0510
Copy link

@adityasingh0510 adityasingh0510 commented Dec 8, 2025

This PR updates dcgm-exporter to support both the stable resource.k8s.io/v1 API and the v1beta1 API for Dynamic Resource Allocation (DRA) support. This ensures compatibility with both Kubernetes 1.34+ clusters (using v1) and older clusters (using v1beta1), with automatic detection and graceful fallback.

Problem

When enabling DRA labels in dcgm-exporter on Kubernetes 1.34+ clusters, the following error occurs:

failed to list v1beta1.ResourceSlice as we have v1.ResourceSlice

This happens because:

  • Kubernetes 1.34+ promotes the ResourceSlice API from v1beta1 to stable v1
  • Clusters may only expose the v1 API, breaking code that only uses v1beta1
  • Older clusters (1.27-1.33) still use v1beta1, so we need to support both

Changes

Files Modified

  • internal/pkg/transformation/dra.go:

    • Register both v1 and v1beta1 ResourceSlice informers
    • Implement separate event handlers for each API version:
      • onAddOrUpdateV1() / onAddOrUpdateV1beta1()
      • onDeleteV1() / onDeleteV1beta1()
    • Add cache checking in delete handlers to prevent premature device removal
    • Handle API structure differences:
      • v1beta1: dev.Basic.Attributes
      • v1: dev.Attributes (direct access, no Basic wrapper)
  • internal/pkg/transformation/types.go:

    • Add v1Informer and v1beta1Informer fields to DRAResourceSliceManager struct
  • go.mod / go.sum:

    • Upgrade k8s.io/api: v0.33.3 → v0.34.0 (adds support for resource/v1)
    • Upgrade k8s.io/client-go: v0.33.3 → v0.34.0 (ensures compatibility)
    • Upgrade k8s.io/apimachinery: v0.33.3 → v0.34.0

API Structure Changes

The v1 API has a different structure than v1beta1:

API Version Device Attribute Access
v1beta1 dev.Basic.Attributes
v1 dev.Attributes (direct)

The implementation handles both structures correctly.

Behavior

Automatic API Detection

The code registers both informers and uses whichever is available:

// Both informers are registered
v1Informer := factory.Resource().V1().ResourceSlices().Informer()
v1beta1Informer := factory.Resource().V1beta1().ResourceSlices().Informer()

// At least one must sync successfully
v1Synced := cache.WaitForCacheSync(ctx.Done(), v1Informer.HasSynced)
v1beta1Synced := cache.WaitForCacheSync(ctx.Done(), v1beta1Informer.HasSynced)

Precedence Logic

When both APIs are available:

  • v1 takes precedence: v1beta1 only adds devices if v1 doesn't already have them
  • Delete protection: Before deleting, handlers check if the device exists in the other API's cache
  • No duplicate entries: Precedence logic ensures each device is only tracked once

Testing

Verification

Code compiles successfully with both API versions
All tests pass - existing unit tests continue to work
No linter errors
v1 API support - verified with Kubernetes 1.34+ API structure
v1beta1 API support - verified with Kubernetes 1.27-1.33 API structure
Dual API handling - both informers work correctly when both are available
Precedence logic - v1 correctly takes precedence over v1beta1
Delete handling - race conditions prevented with cache checking

Test Scenarios

  • Kubernetes 1.34+ clusters (v1 API only)
  • Kubernetes 1.27-1.33 clusters (v1beta1 API only)
  • Clusters with both APIs available (migration periods)
  • MIG devices work with both API versions

Backward Compatibility

Fully backward compatible:

  • Existing deployments on Kubernetes 1.27-1.33 continue to work unchanged
  • No breaking changes for any supported Kubernetes version
  • No configuration changes required

Forward compatible:

  • Ready for Kubernetes 1.34+ clusters
  • Automatically uses the best available API version

Breaking Changes

None - This is a backward and forward compatibility enhancement. The change:

  • Works on older clusters (1.27-1.33) using v1beta1
  • Works on newer clusters (1.34+) using v1
  • Works during migration periods when both are available
  • Requires no configuration changes

Related Issues

@adityasingh0510 adityasingh0510 force-pushed the feature/k8s-v1-resource-api-support branch from c179153 to 2d09218 Compare December 8, 2025 09:40
@adityasingh0510 adityasingh0510 changed the title feat: Add support for Kubernetes v1.34 resource.k8s.io/v1 APIs feat: Add dual API support for ResourceSlice (v1 and v1beta1) Dec 8, 2025
@adityasingh0510 adityasingh0510 changed the title feat: Add dual API support for ResourceSlice (v1 and v1beta1) feat: Support K8s DRA Resources V1 APIs Dec 8, 2025
// Wait for at least one informer to sync (either v1 or v1beta1)
// Both will sync if both APIs are available
v1Synced := cache.WaitForCacheSync(ctx.Done(), v1Informer.HasSynced)
v1beta1Synced := cache.WaitForCacheSync(ctx.Done(), v1beta1Informer.HasSynced)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can hang forever on an old cluster serving v1beta1 api, as this condition will not become true v1Synced := cache.WaitForCacheSync(ctx.Done(), v1Informer.HasSynced).

we need to discover first which api is available.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will also simplify the onAddOrUpdate and onDelete logic. its inconsistent rn.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the catch! I’ve updated the code to use the discovery client to check which of resource.k8s.io/v1 and v1beta1 are available


func (m *DRAResourceSliceManager) onAddOrUpdate(obj interface{}) {
slice := obj.(*resourcev1beta1.ResourceSlice)
func getAttrStringV1beta1(attrs map[resourcev1beta1.QualifiedName]resourcev1beta1.DeviceAttribute, key resourcev1beta1.QualifiedName) string {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of duplicating the helpers, better to define structs for v1 and v1beta

@guptaNswati
Copy link
Contributor

@adityasingh0510 thank you for the PR and your patience. I have some comments that needs to be solved before the merge.

@adityasingh0510
Copy link
Author

@guptaNswati thank you for the review and comments. I have addressed the feedback and pushed the updates for your review.

@guptaNswati
Copy link
Contributor

guptaNswati commented Feb 24, 2026

Overall it looks good. Need to add tests also https://github.com/NVIDIA/dcgm-exporter/blob/main/internal/pkg/transformation/kubernetes_test.go

Need to double check on the edge cases. i will come back to it. meanwhile address the comments, add tests and paste the test and logs

// Wait for at least one informer to sync (either v1 or v1beta1)
// Both will sync if both APIs are available
v1Synced := cache.WaitForCacheSync(ctx.Done(), v1Informer.HasSynced)
v1beta1Synced := cache.WaitForCacheSync(ctx.Done(), v1beta1Informer.HasSynced)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will also simplify the onAddOrUpdate and onDelete logic. its inconsistent rn.


// Register informers for both v1 and v1beta1 to support both API versions
v1Informer := factory.Resource().V1().ResourceSlices().Informer()
v1beta1Informer := factory.Resource().V1beta1().ResourceSlices().Informer()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to use the discovery logic here also to decide which informer to start.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Lets say we had a v1beta1 ResourceSlice and we upgraded to k8s v1.34, even if v1beta1 apiVersion is enabled, we should simply treat it as v1 ResourceSlice and use it that way. The storageVersion of the object would've already converted to v1 in v1.34+. But the object will be available for consumption in both v1 and v1beta1 apiVersions, if enabled. So both of these informers would watch the same object.

We should only use the latest apiVersion enabled. With this, the rest of the code should be simplified

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that clarification helps. I’ll update the implementation to use discovery to select only the latest available API version (prefer v1, otherwise v1beta1), start a single informer for that version, and simplify the DRA handlers accordingly.

func (m *DRAResourceSliceManager) onDelete(obj interface{}) {
// onAddOrUpdateV1 handles v1 API ResourceSlice events
func (m *DRAResourceSliceManager) onAddOrUpdateV1(obj interface{}) {
slice := obj.(*resourcev1.ResourceSlice)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to check type assertion.

s, ok := obj.(*resourcev1beta1.ResourceSlice)
if !ok {
		return err
}

i think its not done originally. need to fix it

slice := obj.(*resourcev1beta1.ResourceSlice)
pool := slice.Spec.Pool.Name
// onAddOrUpdate handles ResourceSlice add/update events for both v1 and v1beta1 APIs
func (m *DRAResourceSliceManager) onAddOrUpdate(adapter resourceSliceAdapter, apiVersion string, v1TakesPrecedence bool) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when this was originally written, the assumption was that ResourceSlices are static and once a device exists, it wont go away. But we recently added support for some features in dra driver where ResourceSlice can be updated and republished. I am debating if that should be handled here or create a new issue for that

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can end up with stale keys here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@varunrsekar what is your opinion here. it should not cause any issues when vfio mode is enabled as dcgm wont work anyway. but later it will be imp for vfio also. or we may move away from updating resourceslice. but this wont hurt to have a sync here in both add and delete

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. On ADD: add all devices from the slice to the cache
  2. On UPDATE: cleanup all cached devices from that slice and re-add new list to the cache.
  3. On DELETE: cleanup all slice devices

Otherwise, we'll end up leaking memory if the slice churns for whatever reason.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can simplify it even more by redoing the map on any event or at a fix interval. can be costly in terms of churn.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went through the code again and realize local caching is redundant. It was mainly done to do fast look up. And its not handling update on delete. We should use just use informer cache.

Had a discussion with @varunrsekar on it on a call.

Copy link
Author

@adityasingh0510 adityasingh0510 Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to remove all local cache maps (deviceToUUID, migDevices, sliceDevices) and use only the informer cache, query it directly in GetDeviceInfo, right ?


// Register informers for both v1 and v1beta1 to support both API versions
v1Informer := factory.Resource().V1().ResourceSlices().Informer()
v1beta1Informer := factory.Resource().V1beta1().ResourceSlices().Informer()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Lets say we had a v1beta1 ResourceSlice and we upgraded to k8s v1.34, even if v1beta1 apiVersion is enabled, we should simply treat it as v1 ResourceSlice and use it that way. The storageVersion of the object would've already converted to v1 in v1.34+. But the object will be available for consumption in both v1 and v1beta1 apiVersions, if enabled. So both of these informers would watch the same object.

We should only use the latest apiVersion enabled. With this, the rest of the code should be simplified

Comment on lines +76 to +77
v1Informer cache.SharedIndexInformer
v1beta1Informer cache.SharedIndexInformer

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have only a single informer here. Depending on the latest API version enabled in the cluster, the corresponding informer should be configured here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes this goes back to discovery first and choosing the highest version based on that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes this goes back to discovery first and choosing the highest version based on that.


deviceType := getAttrString(attr, "type")
deviceType := dev.GetAttribute("type")
switch deviceType {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a default case to log the type that's not handled? It'll provide hints for users if they need to eventually implement it here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1.
originally its

default:
	slog.Warn(fmt.Sprintf("Device [key:%s] has unknown type: %s", key, deviceType))

key := pool + "/" + dev.GetName()

deviceType := getAttrString(attr, "type")
deviceType := dev.GetAttribute("type")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are implicitly using the NVIDIA GPU DRA Driver as the reference for this code. If there are GPU DRA vendors that don't implement it this way, then DCGM-exporter will not work with it. Would be good to call it out.

Copy link
Contributor

@guptaNswati guptaNswati Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rn dcgm-exporter is only supposed to work with nvidia-dra-driver. This may need to be revisited in the future. We can add a comment here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ve added a comment next to DRAGPUDriverName in const.go clarifying that the current DRA handling assumes the NVIDIA GPU DRA driver schema, and that other GPU DRA drivers may not be compatible with this implementation.

slice := obj.(*resourcev1beta1.ResourceSlice)
pool := slice.Spec.Pool.Name
// onAddOrUpdate handles ResourceSlice add/update events for both v1 and v1beta1 APIs
func (m *DRAResourceSliceManager) onAddOrUpdate(adapter resourceSliceAdapter, apiVersion string, v1TakesPrecedence bool) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. On ADD: add all devices from the slice to the cache
  2. On UPDATE: cleanup all cached devices from that slice and re-add new list to the cache.
  3. On DELETE: cleanup all slice devices

Otherwise, we'll end up leaking memory if the slice churns for whatever reason.

Comment on lines +326 to +334
if v1TakesPrecedence {
if _, exists := m.deviceToUUID[key]; !exists {
m.deviceToUUID[key] = uuid
slog.Debug(fmt.Sprintf("Added gpu device [key:%s] with UUID: %s (%s)", key, uuid, apiVersion))
}
} else {
m.deviceToUUID[key] = uuid
slog.Debug(fmt.Sprintf("Added gpu device [key:%s] with UUID: %s (%s)", key, uuid, apiVersion))
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I read this piece of code correctly and how onAddOrUpdate is invoked:

  1. For v1 API, we simply override the deviceToUUID map.
  2. For v1beta1 API, we don't override and only add to deviceToUUID map if it doesnt exist.

Can you help me understand why this is needed?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ve simplified this per your feedback, we now pick a single preferred API version and on each add/update we clear the slice’s devices from the cache and re-add from the current spec, so v1 and v1beta1 are handled uniformly.

@adityasingh0510
Copy link
Author

@guptaNswati @varunrsekar Thanks for the review. I’ve addressed the comments, refactored DRA to use discovery + a single informer with the informer cache, and added tests in dra_test.go and kubernetes_test.go.

I ran the tests and here are the commands and outputs:

cd /home/ubuntu/dcgm-exporter
go test ./internal/pkg/transformation -count=1
ok github.com/NVIDIA/dcgm-exporter/internal/pkg/transformation 0.328s

Comment on lines +172 to +176
v1Available, err := discovery.IsResourceEnabled(discoveryClient, schema.GroupVersionResource{
Group: "resource.k8s.io",
Version: "v1",
Resource: "resourceslices",
})

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
v1Available, err := discovery.IsResourceEnabled(discoveryClient, schema.GroupVersionResource{
Group: "resource.k8s.io",
Version: "v1",
Resource: "resourceslices",
})
v1Available, err := discovery.IsResourceEnabled(discoveryClient, resourcev1.SchemaGroupVersion.WithResource("resourceslices"))

Comment on lines 181 to 185
v1beta1Available, err := discovery.IsResourceEnabled(discoveryClient, schema.GroupVersionResource{
Group: "resource.k8s.io",
Version: "v1beta1",
Resource: "resourceslices",
})

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
v1beta1Available, err := discovery.IsResourceEnabled(discoveryClient, schema.GroupVersionResource{
Group: "resource.k8s.io",
Version: "v1beta1",
Resource: "resourceslices",
})
v1beta1Available, err := discovery.IsResourceEnabled(discoveryClient, resourcev1beta1.SchemaGroupVersion.WithResource("resourceslices"))

var adapter resourceSliceAdapter
switch obj := item.(type) {
case *resourcev1.ResourceSlice:
if obj.Spec.Driver != DRAGPUDriverName {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move the comment on the DRAGPUDriverName here?

var v1beta1Informer cache.SharedIndexInformer

if useV1 {
v1Informer = factory.Resource().V1().ResourceSlices().Informer()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add an indexer to this informer by pool name so that its easier to list in GetDeviceInfo:

informer.AddIndexers(cache.Indexers{
    "<index>": func(obj interface{}) ([]string, error) {
      ...
    }

We can then list resourseslices like:

resourceSlices, err := informer.GetIndexer().ByIndex("<index>", "<poolName>")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about adding apool+deviceName indexer, this only gives the slices with the desired devices or we maintain a map of pool/deviceName -> info (uuid, type, parentUUID, profile) like i was originally doing but built and rebuilt directly from informer cache based on events. this eliminates repeated scanning.

return ""
}

func NewDRAResourceSliceManager() (*DRAResourceSliceManager, error) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adityasingh0510 What would it take to move the whole DRAResourceSliceManager into a separate package internal/pkg/transformation/dra as 2 separate implementations?:

  1. internal/pkg/transformation/dra/v1/ -> impl for v1 API
  2. internal/pkg/transformation/dra/v1beta1/ -> impl for v1beta1 API
  3. internal/pkg/transformation/dra/dra.go -> initialize appropriate versioned-manager depending on available version
    Lets also make it a generic DRAResourceManager where resourceslice is just a part of it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@varunrsekar Good idea in principle, but I think it’s more refactor than we need right now. The current DRAResourceSliceManager already hides the version and GPU/MIG details behind GetDeviceInfo / GetDynamicResourceInfo, and splitting into versioned packages would add quite a bit of plumbing and duplication for limited benefit today.
I’d prefer to keep this as-is and revisit a dedicated dra package if our DRA usage grows.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @adityasingh0510. Its a limited api which does not need a separate package and most of the k8s podresources logic of dcgm-exporter lives in pkg/transformation.

// local caches and ensures we always have the latest state from the API server.
// For MIG devices: returns (parentUUID, *DRAMigDeviceInfo)
// For full GPUs: returns (deviceUUID, nil)
func (m *DRAResourceSliceManager) GetDeviceInfo(pool, device string) (string, *DRAMigDeviceInfo) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's nested levels of conditionals in here that's a little difficult to follow:

  1. Is api v1 or v1beta1
  2. Is device "gpu" or "mig"

We should also make this function easier to consume by incorporating the versioned-manager:

func (m *DRAResourceManager) GetDynamicResourceInfo(resource *podresourcesapi.DynamicResource) *DynamicResourceInfo

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

@varunrsekar
Copy link

@adityasingh0510 Thanks for your patience! Please check if we can refactor it further per comments...

@adityasingh0510
Copy link
Author

Thanks a lot for the detailed feedback , @varunrsekar !

go test ./internal/pkg/transformation/... is passing . Please take another look and let me know if you’d like any further tweaks

}

// Select a single API version to watch.
apiVersion := "v1beta1"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mmm we also need to make sure that this same api version is used by nvidia-dra-driver as it gives users the helm option to select the version

https://github.com/NVIDIA/k8s-dra-driver-gpu/blob/main/deployments/helm/nvidia-dra-driver-gpu/values.yaml#L49

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes , Added --kubernetes-dra-resource-api-version CLI flag that allows users to configure the API version to match the Helm chart's resourceApiVersion setting.

// Select a single API version to watch.
apiVersion := "v1beta1"
useV1 := false
if v1Available {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use a switch here and move it to a helper, will be easier to read.

switch available {
case "v1":
    // create v1 informer
case "v1beta1":
    // create v1beta1 informer
default:
   // error
}


v1Available, err := discovery.IsResourceEnabled(discoveryClient, resourcev1.SchemeGroupVersion.WithResource("resourceslices"))
if err != nil {
return nil, fmt.Errorf("error checking v1 ResourceSlice API availability: %w", err)
Copy link
Contributor

@guptaNswati guptaNswati Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should not return error, rather warn. Or may be just use list/probe.

  • LIST resource.k8s.io/v1 ResourceSlices.
  • filter gpu.nvidia.com
  • if available, pick it
  • else repeat of v1beta1

create a helper and use for both.

something like:

hasNvidiaDRASlice := func(apiversion object) (bool, error) {
        list, err := clientset.apiversion.ResourceSlices().List(ctx, metav1.ListOptions{})
        if err != nil {
            return false, err
        }
        for _, s := range list.Items {
            if s.Spec.Driver == DRAGPUDriverName {
                return true, nil
            }
        }
        return false, nil
    }

this goes back to https://github.com/NVIDIA/dcgm-exporter/pull/596/changes#r2865748563

// MIG device
devices = append(devices, resourcev1.Device{
Name: "gpu-x",
Attributes: map[resourcev1.QualifiedName]resourcev1.DeviceAttribute{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls add tests for version selection:

  • if v1 served but returns 0 GPU slices, v1beta1 returns GPU slices → choose v1beta1
  • both served and both have objects → prefer v1

@adityasingh0510 adityasingh0510 force-pushed the feature/k8s-v1-resource-api-support branch from cf32e45 to bf1cabc Compare February 28, 2026 09:54
@adityasingh0510
Copy link
Author

Hi @guptaNswati , I've addressed all the review feedback
All tests passing. Ready for another review when convenient. Thanks!

below is the test logs:

root@adtiya-ai-platform:/home/ubuntu/dcgm-exporter# go test -count=1 ./internal/pkg/transformation -run "TestVersionSelection|TestGetDeviceInfo|TestPodDRAInfo"
ok github.com/NVIDIA/dcgm-exporter/internal/pkg/transformation 0.057s

KubernetesVirtualGPUs bool
DumpConfig DumpConfig // Configuration for file-based dumps
KubernetesEnableDRA bool
KubernetesDRAResourceAPIVersion string // DRA ResourceSlice API version: "v1", "v1beta1", or "" for auto-detection
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i dont think this flag is really needed in dcgm-exporter and it should rely on discovery than configuring.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add support for K8s v1.34 resource.k8s.io/v1 DRA APIs

3 participants