@ggoklani ggoklani commented Feb 11, 2026

Enhancement: Add an RDMA validation test case that checks the expected RDMA enablement and Azure RDMA persistent naming setup.

Reason:
To automatically verify that the role correctly enables RDMA in waagent, installs RDMA userland tools, and (on Azure/systemd) configures and maintains Azure persistent RDMA naming services.

Result:
A templated test-rdma.sh script is installed by the role and validates:
  • /etc/waagent.conf contains OS.EnableRDMA=y
  • ibv_devinfo is available
  • On Azure + systemd: persistent RDMA naming scripts, unit files, and udev rule exist; services are enabled; monitor is active; oneshot service is not failed

Note: Moved the "Create Azure HPC resource directories" task to the beginning so that later tasks do not fail with path-not-found errors.

Manual Testing:

[azureuser@gaurav-hpcrdmatest1 tests]$ ./test-rdma.sh

Testing waagent RDMA flag
Test Passed: waagent RDMA flag is set

Testing RDMA userland tools
Test Passed: RDMA tools are present (ibv_devinfo)

Testing Azure persistent RDMA naming artifacts
Test Passed: Azure persistent RDMA naming artifacts exist

Testing Azure persistent RDMA naming services
Test Passed: Azure persistent RDMA naming services look healthy

Issue Tracker Tickets (Jira or BZ if any):
https://issues.redhat.com/browse/RHELHPC-127

Summary by Sourcery

New Features:

  • Install a templated test-rdma.sh script that validates RDMA enablement in waagent, presence of RDMA tools, and Azure persistent RDMA naming artifacts and services on systemd-based Azure systems.

sourcery-ai bot commented Feb 11, 2026

Reviewer's Guide

Adds an Ansible-managed RDMA validation test script and wires it into the role so deployments can automatically verify waagent RDMA configuration, RDMA userland tools, and Azure persistent RDMA naming behavior.

Sequence diagram for RDMA validation script execution

sequenceDiagram
    actor Operator
    participant AnsibleRole
    participant ManagedNode
    participant TestRdmaScript
    participant Waagent
    participant IbverbsTools
    participant Systemd
    participant AzurePersistentNaming

    Operator->>AnsibleRole: Run_playbook
    AnsibleRole->>ManagedNode: Apply_hpc_azure_role
    AnsibleRole->>ManagedNode: Install_test_rdma_script

    Operator->>TestRdmaScript: Execute_test_rdma_sh
    TestRdmaScript->>Waagent: Read_etc_waagent_conf
    Waagent-->>TestRdmaScript: Return_OS_EnableRDMA_value

    TestRdmaScript->>IbverbsTools: Run_ibv_devinfo
    IbverbsTools-->>TestRdmaScript: Report_device_info_or_error

    TestRdmaScript->>Systemd: Detect_systemd_and_Azure_environment
    alt Azure_and_systemd
        TestRdmaScript->>AzurePersistentNaming: Check_scripts_unit_files_udev_rule
        AzurePersistentNaming-->>TestRdmaScript: Report_presence
        TestRdmaScript->>Systemd: Check_services_enabled_and_monitor_active
        Systemd-->>TestRdmaScript: Report_service_status
    end

    TestRdmaScript-->>Operator: Exit_code_and_summary

File-Level Changes

Change: Install a templated RDMA validation script as part of the Ansible role.
Details:
  • Adds a task to install a templated test-rdma.sh script into the role's tests directory with root ownership and execute permissions
  • Places the new installation step after Azure persistent RDMA naming monitor setup to ensure prerequisites are configured first
Files: tasks/main.yml

Change: Implement the RDMA validation script template that checks waagent, RDMA tools, and Azure persistent naming state.
Details:
  • Defines helper functions to assert the presence of files, executables, and commands, and to report failures consistently
  • Validates that /etc/waagent.conf exists and contains OS.EnableRDMA=y
  • Verifies ibv_devinfo is available in PATH as a proxy for RDMA userland tools being installed
  • Detects Azure platform via DMI sys_vendor and systemd as PID 1 and conditionally runs Azure-specific checks
  • Checks Azure persistent RDMA naming scripts, systemd unit files, and udev rules exist and have appropriate executability
  • Ensures Azure persistent RDMA naming services are enabled, the oneshot service is not in a failed state, and the monitor service is active
Files: templates/rdma/test-rdma.sh.j2
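
The helper functions described for the script template could look roughly like the sketch below. This is an illustration only, not the contents of test-rdma.sh.j2: the function names (pass, fail, require_file, require_executable, require_cmd, check_rdma_tools) are hypothetical, and only the "Test Passed: ..." and "Failed: ..." message prefixes and the ibv_devinfo check are taken from this PR.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not taken from test-rdma.sh.j2.

# Report a passing check on stdout.
pass() { echo "Test Passed: $*"; }

# Report a failing check on stderr and return non-zero so callers can stop.
fail() { echo "Failed: $*" >&2; return 1; }

# Assert that a regular file exists.
require_file() {
    [ -f "$1" ] || fail "missing file: $1"
}

# Assert that a path exists and is executable.
require_executable() {
    [ -x "$1" ] || fail "not executable: $1"
}

# Assert that a command can be found in PATH.
require_cmd() {
    command -v "$1" >/dev/null 2>&1 || fail "command not found: $1"
}

# Example check built from the helpers: RDMA userland tools.
check_rdma_tools() {
    echo "Testing RDMA userland tools"
    require_cmd ibv_devinfo || return 1
    pass "RDMA tools are present (ibv_devinfo)"
}
```

A caller would typically run each check as `check_rdma_tools || exit 1`, so the first failed prerequisite produces a non-zero exit code.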

@sourcery-ai sourcery-ai bot left a comment

Hey - I've left some high level feedback:

  • The waagent RDMA check uses grep -Fxq "OS.EnableRDMA=y" /etc/waagent.conf, which will fail if the setting is present but formatted differently (e.g., trailing spaces, comments, or lowercase); consider a more robust match that tolerates whitespace and comments while still enforcing the value.
  • The Azure detection relies on sys_vendor being exactly "Microsoft Corporation"; you might want to normalize (trim and case-fold) the vendor string or support known aliases to avoid false negatives on slightly different OEM strings.
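
One possible way to address both points is sketched below. The function names (waagent_rdma_enabled, is_azure_vendor) are hypothetical and not part of the PR. The regular expression is an assumption about which formatting variations should count as "enabled": surrounding whitespace, an upper-case value, and a trailing comment. It has not been checked against how waagent itself parses the file.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, not taken from the PR.

# Succeeds if the given file has an uncommented OS.EnableRDMA=y line.
# Tolerates whitespace around the key, "=" and value, an upper-case "Y",
# and a trailing "# comment". Commented-out lines do not match.
waagent_rdma_enabled() {
    grep -Eq '^[[:space:]]*OS\.EnableRDMA[[:space:]]*=[[:space:]]*[yY][[:space:]]*(#.*)?$' "$1"
}

# Succeeds if the vendor string equals "Microsoft Corporation" after
# case-folding and trimming leading and trailing whitespace.
is_azure_vendor() {
    vendor=$(printf '%s' "$1" \
        | tr '[:upper:]' '[:lower:]' \
        | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
    [ "$vendor" = "microsoft corporation" ]
}
```

On a Linux system the vendor string would normally come from sysfs, for example `is_azure_vendor "$(cat /sys/class/dmi/id/sys_vendor 2>/dev/null)"`. Known aliases, if any are needed, could be added as further comparisons in the same function.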

@ggoklani ggoklani force-pushed the test_rdma_naming_changes branch from 1f85be6 to dea5d3c Compare February 11, 2026 09:42
@ggoklani ggoklani changed the title from "test : Added RDMA validation script for waagent, ibverbs tools, and Azure persistent naming" to "test: Added RDMA validation script for waagent, ibverbs tools, and Azure persistent naming" Feb 11, 2026
@ggoklani ggoklani force-pushed the test_rdma_naming_changes branch from dea5d3c to d482df7 Compare February 11, 2026 12:41
@ggoklani
Collaborator Author

@dgchinner, could you please review this? Thanks.

@dgchinner
Contributor

Some things about the commits in the change - I'll look at the code separately.

First, you should never be merging origin/main back into the PR branch. If you have to update the PR branch because the main repository has moved forward, you need to rebase your PR branch on top of the new main branch (using 'git rebase') so that we keep a clean, linear git repository history. Repeated main branch back-merges make a mess of the git history, and the non-linear nature of merge commits causes problems with git bisect and other sorts of change history analysis that we may need to do in future. Hence we need to avoid back-merges wherever possible.
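
As a rough illustration of the rebase approach, the sketch below builds a throwaway repository, lets main move forward after a topic branch is created, rebases the topic branch onto main, and prints the number of merge commits that end up on the topic branch. Everything in it is hypothetical demo scaffolding (the demo_rebase function, the branch name topic, the file names, and the Demo User identity). It assumes git is installed.

```shell
#!/bin/sh
# Sketch only: a self-contained demo in a temporary repository.
# Branch, file, and identity names are made up for illustration.

demo_rebase() {
    repo=$(mktemp -d) || return 1
    (
        set -e
        cd "$repo"
        # Throwaway identity so commits work without any global git config.
        export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
        export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

        git init -q .
        git symbolic-ref HEAD refs/heads/main   # name the initial branch "main"
        echo base > base.txt
        git add base.txt
        git commit -q -m "base"

        # Work on a topic branch (stands in for the PR branch).
        git checkout -q -b topic
        echo check > test.txt
        git add test.txt
        git commit -q -m "test: add check"

        # Meanwhile, main moves forward.
        git checkout -q main
        echo more > main.txt
        git add main.txt
        git commit -q -m "main moves forward"

        # Update the topic branch by rebasing instead of merging main into it.
        git checkout -q topic
        git rebase -q main >/dev/null

        # Count merge commits on topic that are not on main.
        git rev-list --merges --count main..topic
    )
    rc=$?
    rm -rf "$repo"
    return $rc
}

demo_rebase
```

On a real PR branch the equivalent steps would be `git fetch origin` followed by `git rebase origin/main`, and then a force-push of the rebased branch (for example with `git push --force-with-lease`).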

Secondly, the email address in the signed-off-by tags is not a valid email address. It needs to be your official RH email address. You can configure the username and email address that is used by git for all places where it adds your identity (author, committer, tags, etc) in the [user] section of your ~/.gitconfig file.

Finally, commit messages (again). The commit message should tell me why a change is being made, not what the code is doing. This specific commit message says "test X, test y, test z", but I know that from looking at the code change. It doesn't tell me why we are testing those things. The commit message also doesn't tell me why you needed to move a bunch of code around, either. I'm left to guess as to -why- these things need to be done and so I have no context to determine if the tests are sufficient, whether they are redundant, what constraints and/or assumptions the test code operates under, how to run the test, what the expected outcome may be, etc.

Again, you have put all this information in the PR. That's good, but this information also needs to be in the commit message. The history of our code is contained in the git repository, not in the GitHub-based process metadata. If GitHub goes away for whatever reason, we lose access to all the PR information. However, if that information is derived from the commits in the repository, we still have all that history because it is being kept in every copy of the git repository that has been cloned from the original on GitHub.

Hence if you are writing something in the PR to describe the change and that information is not in the commit message, then please stop writing the PR and -rewrite the commit message- to contain that information. Once you've done that, then submit the PR.

GitHub makes this easy: if you title your commit with the one-line summary you'd put in the PR title (e.g. "test: exercise persistent RDMA naming"), then when you create the PR the one-line commit title will be pulled into the PR title and the commit message body will be pulled into the PR body automatically. IOWs, you don't need to write a PR - it should already be written before you press the 'create PR' button....

@ggoklani ggoklani force-pushed the test_rdma_naming_changes branch from 24a8892 to 4547fff Compare February 12, 2026 05:54
… enablement and Azure RDMA persistent naming setup.

To automatically verify that the role correctly enables RDMA in waagent, installs RDMA userland tools, and (on Azure/systemd) configures and maintains Azure persistent RDMA naming services.
How to run:
Execute {{ __hpc_azure_tests_dir }}/test-rdma.sh after the role completes.
Expected result:
Exit 0 with “Test Passed …” lines; non-zero with “Failed: …” explaining the missing/failed prerequisite.

Moved the "Create Azure HPC resource directories" task to the beginning so that later tasks do not fail with path-not-found errors.

Signed-off-by: Gaurav Goklani <ggoklani@redhat.com>
@ggoklani ggoklani force-pushed the test_rdma_naming_changes branch from 4547fff to 123a8dc Compare February 12, 2026 05:58
@ggoklani
Collaborator Author

@dgchinner Implemented the suggestions. Thank you for the review.
