-
Notifications
You must be signed in to change notification settings - Fork 6
test: Added RDMA validation script for waagent, ibverbs tools, and Azure persistent naming #77
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
test: Added RDMA validation script for waagent, ibverbs tools, and Azure persistent naming #77
Conversation
Reviewer's GuideAdds an Ansible-managed RDMA validation test script and wires it into the role so deployments can automatically verify waagent RDMA configuration, RDMA userland tools, and Azure persistent RDMA naming behavior. Sequence diagram for RDMA validation script executionsequenceDiagram
actor Operator
participant AnsibleRole
participant ManagedNode
participant TestRdmaScript
participant Waagent
participant IbverbsTools
participant Systemd
participant AzurePersistentNaming
Operator->>AnsibleRole: Run_playbook
AnsibleRole->>ManagedNode: Apply_hpc_azure_role
AnsibleRole->>ManagedNode: Install_test_rdma_script
Operator->>TestRdmaScript: Execute_test_rdma_sh
TestRdmaScript->>Waagent: Read_etc_waagent_conf
Waagent-->>TestRdmaScript: Return_OS_EnableRDMA_value
TestRdmaScript->>IbverbsTools: Run_ibv_devinfo
IbverbsTools-->>TestRdmaScript: Report_device_info_or_error
TestRdmaScript->>Systemd: Detect_systemd_and_Azure_environment
alt Azure_and_systemd
TestRdmaScript->>AzurePersistentNaming: Check_scripts_unit_files_udev_rule
AzurePersistentNaming-->>TestRdmaScript: Report_presence
TestRdmaScript->>Systemd: Check_services_enabled_and_monitor_active
Systemd-->>TestRdmaScript: Report_service_status
end
TestRdmaScript-->>Operator: Exit_code_and_summary
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey - I've left some high level feedback:
- The waagent RDMA check uses
grep -Fxq "OS.EnableRDMA=y" /etc/waagent.conf, which will fail if the setting is present but formatted differently (e.g., trailing spaces, comments, or lowercase); consider a more robust match that tolerates whitespace and comments while still enforcing the value. - The Azure detection relies on
sys_vendorbeing exactly"Microsoft Corporation"; you might want to normalize (trim and case-fold) the vendor string or support known aliases to avoid false negatives on slightly different OEM strings.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The waagent RDMA check uses `grep -Fxq "OS.EnableRDMA=y" /etc/waagent.conf`, which will fail if the setting is present but formatted differently (e.g., trailing spaces, comments, or lowercase); consider a more robust match that tolerates whitespace and comments while still enforcing the value.
- The Azure detection relies on `sys_vendor` being exactly `"Microsoft Corporation"`; you might want to normalize (trim and case-fold) the vendor string or support known aliases to avoid false negatives on slightly different OEM strings.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
1f85be6 to
dea5d3c
Compare
dea5d3c to
d482df7
Compare
|
@dgchinner can you help to review this. Thanks |
|
Some things about the commits in the change - I'll look at the code separately. First, you should never be merging origin/main back into the PR branch. If you have to update the PR branch because the main repository have moved forward, you need to rebase your PR branch on top of the new main branch (using 'git rebase') so that we keep a clean, linear git repository history. Repeated main branch back merges makes a mess of the git history and the non-linear nature of merge commits causes problems with git bisect and other sorts of change history analysis that we may need to do in future. Hence we need to avoid back-merges where-ever possible. Secondly, the email address in the signed-off-by tags is not a valid email address. It needs to be your official RH email address. You can configure the username and email address that is used by git for all places where it adds your identity (author, committer, tags, etc) in the [user] section of your ~/.gitconfig file. Finally, commit messages (again). The commit message should tell me why a change is being, not what the code is doing. This specific commit message says "test X, test y, test z", but I know that from looking at the code change. It doesn't tell me why we are testing those things. The commit message also doesn't tell me why you needed to move a bnuch of code around, either. I'm left to guess as to -why- these things need to be done and so I hve no context to determine if the tests are sufficient, whether they are redundant, what constraints and/or assumptions the test code operates under, how to run teh test, what the expected outcome may be, etc. Again, you have put all this information in the PR. That's good, but this information also needs to be in the commit message. The history of our code is contained in the git repository, not in the github-based process metadata. If github goes away for whatever reason, we lose access to all the PR information. However, if that information is derived from the commits in the repository, we still have all that history because it is being kept in every copy of the git repository that has been cloned from the original on github. Hence if you are writing something in the PR to describe the change and that information is not in the commit message, then please stop writing the PR and -rewrite the commit message- to contain that information. Once you've done that, then submit the PR. Github makes this easy: if you title your commit with the one line summary you'd put in the PR title (e.g. "test: exercise persistent RDMA naming"), then when you create the PR the one-line commit title will be pulled into the PR title and the commit message body will be pulled into the PR body automatically. IOWs, you don't need to write a PR - it should already be written before you press the 'create PR' button.... |
24a8892 to
4547fff
Compare
… enablement and Azure RDMA persistent naming setup.
To automatically verify that the role correctly enables RDMA in waagent, installs RDMA userland tools, and (on Azure/systemd) configures and maintains Azure persistent RDMA naming services.
How to run:
Execute {{ __hpc_azure_tests_dir }}/test-rdma.sh after the role completes.
Expected result:
Exit 0 with “Test Passed …” lines; non-zero with “Failed: …” explaining the missing/failed prerequisite.
Moved "Create Azure HPC resource directories" task at beginning to avoid path not found issue for other tasks.
Signed-off-by: Gaurav Goklani <ggoklani@redhat.com>
4547fff to
123a8dc
Compare
@dgchinner Implemented the suggestions.. Thank you for the review |
Enhancement:Add an RDMA validation testcase that checks the expected RDMA enablement and Azure RDMA persistent naming setup.
Reason:
To automatically verify that the role correctly enables RDMA in waagent, installs RDMA userland tools, and (on Azure/systemd) configures and maintains Azure persistent RDMA naming services.
Result:
A templated test-rdma.sh script is installed by the role and validates:
/etc/waagent.conf contains OS.EnableRDMA=y
ibv_devinfo is available
On Azure + systemd: persistent RDMA naming scripts, unit files, and udev rule exist; services are enabled; monitor is active; oneshot service is not failed
Note: Moved "Create Azure HPC resource directories" task at beginning to avoid path not found issue for other tasks.
**Manual Testing:
[azureuser@gaurav-hpcrdmatest1 tests]$ ./test-rdma.sh**
Testing waagent RDMA flag
Test Passed: waagent RDMA flag is set
Testing RDMA userland tools
Test Passed: RDMA tools are present (ibv_devinfo)
Testing Azure persistent RDMA naming artifacts
Test Passed: Azure persistent RDMA naming artifacts exist
Testing Azure persistent RDMA naming services
Test Passed: Azure persistent RDMA naming services look healthy
Issue Tracker Tickets (Jira or BZ if any):
https://issues.redhat.com/browse/RHELHPC-127
Summary by Sourcery
New Features: