Skip to content

Conversation

@dgchinner
Copy link
Contributor

@dgchinner dgchinner commented Feb 11, 2026

Enhancement:

The Azure HPC diagnostics script captures information about the
system hardware and software for dianostic purposes. It is intended
to supplement the RHEL sosreport diagnostics to cover the Azure
specific hardware and software that the sosreport does not capture.
This script will be used for information gathering in support
contexts, it is not intended to be run on active HPC nodes.

The script will be installed to /opt/hpc/azure/tools, and should be run
from there.

The log files generated from this script must be treated with the same
care as sosreport log files as they may contain customer information
and/or PII which falls under various data protection laws.

Result:


$ sudo /opt/hpc/azure/tools/gather_azhpc_vm_diagnostics.sh
######################################################################
# Azure HPC Diagnostics Tool                                                                                                                        #
######################################################################
# NOTICES:                                                                                                                                                   #
######################################################################
# This tool generates and bundles together various logs and diagnostic 
# information. It,  however, DOES NOT TRANSMIT any of said data. It is
# left to the user  to choose to transmit this data to Red Hat.
[......]

End of Diagnostic Collection

Checking for common issues

#######################################################################
# Placing diagnostic files in the following location:                                                                                                        #
# /var/hpc/azure/diagnostics/b320033e-5253-418a-a0f7-f4bba87e549e.2026-02-11.UTC08.03.30.tar.gz                                                              #
########################################################################

When run, a tarball containing all the log files will be deposited in /var/hpc/azure/diagnostics/.
The last lines output by the script will indicate the name and location of the tarball containing
the diagnostic information.

Issue Tracker Tickets (Jira or BZ if any): https://issues.redhat.com/browse/RHELHPC-110

Summary by Sourcery

Install and integrate the Azure HPC Diagnostics tool into the Azure HPC role for automated deployment and configuration.

New Features:

  • Add optional installation of the Azure HPC Diagnostics script into /opt/hpc/azure/tools controlled by a new hpc_install_diagnostics variable.

Enhancements:

  • Configure required diagnostics dependencies and source package metadata for RHEL 9, including download, patching, and installation workflow for the upstream diagnostics script.
  • Adjust the diagnostics script behavior and messaging via a templated patch to align with Red Hat support workflows and filesystem layout, including outputting tarballs under /var/hpc/azure/diagnostics and updating notices.
  • Document how to enable and run the Azure HPC Diagnostics tool in the role README.

Documentation:

  • Document the Azure HPC Diagnostics tool, its purpose, usage, and configuration flag in the role README.

@sourcery-ai
Copy link

sourcery-ai bot commented Feb 11, 2026

Reviewer's Guide

Adds automated installation of the Azure HPC Diagnostics tool to the role, including package dependencies, archive download, templated patching of the upstream script, relocation of diagnostics output, and documentation of the new hpc_install_diagnostics toggle.

Sequence diagram for Azure HPC Diagnostics installation via Ansible role

sequenceDiagram
    participant ControlNode
    participant TargetVM
    participant GitHubAzhpcDiagnostics

    ControlNode->>TargetVM: Run role tasks/main.yml
    rect rgb(235,235,235)
        Note over ControlNode,TargetVM: Install Azure HPC Diagnostics tool block (when hpc_install_diagnostics)
        ControlNode->>TargetVM: stat path=__hpc_azure_tools_dir/gather_azhpc_vm_diagnostics.sh
        TargetVM-->>ControlNode: __hpc_azure_diags_installed

        alt diagnostics_not_installed
            ControlNode->>TargetVM: package name=__hpc_azure_diagnostics_packages state=present
            loop until_success
                TargetVM-->>ControlNode: __hpc_azure_diagnostics_packages_install
            end

            ControlNode->>TargetVM: include_tasks download_extract_package.yml
            ControlNode->>GitHubAzhpcDiagnostics: HTTP GET hpcdiag_archive (url in __hpc_azhpc_diags_info)
            GitHubAzhpcDiagnostics-->>ControlNode: tarball
            ControlNode->>TargetVM: Upload and extract archive to __hpc_pkg_extracted.path

            ControlNode->>ControlNode: tempfile hpc_diags_XXXX.patch
            ControlNode->>ControlNode: template azhpc_vm_diagnostics.sh.patch.j2 -> __hpc_diags_patch_file.path

            ControlNode->>TargetVM: patch src=__hpc_diags_patch_file.path dest=__hpc_pkg_extracted.path/Linux/src/gather_azhpc_vm_diagnostics.sh

            ControlNode->>TargetVM: copy patched script -> __hpc_azure_tools_dir/gather_azhpc_vm_diagnostics.sh mode=0755

            ControlNode->>ControlNode: file state=absent path=__hpc_diags_patch_file.path
            ControlNode->>TargetVM: file state=absent path=__hpc_pkg_extracted.path
        else diagnostics_already_installed
            Note over ControlNode,TargetVM: Skips download, patch, and install steps
        end
    end

    ControlNode-->>TargetVM: Continue with remaining role tasks
Loading

File-Level Changes

Change Details Files
Introduce conditional task block to install and customize the Azure HPC Diagnostics script.
  • Guard diagnostics installation behind the hpc_install_diagnostics variable.
  • Skip installation when the script is already present at the target path.
  • Install required diagnostics package dependencies with retry support.
  • Download and extract the azhpc-diagnostics release archive using the existing download_extract_package.yml helper.
  • Create a temporary patch file on the control node and render it from a Jinja2 template before applying it to the extracted diagnostics script.
  • Patch and then copy the customized diagnostics script into the Azure tools directory with root ownership and executable permissions.
  • Clean up temporary patch files and the extracted archive directory after installation.
tasks/main.yml
Document the diagnostics installation toggle and usage.
  • Describe the purpose and behavior of the Azure HPC Diagnostics tool in the role README.
  • Document how to run the diagnostics script and where to find the resulting tarball.
  • Reference the upstream Azure azhpc-diagnostics repository for further information.
  • Introduce the hpc_install_diagnostics variable, defaulting it to true.
README.md
defaults/main.yml
Define release metadata and dependencies for the Azure HPC Diagnostics tool.
  • Add __hpc_azhpc_diags_info with name, version, checksum, and download URL for the azhpc-diagnostics release.
  • Introduce __hpc_azure_diagnostics_packages (currently hyperv-tools) as required dependencies for diagnostics.
vars/RedHat_9.yml
vars/main.yml
Customize upstream diagnostics script behavior and runtime paths via a templated patch.
  • Harden the script shebang and embed ansible-managed/system-role comments.
  • Remove online self-update and git commit detection, simplifying VERSION_INFO and replacing offline/online flags.
  • Set diagnostics output directory to the role’s Azure runtime diagnostics path instead of the script directory.
  • Adjust user-facing notices to refer to Red Hat rather than Microsoft and fix wording/spacing issues.
  • Remove Azure portal upload guidance from the script output.
  • Force terminal width handling and small UX tweaks while keeping core logic intact.
templates/azhpc_vm_diagnostics.sh.patch.j2

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@dgchinner dgchinner force-pushed the azure-hpc-diagnostics branch from f12a9f3 to bd20207 Compare February 11, 2026 08:10
Copy link

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 2 issues

Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `tasks/main.yml:1447-1451` </location>
<code_context>
+            dest: "{{ __hpc_diags_patch_file.path }}"
+            mode: '0644'
+
+        - name: Patch Diagnostics script
+          patch:
+            src: "{{ __hpc_diags_patch_file.path }}"
+            dest: "{{ __hpc_pkg_extracted.path }}/Linux/src/gather_azhpc_vm_diagnostics.sh"
+            remote_src: false
+            state: present
+            strip: 1
</code_context>

<issue_to_address>
**issue (bug_risk):** Patch task likely mixes up controller/remote paths via `remote_src: false`.

Since `__hpc_diags_patch_file.path` is created via `tempfile` on the remote host, `src` points to a remote path. With `remote_src: false`, the `patch` module will look for this file on the controller instead and fail with file-not-found. Set `remote_src: true` so `patch` reads the file from the remote host where it actually exists.
</issue_to_address>

### Comment 2
<location> `README.md:203` </location>
<code_context>
+
+The Azure HPC Diagnostics tool gathers system information for triage and
+debugging purposes. It collects information and state from the hardware, OS,
+azure envinroment and installed applications and packages it into a tarball
+to simplify the process of system support of bug triage.
+
</code_context>

<issue_to_address>
**issue (typo):** Fix typo and capitalization in "azure envinroment".

Suggest using "Azure environment" here to fix the spelling and align capitalization with earlier usage.

```suggestion
Azure environment and installed applications and packages it into a tarball
```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@dgchinner dgchinner force-pushed the azure-hpc-diagnostics branch 3 times, most recently from 02bd5e0 to 2c6078e Compare February 12, 2026 04:11
Define up the Azure HPC diagnostics tarball download location,
set up a SHA256 sum for it (as github does not supply one), then
add the config documentation and infrastructure to pull it down and
unpack it, ready to extract the diagnostics script from it.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
The script itself installs certain tools via repository deep links.
Some of these tools are provided by OS pacakages, so pull them in
via the system-role as a dependent package.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
The Azure HPC diagnostics script captures information about the
system hardware and software for dianostic purposes. It is intended
to supplement the RHEL sosreport diagnostics to cover the Azure
specific hardware and software that the sosreport does not capture.
This script will be used for information gathering in support
contexts, it is not intended to be run on active HPC nodes.

Before we install the downloaded diagnostic script, we need to
change a few things in the script:

- the output should be in {{ __hpc_azure_runtime_dir }}/diagnostics
- permanently disable the auto update code
- fix the version number instead of assuming the script it running
  from a local git repository
- change from defaulting to online mode (requires internet access)
  to offline mode. --offline option goes away, replaced by --online
  option
- Indicate that the diagnostic log files should be passed on to
  Red Hat, not Microsoft.

To make this easy, we will add a patch file to the system role that
contains the code changes we need to make to the script. This is
much simpler to apply that needing to do complex parser based
matches and replacements to make the changes we need.

The resultant patch file will then need to be treated as a
template to do path substitution for the runtime output directory.
This will place the diagnostic output in a well known place by
default, rather than where-ever the script was run from.

The script will be installed to {{ __hpc_azure_tools_dir }}. If the
script is already present in this location, then we will skip over
the installation entirely.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
@dgchinner dgchinner force-pushed the azure-hpc-diagnostics branch from 2c6078e to 686a52d Compare February 12, 2026 04:30
Copy link
Collaborator

@spetrosi spetrosi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well done with patch templating!

Comment on lines +1439 to +1448
- name: Clean up temporary patch file
file:
path: "{{ __hpc_diags_patch_file.path }}"
state: absent

- name: Remove extracted temp directory
file:
path: "{{ __hpc_pkg_extracted.path }}"
state: absent
changed_when: false
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- name: Clean up temporary patch file
file:
path: "{{ __hpc_diags_patch_file.path }}"
state: absent
- name: Remove extracted temp directory
file:
path: "{{ __hpc_pkg_extracted.path }}"
state: absent
changed_when: false
- name: Clean up temporary patch file
file:
path: "{{ item }}"
state: absent
loop:
- "{{ __hpc_diags_patch_file.path }}"
- "{{ __hpc_pkg_extracted.path }}"

Can be a loop here to avoid repetition.

version: 0.3.4
sha256: bab588b37f9a7d03fff82ff22d8a24c18a64e18eb2dad31f447a67b6fb76bd4c
url: https://github.com/Azure/Moneo/archive/refs/tags/v0.3.4.tar.gz

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

version: 20220316
sha256: bcecba0ff8999131f45508718ac6eec8615550e046c77d69c148d3947647849f
url: https://github.com/Azure/azhpc-diagnostics/archive/refs/tags/hpcdiag-20220316.tar.gz

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

- kernel-devel
- kernel-headers
__hpc_azure_diagnostics_packages:
# lsvmbus
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it commented here? Can you either comment why this might eventually be needed or remove?

dest: "{{ __hpc_diags_patch_file.path }}"
mode: '0644'

- name: Patch Diagnostics script
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only concern is what if dest script got updated and our patch stops working?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants