Skip to content

Conversation

@nbojanic
Copy link

🔍 Description

This PR adds a diagnose subcommand to Trident (#285).

The command creates a tarball containing:

  • host status
  • whether trident is in a container
  • whether trident is in a VM
  • platform info (os release, kernel version, cpus, memory)
  • disk information (lsblk)
  • log files
  • metrics
  • datastores

🤔 Rationale

This PR aims to simplify troubleshooting by providing one command that aggregates relevant debugging info into a single tarball.

📝 Checks

📌 Follow-ups

TODO (before promoting from draft):

  • check on warnings in osutils crate
  • add an end to end test for trident diagnose

🗒️ Notes

@nbojanic
Copy link
Author

@microsoft-github-policy-service agree company="Microsoft"

@frhuelsz
Copy link
Contributor

/AzurePipelines run [GITHUB]-trident-pr

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@nbojanic
Copy link
Author

/AzurePipelines run [GITHUB]-trident-pr

@azure-pipelines
Copy link

Commenter does not have sufficient privileges for PR 336 in repo microsoft/trident

@bfjelds
Copy link
Member

bfjelds commented Nov 17, 2025

/AzurePipelines run [GITHUB]-trident-pr-e2e

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@bfjelds
Copy link
Member

bfjelds commented Nov 17, 2025

/AzurePipelines run [GITHUB]-trident-pr

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@bfjelds
Copy link
Member

bfjelds commented Nov 18, 2025

@bfjelds
Copy link
Member

bfjelds commented Nov 18, 2025

i think it'd be great to invoke trident diagnose in the e2e ab-update tests here:

logrus.Infof("Trident service is in expected state")
.

something like:

out, err := utils.InvokeTrident(h.args.Env, client, h.args.EnvVars, "diagnose")
if err != nil {
    return fmt.Errorf("failed to invoke Trident diagnose: %w", err)
}

logrus.Infof("Trident service is in expected state")
return nil, nil

then we'd get at least a little testing for it. it'd be even better to scp the diagnostics to the artifacts folder or even validate the contents of the diagnostics, but i'd be happy with something simple first :)

@bfjelds
Copy link
Member

bfjelds commented Nov 18, 2025

/AzurePipelines run [GITHUB]-trident-pr

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@Britel
Copy link
Collaborator

Britel commented Nov 18, 2025

/AzurePipelines run [GITHUB]-trident-pr

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@bfjelds
Copy link
Member

bfjelds commented Nov 19, 2025

would be nice to have a docs/Explanation/Diagnostics.md file, maybe it references https://microsoft.github.io/trident/docs/How-To-Guides/View-Trident's-Background-Log to explain some of the collected diagnostics?

maybe suggest using trident diagnostics for troubleshooting in these docs:

also add note about diagnostics here: https://microsoft.github.io/trident/docs/Trident/How-Do-I-Interact-With-Trident

@nbojanic nbojanic changed the title feature: Add diagnostics command feature: Add diagnose command Nov 19, 2025
@nbojanic nbojanic force-pushed the user/nbojanic/diagnostics branch from c52d608 to 163a337 Compare November 20, 2025 03:05
@nbojanic
Copy link
Author

@bfjelds added an e2e test and docs. Promoting from draft PR.

@nbojanic nbojanic marked this pull request as ready for review November 20, 2025 03:08
@nbojanic nbojanic requested a review from a team as a code owner November 20, 2025 03:08
Copilot AI review requested due to automatic review settings November 20, 2025 03:08
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds a diagnose command to Trident that generates comprehensive diagnostic bundles for troubleshooting. The command collects system information, logs, metrics, and datastores into a compressed tarball (.tar.zst) that can be shared for support purposes.

Key Changes

  • New trident diagnose CLI command that creates compressed diagnostic bundles containing system info, logs, metrics, and datastore files
  • End-to-end test in Storm framework that validates the diagnostic bundle contents
  • Comprehensive documentation including a new How-To guide and updates to tutorials

Reviewed Changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
crates/trident/src/diagnostics.rs Core diagnostics implementation including report collection, bundle creation, and support for historical logs
crates/trident/src/cli.rs Added Diagnose command to CLI with output path parameter
crates/trident/src/main.rs Integrated diagnose command into main execution flow
crates/trident/src/lib.rs Added public diagnose method and made diagnostics module and TEMPORARY_DATASTORE_PATH public
crates/trident/src/logging/tracestream.rs Made PLATFORM_INFO publicly accessible for diagnostics collection
crates/trident_api/src/error.rs Added DiagnosticBundleGeneration error variant
tools/storm/utils/ssh/client/client.go Added CopyRemoteFileToLocal function for downloading diagnostic bundles in tests
tools/storm/helpers/ab_update.go Added checkDiagnostics test case that validates bundle contents and structure
docs/How-To-Guides/Diagnostics.md Comprehensive guide explaining bundle generation, structure, and use cases
docs/Tutorials/Trident-Hello-World.md Added troubleshooting section with diagnose command example
docs/Tutorials/Performing-an-AB-Update.md Added troubleshooting section with diagnose command example
docs/Reference/Trident-CLI.md Added complete CLI reference for diagnose command
docs/Trident/How-Do-I-Interact-With-Trident.md Added diagnose command to the list of available CLI commands

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 43 to 56
```
trident-diagnostics/
├── report.json # Diagnostic report with metadata
└── logs/
├── trident-full.log # Current Trident execution log
├── trident-metrics.jsonl # Current Trident metrics
├── historical/ # Logs from past servicing
│ ├── trident-<servicing_state>-<timestamp>.log
│ ├── trident-metrics-<servicing_state>-<timestamp>.log
│ └── ...
├── datastore.sqlite # Default datastore
├── datastore-tmp.sqlite # Temporary datastore (if applicable)
└── datastore-configured.sqlite # Configured datastore (if applicable)
```
Copy link

Copilot AI Nov 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bundle structure documentation shows datastore files as being inside the logs/ directory, but the implementation (in diagnostics.rs lines 276-292) places them at the root level of the bundle. The structure should be:

trident-diagnostics/
├── report.json
├── datastore.sqlite
├── datastore-tmp.sqlite
├── datastore-configured.sqlite
└── logs/
    ├── trident-full.log
    ├── trident-metrics.jsonl
    └── historical/
        └── ...

Copilot uses AI. Check for mistakes.
) -> Result<PathBuf, Error> {
let mut collected_files = Vec::new();
let file = osutils::files::create_file(output_path)?;
let encoder = zstd::Encoder::new(file, 0)?;
Copy link

Copilot AI Nov 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Using compression level 0 for zstd means using the default compression level (which is typically level 3). Consider adding a comment to clarify this is intentional, or explicitly setting a specific level (e.g., 3 for default) for better code clarity. Alternatively, consider using a higher level like 10 for better compression at the cost of slightly slower compression time, since diagnostics bundles are typically created infrequently and smaller file sizes are beneficial for transmission.

Suggested change
let encoder = zstd::Encoder::new(file, 0)?;
// Use compression level 10 for zstd: higher compression for diagnostics bundles, which are created infrequently.
let encoder = zstd::Encoder::new(file, 10)?;

Copilot uses AI. Check for mistakes.
@alejandro-microsoft
Copy link
Contributor

/AzurePipelines run [GITHUB]-trident-pr-e2e

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@alejandro-microsoft
Copy link
Contributor

/AzurePipelines run [GITHUB]-trident-pr

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copilot AI review requested due to automatic review settings November 21, 2025 00:59
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

if let Ok(entries) = fs::read_dir(log_dir) {
for entry in entries.flatten() {
if let Ok(file_name) = entry.file_name().into_string() {
if file_name.starts_with("trident-") && file_name.ends_with(".log") {
Copy link

Copilot AI Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The file extension filter only captures .log files, but historical metrics files use the .jsonl extension. This means historical metrics files (e.g., trident-metrics-<timestamp>.jsonl) will not be collected. Update the condition to also include .jsonl files: file_name.ends_with(\".log\") || file_name.ends_with(\".jsonl\")

Suggested change
if file_name.starts_with("trident-") && file_name.ends_with(".log") {
if file_name.starts_with("trident-") && (file_name.ends_with(".log") || file_name.ends_with(".jsonl")) {

Copilot uses AI. Check for mistakes.
@frhuelsz frhuelsz marked this pull request as draft November 26, 2025 23:20
@nbojanic nbojanic marked this pull request as ready for review December 16, 2025 19:24
Copilot AI review requested due to automatic review settings December 16, 2025 19:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants