-
Notifications
You must be signed in to change notification settings - Fork 15
feature: Add diagnose command #336
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
@microsoft-github-policy-service agree company="Microsoft" |
|
/AzurePipelines run [GITHUB]-trident-pr |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/AzurePipelines run [GITHUB]-trident-pr |
|
Commenter does not have sufficient privileges for PR 336 in repo microsoft/trident |
|
/AzurePipelines run [GITHUB]-trident-pr-e2e |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/AzurePipelines run [GITHUB]-trident-pr |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
i think it'd be great to invoke trident/tools/storm/helpers/ab_update.go Line 395 in ce9da55
something like: out, err := utils.InvokeTrident(h.args.Env, client, h.args.EnvVars, "diagnose")
if err != nil {
return fmt.Errorf("failed to invoke Trident diagnose: %w", err)
}
logrus.Infof("Trident service is in expected state")
return nil, nilthen we'd get at least a little testing for it. it'd be even better to scp the diagnostics to the artifacts folder or even validate the contents of the diagnostics, but i'd be happy with something simple first :) |
|
/AzurePipelines run [GITHUB]-trident-pr |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/AzurePipelines run [GITHUB]-trident-pr |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
would be nice to have a docs/Explanation/Diagnostics.md file, maybe it references https://microsoft.github.io/trident/docs/How-To-Guides/View-Trident's-Background-Log to explain some of the collected diagnostics? maybe suggest using
also add note about diagnostics here: https://microsoft.github.io/trident/docs/Trident/How-Do-I-Interact-With-Trident |
c52d608 to
163a337
Compare
|
@bfjelds added an e2e test and docs. Promoting from draft PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds a diagnose command to Trident that generates comprehensive diagnostic bundles for troubleshooting. The command collects system information, logs, metrics, and datastores into a compressed tarball (.tar.zst) that can be shared for support purposes.
Key Changes
- New
trident diagnoseCLI command that creates compressed diagnostic bundles containing system info, logs, metrics, and datastore files - End-to-end test in Storm framework that validates the diagnostic bundle contents
- Comprehensive documentation including a new How-To guide and updates to tutorials
Reviewed Changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| crates/trident/src/diagnostics.rs | Core diagnostics implementation including report collection, bundle creation, and support for historical logs |
| crates/trident/src/cli.rs | Added Diagnose command to CLI with output path parameter |
| crates/trident/src/main.rs | Integrated diagnose command into main execution flow |
| crates/trident/src/lib.rs | Added public diagnose method and made diagnostics module and TEMPORARY_DATASTORE_PATH public |
| crates/trident/src/logging/tracestream.rs | Made PLATFORM_INFO publicly accessible for diagnostics collection |
| crates/trident_api/src/error.rs | Added DiagnosticBundleGeneration error variant |
| tools/storm/utils/ssh/client/client.go | Added CopyRemoteFileToLocal function for downloading diagnostic bundles in tests |
| tools/storm/helpers/ab_update.go | Added checkDiagnostics test case that validates bundle contents and structure |
| docs/How-To-Guides/Diagnostics.md | Comprehensive guide explaining bundle generation, structure, and use cases |
| docs/Tutorials/Trident-Hello-World.md | Added troubleshooting section with diagnose command example |
| docs/Tutorials/Performing-an-AB-Update.md | Added troubleshooting section with diagnose command example |
| docs/Reference/Trident-CLI.md | Added complete CLI reference for diagnose command |
| docs/Trident/How-Do-I-Interact-With-Trident.md | Added diagnose command to the list of available CLI commands |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
docs/How-To-Guides/Diagnostics.md
Outdated
| ``` | ||
| trident-diagnostics/ | ||
| ├── report.json # Diagnostic report with metadata | ||
| └── logs/ | ||
| ├── trident-full.log # Current Trident execution log | ||
| ├── trident-metrics.jsonl # Current Trident metrics | ||
| ├── historical/ # Logs from past servicing | ||
| │ ├── trident-<servicing_state>-<timestamp>.log | ||
| │ ├── trident-metrics-<servicing_state>-<timestamp>.log | ||
| │ └── ... | ||
| ├── datastore.sqlite # Default datastore | ||
| ├── datastore-tmp.sqlite # Temporary datastore (if applicable) | ||
| └── datastore-configured.sqlite # Configured datastore (if applicable) | ||
| ``` |
Copilot
AI
Nov 20, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The bundle structure documentation shows datastore files as being inside the logs/ directory, but the implementation (in diagnostics.rs lines 276-292) places them at the root level of the bundle. The structure should be:
trident-diagnostics/
├── report.json
├── datastore.sqlite
├── datastore-tmp.sqlite
├── datastore-configured.sqlite
└── logs/
├── trident-full.log
├── trident-metrics.jsonl
└── historical/
└── ...
| ) -> Result<PathBuf, Error> { | ||
| let mut collected_files = Vec::new(); | ||
| let file = osutils::files::create_file(output_path)?; | ||
| let encoder = zstd::Encoder::new(file, 0)?; |
Copilot
AI
Nov 20, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Using compression level 0 for zstd means using the default compression level (which is typically level 3). Consider adding a comment to clarify this is intentional, or explicitly setting a specific level (e.g., 3 for default) for better code clarity. Alternatively, consider using a higher level like 10 for better compression at the cost of slightly slower compression time, since diagnostics bundles are typically created infrequently and smaller file sizes are beneficial for transmission.
| let encoder = zstd::Encoder::new(file, 0)?; | |
| // Use compression level 10 for zstd: higher compression for diagnostics bundles, which are created infrequently. | |
| let encoder = zstd::Encoder::new(file, 10)?; |
|
/AzurePipelines run [GITHUB]-trident-pr-e2e |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/AzurePipelines run [GITHUB]-trident-pr |
|
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
crates/trident/src/diagnostics.rs
Outdated
| if let Ok(entries) = fs::read_dir(log_dir) { | ||
| for entry in entries.flatten() { | ||
| if let Ok(file_name) = entry.file_name().into_string() { | ||
| if file_name.starts_with("trident-") && file_name.ends_with(".log") { |
Copilot
AI
Nov 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The file extension filter only captures .log files, but historical metrics files use the .jsonl extension. This means historical metrics files (e.g., trident-metrics-<timestamp>.jsonl) will not be collected. Update the condition to also include .jsonl files: file_name.ends_with(\".log\") || file_name.ends_with(\".jsonl\")
| if file_name.starts_with("trident-") && file_name.ends_with(".log") { | |
| if file_name.starts_with("trident-") && (file_name.ends_with(".log") || file_name.ends_with(".jsonl")) { |
🔍 Description
This PR adds a diagnose subcommand to Trident (#285).
The command creates a tarball containing:
🤔 Rationale
This PR aims to simplify troubleshooting by providing one command that aggregates relevant debugging info into a single tarball.
📝 Checks
📌 Follow-ups
TODO (before promoting from draft):
🗒️ Notes