-
Notifications
You must be signed in to change notification settings - Fork 67
[inventory] Add unhealthy zpools from each sled #9615
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
illumos-utils/src/zpool.rs
Outdated
| pub struct UnhealthyZpoolsResult { | ||
| pub zpools: Vec<String>, | ||
| pub errors: Vec<String>, | ||
| pub time_of_status: Option<DateTime<Utc>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be optional? If we have a result, it must have been collected at some time, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is meant for when this runs on non-illumos environements
The idea is that there is no time of status because no command was ever run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also there is a point in time where health monitor may not have run yet -> #9589 (comment)
| #[derive(Debug, Clone, PartialEq, Eq, Deserialize, Serialize, JsonSchema)] | ||
| #[serde(rename_all = "snake_case")] | ||
| pub struct UnhealthyZpoolsResult { | ||
| pub zpools: Vec<String>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went back and forth between just having a list of unhealthy zpools, or associating each zpool with it's state. In the end I went with listing the zpools only, but I'm not convinced. We'll be including the information of the health checks in the support bundle, and it'd be useful for them to be able to see what state each zpool is in. Thoughts? @davepacheco @jgallagher
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Associating each zpool with its state sounds good to me; having an explicit entry for "this zpool was healthy" seems safer than inferring "any zpool that isn't explicitly listed as unhealthy must have been healthy".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, I was thinking of only including the "unhealthy" zpools with their associated statuses in this list. Similarly with the svcs_in_maintenance I only added the services in maintenance with their associated zones. If I were to include the "healthy" zpools then it wouldn't really be consistent with the services in maintenance no?
My take on the health checks is to only report on things that are in an unhealthy state. Thoughts?
This is the second PR for #9412
This commit adds a list of unhealthy zpools to the sled agent inventory's health monitor. In a follow-up PR this information will be added to the DB
Successful unhealthy zpool retrieval:
Response contains errors: