TQ: Support adding sleds via trust quorum #9650
This PR introduces two new external APIs: one to add multiple sleds to a rack
at once, and one to query the status of the ongoing operation. Both are
currently experimental and live under `/v1/trust-quorum`. They need to
be moved under `/system/hardware` like the original `sled-add` command.
They also need to be reworked so that they do not report trust quorum specific
details where that can be avoided. Most of that detail belongs in omdb for
debugging. I may add some of that support to this PR.
This PR also introduces a background task for driving the trust quorum
reconfiguration to completion. Reconfiguration proceeds in two steps:
the new external endpoint handler synchronously updates the DB, and
the background task then asynchronously tries to commit the
operation.
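The two-step flow described above can be sketched roughly as follows. This is a minimal in-memory illustration, not the code in this PR: `Db`, `ConfigState`, `insert_next_config`, `try_commit_latest`, and the `ready` flag are all hypothetical names, and the real background task's readiness check is not modeled here.

```rust
use std::collections::BTreeMap;

/// Hypothetical states, named after the `state` values seen in the trace.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ConfigState {
    Preparing,
    Committed,
}

/// Hypothetical in-memory stand-in for the DB table of configurations,
/// keyed by epoch.
#[derive(Debug, Default)]
struct Db {
    configs: BTreeMap<u64, ConfigState>,
}

impl Db {
    /// Step 1 (synchronous, in the endpoint handler): record a new
    /// configuration at the next epoch and return that epoch.
    fn insert_next_config(&mut self) -> u64 {
        let epoch = self.configs.keys().next_back().map_or(1, |e| e + 1);
        self.configs.insert(epoch, ConfigState::Preparing);
        epoch
    }

    /// Step 2 (asynchronous, in the background task): try to commit the
    /// latest configuration. `ready` is an assumption standing in for
    /// whatever condition the real task waits on.
    fn try_commit_latest(&mut self, ready: bool) -> Option<u64> {
        let (&epoch, state) = self.configs.iter_mut().next_back()?;
        if *state == ConfigState::Preparing && ready {
            *state = ConfigState::Committed;
            return Some(epoch);
        }
        None
    }
}
```

In this sketch the handler returns as soon as the row is written, and repeated calls to `try_commit_latest` are harmless: they return `None` until the configuration is ready, and again once it has been committed.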
I tested this on a4x2 and it works as expected. See the trace from the original external API test below:
```
➜ oxide.rs git:(main) ✗ echo '{"rack_id": "0dbef452-a6dd-4831-bbdc-769ea3353f28", "sled_ids": [{"part": "PPP-PPPPPPP","serial": "00000000002"}]}' | target/debug/oxide --profile recovery api /v1/trust-quorum/new-members --method POST --input -
➜ oxide.rs git:(main) ✗ target/debug/oxide --profile recovery api /v1/trust-quorum/config/latest/0dbef452-a6dd-4831-bbdc-769ea3353f28
{
"abort_reason": null,
"commit_crash_tolerance": 1,
"coordinator": {
"part_number": "PPP-PPPPPPP",
"serial_number": "00000000003"
},
"encrypted_rack_secrets": null,
"epoch": 2,
"last_committed_epoch": 1,
"members": {
"PPP-PPPPPPP:00000000000": {
"share_digest": null,
"state": "unacked",
"time_committed": null,
"time_prepared": null
},
"PPP-PPPPPPP:00000000001": {
"share_digest": null,
"state": "unacked",
"time_committed": null,
"time_prepared": null
},
"PPP-PPPPPPP:00000000002": {
"share_digest": null,
"state": "unacked",
"time_committed": null,
"time_prepared": null
},
"PPP-PPPPPPP:00000000003": {
"share_digest": null,
"state": "unacked",
"time_committed": null,
"time_prepared": null
}
},
"rack_id": "0dbef452-a6dd-4831-bbdc-769ea3353f28",
"state": "preparing",
"threshold": 3,
"time_aborted": null,
"time_committed": null,
"time_committing": null,
"time_created": "2026-01-14T21:32:18.780136Z"
}
➜ oxide.rs git:(main) ✗ target/debug/oxide --profile recovery api /v1/trust-quorum/config/latest/0dbef452-a6dd-4831-bbdc-769ea3353f28
{
"abort_reason": null,
"commit_crash_tolerance": 1,
"coordinator": {
"part_number": "PPP-PPPPPPP",
"serial_number": "00000000003"
},
"encrypted_rack_secrets": null,
"epoch": 2,
"last_committed_epoch": 1,
"members": {
"PPP-PPPPPPP:00000000000": {
"share_digest": "fcfb09128c84d82cc81b200c6c682510f63160a4417856f4041b1886445e8b14",
"state": "prepared",
"time_committed": null,
"time_prepared": "2026-01-14T21:32:55.826622Z"
},
"PPP-PPPPPPP:00000000001": {
"share_digest": "d8cad02bd3bccd08109a79e3bf6d8dab0d460a0ba879bf42887dc0fc8d855786",
"state": "prepared",
"time_committed": null,
"time_prepared": "2026-01-14T21:32:55.848235Z"
},
"PPP-PPPPPPP:00000000002": {
"share_digest": "dd57ad8e271734fabfe97d6180d6da3e5c3805e17dacf58e0f2a6d5ed7f1242b",
"state": "prepared",
"time_committed": null,
"time_prepared": "2026-01-14T21:32:55.806644Z"
},
"PPP-PPPPPPP:00000000003": {
"share_digest": "6b27327ca49976ccca83972e6578ef195c99489e62811e8d0a0cb061fca9c0c4",
"state": "prepared",
"time_committed": null,
"time_prepared": "2026-01-14T21:32:55.837154Z"
}
},
"rack_id": "0dbef452-a6dd-4831-bbdc-769ea3353f28",
"state": "preparing",
"threshold": 3,
"time_aborted": null,
"time_committed": null,
"time_committing": null,
"time_created": "2026-01-14T21:32:18.780136Z"
}
➜ oxide.rs git:(main) ✗ target/debug/oxide --profile recovery api /v1/trust-quorum/config/latest/0dbef452-a6dd-4831-bbdc-769ea3353f28
{
"abort_reason": null,
"commit_crash_tolerance": 1,
"coordinator": {
"part_number": "PPP-PPPPPPP",
"serial_number": "00000000003"
},
"encrypted_rack_secrets": {
"data": "53de7731deec3f298a7f5067e256a63bb2869a91c9710d9b23dbf3d261d1b730039d9cb11b543c14906ff77cd409d32953959e9ff8933858",
"salt": "ec609ed5ff7aee94e2e88ad94af56e0cbb8a66a683294005c7888f60a627956a"
},
"epoch": 2,
"last_committed_epoch": 1,
"members": {
"PPP-PPPPPPP:00000000000": {
"share_digest": "fcfb09128c84d82cc81b200c6c682510f63160a4417856f4041b1886445e8b14",
"state": "committed",
"time_committed": "2026-01-14T21:33:03.864617Z",
"time_prepared": "2026-01-14T21:32:55.826622Z"
},
"PPP-PPPPPPP:00000000001": {
"share_digest": "d8cad02bd3bccd08109a79e3bf6d8dab0d460a0ba879bf42887dc0fc8d855786",
"state": "committed",
"time_committed": "2026-01-14T21:33:03.864617Z",
"time_prepared": "2026-01-14T21:32:55.848235Z"
},
"PPP-PPPPPPP:00000000002": {
"share_digest": "dd57ad8e271734fabfe97d6180d6da3e5c3805e17dacf58e0f2a6d5ed7f1242b",
"state": "committed",
"time_committed": "2026-01-14T21:33:03.864617Z",
"time_prepared": "2026-01-14T21:32:55.806644Z"
},
"PPP-PPPPPPP:00000000003": {
"share_digest": "6b27327ca49976ccca83972e6578ef195c99489e62811e8d0a0cb061fca9c0c4",
"state": "committed",
"time_committed": "2026-01-14T21:33:03.864617Z",
"time_prepared": "2026-01-14T21:32:55.837154Z"
}
},
"rack_id": "0dbef452-a6dd-4831-bbdc-769ea3353f28",
"state": "committed",
"threshold": 3,
"time_aborted": null,
"time_committed": "2026-01-14T21:33:04.652543Z",
"time_committing": "2026-01-14T21:32:55.861158Z",
"time_created": "2026-01-14T21:32:18.780136Z"
}
➜ oxide.rs git:(main) ✗
```
04173b6 to 2d8fc67
/// testing.
pub const TRUST_QUORUM_INTEGRATION_ENABLED: bool = false;
//pub const TRUST_QUORUM_INTEGRATION_ENABLED: bool = false;
pub const TRUST_QUORUM_INTEGRATION_ENABLED: bool = true;
Reset this before merge.
jgallagher
left a comment
Looks good! Just a bunch of nits and small suggestions
async fn trust_quorum_get_latest_config(
rqctx: RequestContext<Self::Context>,
path_params: Path<params::RackPath>,
) -> Result<HttpResponseOk<Option<RackMembershipChange>>, HttpError>;
Very happy to defer to others with more opinions on the external API, but a few questions based mostly on the update sync this week:
- Will the `trust_quorum_...` bits of these names leak out into the OpenAPI spec? (I think so, because we default the operation ID to match the method name?)
- I'm a little surprised this is a "get latest" and not "get the result of an operation I started", but maybe I misunderstood? I thought we wanted something like "add sled is async, and returns an identifier that can be used to check the progress of the operation".
- Are all the fields of `RackMembershipChange` meaningful to an operator? (I'm mostly squinting at `epoch`, but maybe that's closely related to the previous bullet.)
Ah, so I left the method names the same, but changed the endpoints. I did not realize that the method names leaked.
I really like the return a token / identifier mechanism in general, and it could possibly work here. However there are two wrinkles:
- The initial request could time out and the trust quorum reconfiguration could be in progress or even committed by the time the user polled. What would they poll with if they didn't get back the token? I suppose they could ask for a token for the latest configuration, but that brings up the next point.
- There's an inherent TOCTTOU here, where the user can see that their configuration committed but another user could have started a later one. Maybe that's not a problem and the user is only concerned about their own and can always ask for a token back.
While writing this, I think you sold me on the token idea. The user will submit a request and get back the epoch as the token for the configuration. Then they will poll that epoch. We will also provide another API to get the latest epoch.
I think this all takes me to your last question. In this case the epoch is the identifier to know which configuration a user is dealing with. I could change this to version or generation in the API and map that to an Epoch, but I think that will generally make things more confusing for support. I personally just want to call it epoch, and not try to map the same concept to a different word :)
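A rough sketch of the epoch-as-token flow discussed above, assuming a hypothetical in-memory `Rack` type. The names `add_sleds`, `status`, `latest_epoch`, and `mark_committed` are illustrative only and are not the endpoints or types in this PR:

```rust
use std::collections::BTreeMap;

/// Hypothetical status values, mirroring the `state` field in the trace.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Preparing,
    Committed,
}

/// Hypothetical stand-in for the rack's configurations, keyed by epoch.
#[derive(Debug, Default)]
struct Rack {
    configs: BTreeMap<u64, Status>,
}

impl Rack {
    /// Start a membership change; the returned epoch is the caller's token.
    fn add_sleds(&mut self) -> u64 {
        let epoch = self.latest_epoch().map_or(1, |e| e + 1);
        self.configs.insert(epoch, Status::Preparing);
        epoch
    }

    /// Poll one specific configuration by its epoch.
    fn status(&self, epoch: u64) -> Option<Status> {
        self.configs.get(&epoch).copied()
    }

    /// Recover a token when the original response was lost, for example
    /// because the request timed out.
    fn latest_epoch(&self) -> Option<u64> {
        self.configs.keys().next_back().copied()
    }

    /// Illustration-only helper standing in for the background task
    /// finishing a commit.
    fn mark_committed(&mut self, epoch: u64) {
        if let Some(status) = self.configs.get_mut(&epoch) {
            *status = Status::Committed;
        }
    }
}
```

Because each caller polls the epoch it was handed, a later reconfiguration started by another user does not change what the first caller sees for its own epoch. That is the property that sidesteps the TOCTTOU concern raised above.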
Fixed up in 9832d33
Thanks for the comprehensive review @jgallagher. I think I fixed up everything. Let me know if you see anything else!
jgallagher
left a comment
Changes LGTM, just a couple minor style suggestions I didn't notice the first time around.
Happy to approve with a couple caveats:
- Your comment about resetting `TRUST_QUORUM_INTEGRATION_ENABLED` still needs to be applied
- Would strongly prefer an external-API-focused set of eyes to review those bits (maybe @ahl?)
rqctx: RequestContext<Self::Context>,
path_params: Path<params::RackPath>,
req: TypedBody<params::AddSledsRequest>,
) -> Result<HttpResponseOk<Epoch>, HttpError>;
Do the type names leak out into the API doc? If so, maybe RackMembershipEpoch would be more clear? If not, ignore this.
No leaking as far as I can tell.
"properties": {
"epoch": {
"description": "The generation / version of the configuration",
"type": "integer",
"format": "uint64",
"minimum": 0
},
Tested in a4x2 with all the latest changes. I was even able to catch the bg task doing something in OMDB. All that remains is for someone to take a look over the external API.
This PR introduces two new external APIs: one to add multiple sleds to a rack at once, and one to query the status of the ongoing operation. It also adds an omdb command for more detailed status. Much more omdb to come in the near future.
This PR also introduces a background task for driving the trust quorum reconfiguration to completion. Reconfiguration proceeds in two steps: the new external endpoint handler synchronously updates the DB, and the background task then asynchronously tries to commit the operation.
I tested this on a4x2 and it works as expected. See the trace from the original external API test below: