feature: Runtime update auto-rollback #407

ayaegashi · 2025-12-12T00:53:40Z

🔍 Description

Implement auto-rollback for runtime updates. Auto-rollback is triggered when a step in Finalize fails or a health check fails.
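In practice, the trigger above amounts to calling the finalize step a second time with the specs reversed and health checks turned off. The `reverse_specs`/`run_health_checks` review comment in this thread discusses the same two arguments. This is a minimal sketch, not Trident's implementation. `Spec`, `EngineContext`, `Outcome`, `finalize`, and `finalize_with_auto_rollback` are hypothetical stand-ins, and closures simulate the Finalize step and the health checks.

```rust
use std::mem;

/// Hypothetical stand-in for a Host Configuration (not Trident's real type).
#[derive(Clone, Debug, PartialEq)]
pub struct Spec {
    pub name: String,
}

/// Hypothetical engine context: the target spec and the last known good spec.
#[derive(Debug)]
pub struct EngineContext {
    pub spec: Spec,
    pub spec_old: Spec,
}

/// Result of a finalize that may have been rolled back.
#[derive(Debug, PartialEq)]
pub enum Outcome {
    Updated,
    RolledBack,
}

/// Simulated Finalize: run the step for `ctx.spec`, then optionally the
/// health checks. Closures stand in for the real subsystem work.
fn finalize(
    ctx: &EngineContext,
    run_health_checks: bool,
    step: &dyn Fn(&Spec) -> Result<(), String>,
    health: &dyn Fn(&Spec) -> Result<(), String>,
) -> Result<(), String> {
    step(&ctx.spec)?;
    if run_health_checks {
        health(&ctx.spec)?;
    }
    Ok(())
}

/// First attempt targets the new spec with health checks. On any failure,
/// reverse the specs and finalize again with health checks disabled, so a
/// failing health check cannot trigger a second rollback.
pub fn finalize_with_auto_rollback(
    ctx: &mut EngineContext,
    step: &dyn Fn(&Spec) -> Result<(), String>,
    health: &dyn Fn(&Spec) -> Result<(), String>,
) -> Result<Outcome, String> {
    match finalize(ctx, true, step, health) {
        Ok(()) => Ok(Outcome::Updated),
        Err(original) => {
            eprintln!("finalize failed, attempting auto-rollback: {original}");
            // The old (known good) spec becomes the target.
            mem::swap(&mut ctx.spec, &mut ctx.spec_old);
            // Only the rollback's own error is propagated here.
            finalize(ctx, false, step, health)?;
            Ok(Outcome::RolledBack)
        }
    }
}
```

Returning a separate `Outcome::RolledBack` is a choice of this sketch, so the caller can distinguish a completed update from a reverted one.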

📌 Follow-ups

TODO:

Optimize caching of extension images: when rolling back, Trident should search for extension images already on the OS before attempting to re-download them. This will be done in a follow-up PR.

Copilot

Pull request overview

This PR implements an auto-rollback feature for runtime updates. When a runtime update finalization fails, the system will automatically attempt to roll back to the previous known good configuration.

Key Changes:

- Added auto-rollback logic that triggers when runtime update finalization fails
- Introduced a rollback boolean parameter to the finalize_update function to control whether the operation is performing a rollback
- Added an internal_state_matches_engine_ctx method to detect when subsystem state needs repopulation (handles both separate finalize calls and auto-rollback scenarios)
- Enabled health checks to run for the RuntimeUpdate servicing type

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.


| File | Description |
| --- | --- |
| `crates/trident_api/src/config/host/health.rs` | Added RuntimeUpdate to the servicing types that should execute health checks |
| `crates/trident/src/subsystems/extensions/utils.rs` | Added state matching function and helper to verify subsystem state consistency with engine context; includes comprehensive test coverage |
| `crates/trident/src/subsystems/extensions/mod.rs` | Updated condition to use new state matching function for determining when to repopulate subsystem state |
| `crates/trident/src/lib.rs` | Implemented auto-rollback error handling when runtime update finalization fails |
| `crates/trident/src/engine/update.rs` | Implemented auto-rollback error handling when runtime update finalization fails (similar pattern to lib.rs) |
| `crates/trident/src/engine/runtime_update.rs` | Added rollback parameter to finalize_update function, implemented spec swapping logic for rollback, and added health check execution |
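The `health.rs` change can be pictured as gating health checks by servicing type. A sketch, assuming a hypothetical `ServicingType` enum and `should_run_health_checks` helper. The variant names and the set of included types (apart from adding `RuntimeUpdate`) are assumptions, not the crate's actual definition.

```rust
/// Hypothetical subset of servicing types; the real enum and the exact set
/// that runs health checks may differ.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ServicingType {
    AbUpdate,
    RuntimeUpdate,
    ManualRollback,
}

/// Whether Finalize should execute the configured health checks for this
/// servicing type. RuntimeUpdate is the type this PR adds to the list.
pub fn should_run_health_checks(servicing_type: ServicingType) -> bool {
    matches!(
        servicing_type,
        ServicingType::AbUpdate | ServicingType::RuntimeUpdate
    )
}
```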


crates/trident/src/lib.rs

Copilot · 2025-12-12T00:57:20Z

crates/trident/src/engine/update.rs

+                    return runtime_update::finalize_update(
+                        &mut subsystems,
+                        state,
+                        true,
+                        Some(update_start_time),
+                        #[cfg(feature = "grpc-dangerous")]
+                        sender,
+                    );


In the auto-rollback logic, if the rollback itself fails, the error from the rollback operation will be returned. However, this loses the original error context from the initial finalize failure. Consider wrapping or chaining the errors so that both the original failure and the rollback failure are captured for better debugging and diagnostics.
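One way to keep both failures, sketched with only the standard library: the hypothetical `RollbackFailed` error holds the original finalize error and the rollback error, and the hypothetical `finalize_or_rollback` builds it only when both steps fail. Trident has its own error type, so this illustrates the chaining idea rather than a drop-in change.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical error keeping both the original finalize failure and the
/// failure of the auto-rollback that followed it.
#[derive(Debug)]
pub struct RollbackFailed {
    pub original: String,
    pub rollback: String,
}

impl fmt::Display for RollbackFailed {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "auto-rollback failed: {} (original finalize error: {})",
            self.rollback, self.original
        )
    }
}

impl Error for RollbackFailed {}

/// Run `do_rollback` only if `finalize` failed; if the rollback also fails,
/// return an error that carries both messages.
pub fn finalize_or_rollback<F, R>(finalize: F, do_rollback: R) -> Result<(), RollbackFailed>
where
    F: FnOnce() -> Result<(), String>,
    R: FnOnce() -> Result<(), String>,
{
    match finalize() {
        Ok(()) => Ok(()),
        Err(original) => do_rollback().map_err(|rollback| RollbackFailed { original, rollback }),
    }
}
```

If the rollback succeeds, this sketch returns `Ok(())`, the same as returning the rollback's result directly; a caller that needs to know an update was reverted would need a richer return type.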

crates/trident/src/subsystems/extensions/utils.rs

Copilot · 2025-12-12T00:57:20Z

crates/trident/src/engine/runtime_update.rs

+    // Run health checks if we are performing a runtime update (skip if we are
+    // rolling back)
+    if !rollback {
+        health::execute_health_checks(&ctx)?;
+    }
+
    debug!(
        "Updating host's servicing state to '{:?}'",
        ServicingState::Provisioned


When performing an auto-rollback, the host status state update logic at lines 151-154 may not be correct. After a rollback, setting spec_old to Default::default() and spec to ctx.spec (which contains the old spec during rollback) may leave the system in an inconsistent state. During rollback, we're reverting to the old configuration, so the state transitions and spec/spec_old updates should reflect that we're returning to a previously known good state rather than completing a new update.
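The reasoning in this comment hinges on `ctx.spec` already holding the old spec during a rollback. Here is a minimal sketch under that assumption, with a hypothetical `HostStatus`, `Spec`, `record_finalized`, and `Staged` state (only `ServicingState::Provisioned`, `spec`, and `spec_old` come from the thread). The same update logic records whichever spec Finalize actually applied.

```rust
/// Hypothetical stand-in for a Host Configuration.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct Spec {
    pub name: String,
}

#[derive(Debug, PartialEq)]
pub enum ServicingState {
    Staged, // hypothetical pre-finalize state
    Provisioned,
}

/// Hypothetical host status holding the applied and previous specs.
#[derive(Debug)]
pub struct HostStatus {
    pub spec: Spec,
    pub spec_old: Spec,
    pub servicing_state: ServicingState,
}

/// Record the spec that Finalize applied. During a rollback the engine
/// context's specs were reversed, so `applied` is the old spec.
pub fn record_finalized(status: &mut HostStatus, applied: &Spec) {
    status.spec = applied.clone();
    status.spec_old = Spec::default();
    status.servicing_state = ServicingState::Provisioned;
}
```

In this sketch the rollback case leaves the status describing the old, known good spec; whether that matches Trident's intended state transitions is the open question the comment raises.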

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.


crates/trident/src/engine/runtime_update.rs

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.


crates/trident/src/engine/runtime_update.rs

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.


crates/trident/src/engine/runtime_update.rs

crates/trident/src/engine/update.rs

crates/trident/src/lib.rs


Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.


crates/trident/src/engine/update.rs

crates/trident/src/engine/runtime_update.rs


ayaegashi · 2025-12-16T00:19:00Z

/AzurePipelines run [GITHUB]-trident-pr-e2e

azure-pipelines · 2025-12-16T00:19:19Z

Azure Pipelines successfully started running 1 pipeline(s).

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.


crates/trident/src/engine/runtime_update.rs

ayaegashi · 2025-12-16T06:33:19Z

/AzurePipelines run [GITHUB]-trident-pr-e2e

azure-pipelines · 2025-12-16T06:33:39Z

Azure Pipelines successfully started running 1 pipeline(s).

ayaegashi · 2025-12-16T07:29:14Z

/AzurePipelines run [GITHUB]-trident-pr-e2e

azure-pipelines · 2025-12-16T07:29:32Z

Azure Pipelines successfully started running 1 pipeline(s).

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.


Copilot · 2025-12-16T07:36:23Z

crates/trident/src/subsystems/extensions/utils.rs

+        .map(|e| (&e.sha384, (&e.path, &e.ext_type)))
+        .collect();
+    // Iterate through all extensions in the Host Configuration and ensure that
+    // each extension images matches a corresponding extension image on SHA384


Grammatical error in comment: "each extension images matches" should be "each extension image matches".

Suggested change

-    // each extension images matches a corresponding extension image on SHA384
+    // each extension image matches a corresponding extension image on SHA384
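The check in this snippet, extended to match on both hash and path as in the later commit "fix bugs, match on hash AND path", can be sketched as follows. `ExtensionData` and `extensions_match` are hypothetical. The field names come from the diff; the field types, and the requirement that both lists have the same length, are assumptions of this sketch.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Hypothetical extension record; field names follow the diff snippet.
#[derive(Clone, Debug)]
pub struct ExtensionData {
    pub sha384: String,
    pub path: PathBuf,
    pub ext_type: String, // carried along, not compared in this sketch
}

/// True when both lists have the same length and every Host Configuration
/// extension has a state entry with the same SHA384 hash and the same path.
/// Indexing `state` by hash avoids comparing every pair (O(n + m), not O(n * m)).
pub fn extensions_match(config: &[ExtensionData], state: &[ExtensionData]) -> bool {
    if config.len() != state.len() {
        return false;
    }
    let by_hash: HashMap<&str, &Path> = state
        .iter()
        .map(|e| (e.sha384.as_str(), e.path.as_path()))
        .collect();
    config
        .iter()
        .all(|e| by_hash.get(e.sha384.as_str()).copied() == Some(e.path.as_path()))
}
```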

Copilot · 2025-12-16T07:36:24Z

crates/trident/src/lib.rs

+                                            false,
+                                            true,


The auto-rollback call has incorrect parameters. When performing a rollback, reverse_specs should be true (not false) to reverse the spec and spec_old in the EngineContext. Additionally, run_health_checks should be false during rollback to avoid triggering another rollback if health checks fail. Compare with the correct implementation in update.rs lines 159-165 which uses reverse_specs: true and run_health_checks: false.

Suggested change

-                                            false,
-                                            true,
+                                            true,  // reverse_specs: true for rollback
+                                            false, // run_health_checks: false for rollback

bfjelds · 2025-12-16T16:26:55Z

crates/trident/src/engine/update.rs

+                .message("Failed to finalize runtime update");
+                if let Err(e) = finalize_result {
+                    error!("Runtime update finalize failed with message: {e:?}");
+                    // Attempt an auto-rollback


expand this comment a little to describe how calling finalize_update a second time is attempting an auto-rollback

bfjelds · 2025-12-16T16:31:31Z

crates/trident/src/lib.rs

                        if allowed_operations.has_finalize() {
                            let mut subsystems = SUBSYSTEMS.lock().unwrap();
-                            runtime_update::finalize_update(
+                            let finalize_result = runtime_update::finalize_update(


seeing this chunk of double-call-runtime_update::finalize_update (here and in update.rs) makes me think it should be refactored.

maybe finalize_update can accept ServicingType instead of the 2 bools ...

what is currently finalize_update can become inner_finalize or do_finalize, and finalize_update can call inner_finalize passing spec/old-spec as-is ... if inner_finalize fails health checks, then if ServicingType!=ManualRollback, inner_finalize can call itself with spec/old-spec reversed (to invoke rollback).
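A sketch of the refactor suggested here, assuming a hypothetical two-variant `ServicingType` and simplified `Spec`/`EngineContext` types (not Trident's real definitions). `finalize_update` takes a `ServicingType` instead of two bools, and `inner_finalize` reverses the specs and calls itself as a `ManualRollback` when health checks fail. Returning an error after a successful rollback is a design choice of this sketch, so callers still learn that the update did not stick.

```rust
use std::mem;

/// Hypothetical subset of servicing types, replacing the two bools.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ServicingType {
    RuntimeUpdate,
    ManualRollback,
}

/// Hypothetical stand-ins for the real Trident types.
#[derive(Clone, Debug, PartialEq)]
pub struct Spec {
    pub name: String,
}

pub struct EngineContext {
    pub spec: Spec,
    pub spec_old: Spec,
}

/// Public entry point: passes spec/spec_old through as-is.
pub fn finalize_update(
    ctx: &mut EngineContext,
    servicing_type: ServicingType,
    health_check: &dyn Fn(&Spec) -> Result<(), String>,
) -> Result<(), String> {
    inner_finalize(ctx, servicing_type, health_check)
}

fn inner_finalize(
    ctx: &mut EngineContext,
    servicing_type: ServicingType,
    health_check: &dyn Fn(&Spec) -> Result<(), String>,
) -> Result<(), String> {
    // The real finalize work for `ctx.spec` would happen here.
    match health_check(&ctx.spec) {
        Ok(()) => Ok(()),
        // Not already a rollback: reverse the specs and recurse as a
        // rollback. The recursive call uses ManualRollback, so it never
        // recurses again.
        Err(e) if servicing_type != ServicingType::ManualRollback => {
            mem::swap(&mut ctx.spec, &mut ctx.spec_old);
            inner_finalize(ctx, ServicingType::ManualRollback, health_check)?;
            Err(format!("health checks failed ({e}); rolled back to the previous spec"))
        }
        Err(e) => Err(e),
    }
}
```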

auto-rollback

c66d50b

ayaegashi requested a review from a team as a code owner December 12, 2025 00:53

Copilot AI review requested due to automatic review settings December 12, 2025 00:53

Copilot started reviewing on behalf of ayaegashi December 12, 2025 00:54

Copilot AI reviewed Dec 12, 2025

ayaegashi added 2 commits December 12, 2025 01:09

make two bools

290a1ed

reduce time complexity

4040004

Copilot AI review requested due to automatic review settings December 12, 2025 01:17

Copilot started reviewing on behalf of ayaegashi December 12, 2025 01:17

reverse bug

42d56f3

Copilot AI reviewed Dec 12, 2025

crates/trident/src/engine/runtime_update.rs (outdated, resolved)

crates/trident/src/engine/runtime_update.rs (outdated, resolved)

ayaegashi added 2 commits December 12, 2025 01:27

fix osconfig bug

c7cd66a

fix

2c9eb95

Copilot AI review requested due to automatic review settings December 12, 2025 02:32

Copilot started reviewing on behalf of ayaegashi December 12, 2025 02:32

remove changes to osconfig

3124ca3

Copilot AI reviewed Dec 12, 2025

crates/trident/src/engine/runtime_update.rs (outdated, resolved)

crates/trident/src/engine/runtime_update.rs (outdated, resolved)

ayaegashi added 2 commits December 12, 2025 17:55

update logs

6da5136

fixes

8e96de6

Copilot AI review requested due to automatic review settings December 13, 2025 03:19

Copilot started reviewing on behalf of ayaegashi December 13, 2025 03:20

Copilot AI reviewed Dec 13, 2025

crates/trident/src/engine/runtime_update.rs (outdated, resolved)

crates/trident/src/engine/update.rs (resolved)

crates/trident/src/lib.rs (resolved)

ayaegashi added 3 commits December 15, 2025 18:51

update comments

1560665

Merge remote-tracking branch 'origin/main' into user/ayaegashi/runtim…e-rollback-v2

20b66d1

show runtime error if autorollback fails

73522bf

Copilot AI review requested due to automatic review settings December 15, 2025 20:23

Copilot AI reviewed Dec 15, 2025

crates/trident/src/engine/update.rs (resolved)

crates/trident/src/engine/runtime_update.rs (resolved)

take away auto-rollback

eaa5538

ayaegashi changed the title ~~feature: Runtime update auto-rollback~~ feature: Runtime update rollback Dec 16, 2025

Merge remote-tracking branch 'origin/main' into user/ayaegashi/runtim…e-rollback-v2

c51146b

Copilot AI review requested due to automatic review settings December 16, 2025 00:16

Copilot started reviewing on behalf of ayaegashi December 16, 2025 00:16

fix comment

6945ffe

Copilot AI reviewed Dec 16, 2025

crates/trident/src/engine/runtime_update.rs (resolved)

crates/trident/src/engine/runtime_update.rs (resolved)

auto-rollback works

82e7dad

ayaegashi changed the title ~~feature: Runtime update rollback~~ feature: Runtime update auto-rollback Dec 16, 2025

fix bugs, match on hash AND path

d306479

Copilot AI review requested due to automatic review settings December 16, 2025 07:29

Copilot started reviewing on behalf of ayaegashi December 16, 2025 07:31

Copilot AI reviewed Dec 16, 2025

bfjelds reviewed Dec 16, 2025
