Improve Docker downgrade robustness and version reporting#72
Improve Docker downgrade robustness and version reporting#72dragonfire1119 merged 2 commits intomasterfrom
Conversation
Enhances the downgrade_docker function to retry apt-get installs, verify package installation after errors, and provide more detailed status and version reporting for containerd. Also updates package hold/unhold logic to include docker-ce-rootless-extras and improves logging in both run.sh and test-script.sh for better diagnostics and validation.
Summary by CodeRabbitRelease Notes
✏️ Tip: You can customize this high-level summary in your review settings. WalkthroughUpdated scripts to bump the version to 2025.12.1 and overhaul Docker downgrade/install flows: added multi-attempt install retries, trap-based policy restoration, expanded docker-ce-rootless-extras handling, enhanced containerd management and verification, and richer step-labeled diagnostics. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant InstallScript as run.sh
participant APT as apt/dpkg
participant System as systemd/processes
participant Docker as dockerd
participant Containerd as containerd
User->>InstallScript: invoke install/downgrade
InstallScript->>APT: unhold/remove packages (includes docker-ce-rootless-extras)
InstallScript->>APT: attempt install (loop: max_install_attempts)
alt install attempt fails
APT-->>InstallScript: failure + diagnostics
InstallScript->>InstallScript: record attempt, log diagnostics
InstallScript->>APT: retry (if attempts remain)
else install succeeds
APT-->>InstallScript: installed docker-ce, containerd.io
InstallScript->>System: restore policy-rc.d (trap ensures restoration)
InstallScript->>System: restart/reload daemon
System->>Containerd: stop lingering processes / start containerd
Containerd-->>InstallScript: readiness (wait & verify version)
InstallScript->>Docker: start/reload dockerd
Docker-->>InstallScript: readiness (verify docker version)
InstallScript->>User: final report (docker + containerd versions)
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro Cache: Disabled due to Reviews > Disable Cache setting Knowledge base: Disabled due to data retention organization setting 📒 Files selected for processing (2)
🧰 Additional context used🧬 Code graph analysis (1)casaos-fix-docker-api-version/run.sh (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
🔇 Additional comments (10)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
casaos-fix-docker-api-version/run.sh (1)
1329-1350: Containerd force-kill and restart logic is thorough but has a subtle race condition.Lines 1336–1340 use
pkill -9 containerdto forcefully terminate lingering processes before restarting. However, there's a potential race:
- Line 1332:
systemctl stop containerdis issued- Line 1336-1340: Check for lingering processes and force-kill them
- Line 1350:
systemctl start containerdis issuedIf systemctl is still in the process of stopping the service (Step 1), the force-kill (Step 2) may interfere. A safer approach:
# After systemctl stop, wait a bit longer and ensure processes are dead for i in {1..5}; do if ! pgrep -x containerd >/dev/null 2>&1; then break fi sleep 1 done # Then force-kill if still running if pgrep -x containerd >/dev/null 2>&1; then pkill -9 containerd sleep 2 fiCurrently, the code checks for processes immediately after the 2-second sleep, which may be adequate. Adding a small loop between the sleep and force-kill would be more robust:
sleep 2 + + # Wait for process to exit gracefully + for i in {1..3}; do + if ! pgrep -x containerd >/dev/null 2>&1; then + break + fi + sleep 1 + done # Kill any lingering containerd processes to ensure old binary is not running if pgrep -x containerd >/dev/null 2>&1; thenThis is a minor refinement; the current code will work but is not belt-and-suspenders.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to Reviews > Disable Cache setting
Knowledge base: Disabled due to data retention organization setting
📒 Files selected for processing (2)
casaos-fix-docker-api-version/run.sh(9 hunks)casaos-fix-docker-api-version/test-script.sh(7 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
casaos-fix-docker-api-version/run.sh (1)
casaos-fix-docker-api-version/test-script.sh (2)
apply_docker_api_override(197-264)remove_docker_api_override(267-297)
🪛 Shellcheck (0.11.0)
casaos-fix-docker-api-version/test-script.sh
[warning] 100-100: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 101-101: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 428-428: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 429-429: Declare and assign separately to avoid masking return values.
(SC2155)
casaos-fix-docker-api-version/run.sh
[warning] 1371-1371: Declare and assign separately to avoid masking return values.
(SC2155)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: test-fix-script
- GitHub Check: test-fix-script
🔇 Additional comments (12)
casaos-fix-docker-api-version/test-script.sh (2)
310-310: Good:docker-ce-rootless-extrasnow consistently included in all package operations.The test script now mirrors the main script's expanded package management. This ensures all Docker-related packages are handled uniformly across upgrade, installation, and removal operations. The consistency is important for reproducible test scenarios.
Also applies to: 344-345, 573-573, 618-619
472-483: Containerd version checks are thorough but rely on exact version matching.Lines 472–483 validate that containerd package and binary versions match the expected 1.7.28. The validation uses
grep -qto check for version strings, which is reasonable, but be aware:
- If the
awkcommand on line 101 or 429 returns an empty string (e.g., ifcontainerd --versionfails or is not installed), the fallback to${containerd_bin:-unknown}provides safe handling.- The checks use substring matching (
grep -q "^1.7.28"), which is appropriate for this use case.casaos-fix-docker-api-version/run.sh (10)
1198-1200: Excellent: Trap mechanism ensures service auto-start policy is always restored.Line 1199 uses
trap "allow_service_autostart" RETURNto guarantee cleanup even if the function exits early due to errors. This is a robust pattern that prevents leaving the system in an inconsistent state (policy-rc.d preventing service restarts).
1205-1250: Retry logic with per-attempt package verification is well-designed.The new installation loop (lines 1205–1250) handles non-critical apt-get failures gracefully:
- Attempts installation up to 2 times (configurable via
max_install_attempts)- After each attempt, explicitly verifies if the required packages are installed with correct versions (lines 1220–1243)
- Treats successful package installation as success even if apt-get returns an error (common when post-install scripts have non-fatal warnings)
- Provides clear messaging about what was attempted and why
This is a significant robustness improvement.
1252-1275: Potential issue: PIPESTATUS check may not reflect apt-get exit code if tee was used.At line 1254, the code checks
${PIPESTATUS[0]}to detect timeout (exit code 124). However, line 1212 usestee /tmp/docker-install.login the apt-get pipeline, which resetsPIPESTATUSto tee's exit code (usually 0 if file write succeeds), not apt-get's.To reliably detect timeouts:
- if timeout 600 $SUDO apt-get install ... 2>&1 | tee /tmp/docker-install.log; then + if timeout 600 $SUDO apt-get install ... 2>&1 | tee /tmp/docker-install.log; then + exit_code=$? + else + exit_code=$? + fi + if [ $exit_code -eq 124 ]; thenOr, simpler:
local apt_exit_code=0 if timeout 600 $SUDO apt-get install ... 2>&1 | tee /tmp/docker-install.log; then apt_exit_code=$? ... else apt_exit_code=$? fiHowever, since the code checks
apt_successvia package verification (lines 1220–1243) before checkingPIPESTATUS, the practical impact is low—the verification step will catch if packages weren't installed. The timeout check at line 1254 only triggers if neither package got installed, which is correct defensive coding, but thePIPESTATUScheck may not work as intended.Consider capturing the exit code explicitly or relying solely on the package verification checks (lines 1220–1243), which are more robust.
1355-1373: Containerd polling loop with 15 retries is robust but wait time is short (15 seconds total).Lines 1355–1373 poll for containerd readiness with a 1-second interval for up to 15 attempts (15 seconds total). This is a good pattern—active polling is preferable to fixed sleeps. However:
- 15 seconds may be tight on slow systems or in containers with resource contention
- The loop uses
systemctl is-active --quietas the only readiness check, which is sufficient- If containerd fails to start, the warning at line 1366 allows the script to continue, which is acceptable since Docker will also be started and will detect containerd issues
No changes needed, but be aware that the timeout is conservative. If you see flaky failures in slow environments, increase the loop limit or interval.
2163-2173: Final containerd version reporting is valuable for diagnostics, but has SC2155 issues.Lines 2163–2173 add final reporting of containerd package and binary versions at script completion:
local containerd_pkg_version containerd_pkg_version=$(dpkg -l containerd.io 2>/dev/null | awk 'NR>5 {print $3; exit}') local containerd_bin_version containerd_bin_version=$(timeout 5 containerd --version 2>/dev/null | head -n1)Good: The declarations are separated from assignments (fixing the SC2155 issue correctly). The reporting provides clear diagnostics.
However, there's a subtle inconsistency: Lines 2164 and 2166 declare variables correctly separated, but in the context of the script, it's cleaner to ensure all variable declarations follow the same pattern consistently throughout.
1113-1113: apt-mark unhold command expanded to include docker-ce-rootless-extras.Line 1113 now includes
docker-ce-rootless-extrasin the apt-mark unhold command. This is consistent with the hold/unhold pattern throughout the script and ensures all Docker-related packages are managed uniformly.
1127-1127: apt-get remove now includes docker-ce-rootless-extras for complete cleanup.Line 1127 removes all Docker packages including
docker-ce-rootless-extras. This is important for a clean downgrade—leaving behind the rootless-extras package could cause confusion or conflicts.Potential edge case: If
docker-ce-rootless-extrasis not installed on older systems,apt-get removewill still succeed (apt-get is idempotent for remove operations). No issue here.
1291-1291: apt-mark hold now includes docker-ce-rootless-extras.Line 1291 holds all Docker packages including
docker-ce-rootless-extrasto prevent automatic upgrades. Consistent with the unhold operations.
62-62: Version bumps are consistent across all banner/usage locations.All four locations (lines 62, 1714, 1722, 1730) correctly update the version string from 2025.12.0 to 2025.12.1. Consistency check passed.
Also applies to: 1714-1714, 1722-1722, 1730-1730
1413-1417: Docker status and journal logs are appended to /tmp/docker-install.log for troubleshooting.Lines 1413–1417 append Docker status and journal logs to
/tmp/docker-install.log(usingtee -a), which complements the earlier apt-get output logging at line 1212. This is excellent for diagnostics when installation fails.
Refactored variable assignments in run.sh and test-script.sh to use separate declaration and assignment for better readability and shell compatibility. Updated the containerd version check in test-script.sh to accept both '1.7.28' and 'v1.7.28' formats.
This pull request updates both the main fix script and its test script for CasaOS Docker version management, focusing on improving reliability and visibility of Docker and containerd installation and status. The main enhancements include more robust package installation logic, expanded handling of the
docker-ce-rootless-extraspackage, improved process management for containerd, and better reporting of package and binary versions for debugging.Reliability and Installation Improvements
apt-getreturns warnings; now also traps to always restore service auto-start policy.apt-mark hold/unhold,apt-get remove/install) to includedocker-ce-rootless-extrasfor consistency and completeness. [1] [2] [3] [4] [5] [6] [7]Containerd Process and Version Handling
containerdandcontainerd-shimprocesses, waits longer for readiness, and verifies the running binary version after restart.Debugging and Status Reporting
/tmp/docker-install.logfor easier troubleshooting.User Messaging and Version Bump
2025.12.0to2025.12.1in all user-facing messages. [1] [2]These changes together make the Docker downgrade and upgrade processes more robust, transparent, and easier to debug, especially regarding containerd management and version accuracy.