in this test:
lspci_lifecycle_test_lifecycle_0.serial.log is the serial output from the destination VM of the lifecycle test. the source VM's serial output is in lspci_lifecycle_test.serial.log. the source VM looks basically reasonable. the progress bar renders a bit funky because the ANSI escape codes were stripped, and it seems like the test is wrong, but in a way that should still pass:
localhost login: root
Welcome to Alpine!
The Alpine Wiki contains a large amount of how-to guides and general
information about administrating Alpine systems.
See <http://wiki.alpinelinux.org/>.
You can setup the system with the command: setup-alpine
You may change this message by editing /etc/motd.
localhost:~# stty -F `tty` cols 9999
localhost:~# sudo lspci -vvx
-ash: sudo: not found
localhost:~# sudo lshw -notime
-ash: sudo: not found
lspci and lshw are not doing what we wanted: sudo is not found in the guest, so neither command actually runs. that's not the problem here (#792 for the "normal" case of this test). the problem here is that in the destination VM, the serial output is:
The highlighted entry will be executed automatically in 1s. The highlighted entry will be executed automatically in 0s. Booting `Linux virt'
[ 28.191936] watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [kworker/0:1:15]
[ 56.191890] watchdog: BUG: soft lockup - CPU#0 stuck for 52s! [kworker/0:1:15]
[ 96.191826] watchdog: BUG: soft lockup - CPU#0 stuck for 89s! [kworker/0:1:15]
[ 124.191780] watchdog: BUG: soft lockup - CPU#0 stuck for 115s! [kworker/0:1:15]
[ 152.191735] watchdog: BUG: soft lockup - CPU#0 stuck for 141s! [kworker/0:1:15]
[ 180.191689] watchdog: BUG: soft lockup - CPU#0 stuck for 167s! [kworker/0:1:15]
[ 208.191644] watchdog: BUG: soft lockup - CPU#0 stuck for 193s! [kworker/0:1:15]
[ 236.191598] watchdog: BUG: soft lockup - CPU#0 stuck for 219s! [kworker/0:1:15]
[ 284.191520] watchdog: BUG: soft lockup - CPU#0 stuck for 264s! [kworker/0:1:15]
the guest in fact did not boot. the phd-runner logs confirm that we waited five minutes while the guest said nothing: https://buildomat.eng.oxide.computer/wg/0/artefact/01KGQRWSKSJKNN9QXY8PPY27EG/lo4zfMXju35PmPUhejEF21RN5YPMZAMVs5qcXgIyXfJs0hEJ/01KGQRXWP544KACC9ZPBCCJQ84/01KGQTREWP1ZF5QAFJDN7PFY4M/phd-runner.log?format=x-bunyan#L1288
2026-02-05T21:09:35.437Z INFO phd-runner: [WAIT_TO_BOOT - EVENT] waiting for guest to boot
...
2026-02-05T21:14:36.166Z INFO phd-runner: test phd_tests::hw::lspci_lifecycle_test ... FAILED: timed out while waiting to boot
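the watchdog lines in the destination VM's serial log are regular enough to pull out mechanically when triaging a run like this. a minimal sketch in Python, assuming the serial log has already been read into a string; `find_soft_lockups` is a hypothetical helper for illustration and not part of phd-runner, and the regex is only written against the lines quoted in this issue:

```python
import re

# Matches lines shaped like the ones in the destination VM's serial log:
# [ 28.191936] watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [kworker/0:1:15]
SOFT_LOCKUP = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+watchdog: BUG: soft lockup - "
    r"CPU#(?P<cpu>\d+) stuck for (?P<stuck>\d+)s! \[(?P<task>[^\]]+)\]"
)


def find_soft_lockups(serial_log: str):
    """Return (kernel_timestamp, cpu, stuck_seconds, task) for each report."""
    reports = []
    for line in serial_log.splitlines():
        m = SOFT_LOCKUP.search(line)
        if m:
            reports.append(
                (float(m["ts"]), int(m["cpu"]), int(m["stuck"]), m["task"])
            )
    return reports


if __name__ == "__main__":
    sample = (
        "[ 28.191936] watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [kworker/0:1:15]\n"
        "[ 56.191890] watchdog: BUG: soft lockup - CPU#0 stuck for 52s! [kworker/0:1:15]\n"
    )
    for ts, cpu, stuck, task in find_soft_lockups(sample):
        print(f"t={ts:.0f}s cpu={cpu} stuck={stuck}s task={task}")
```

run against the destination log above, this would report CPU#0 as stuck in the same kworker from the first report at about 28s through the last at about 284s, which is consistent with the guest never reaching a login prompt.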
this was also clearly a flake. the commit for the failing run is this, where its parent here was fine. it also does not readily reproduce on, say, my workstation. so we hit an issue at some point, and don't really have much to go on other than that it should never happen.
it would have been really useful to know where the vCPU 0 thread was, which motivated my remark to Eliza that turned into #1034.