From 939dd713efce9147295b19f27a6465d504fabad1 Mon Sep 17 00:00:00 2001
From: Lorenz Bauer
Date: Fri, 5 Dec 2025 16:31:26 +0000
Subject: [PATCH] init: work around the mother of all race conditions

ebpf-go CI has been plagued by sporadic hangs, where tests simply time
out while trying to write status information to stdout. The bug
manifests when issuing blocking writes to a virtio console while also
polling it.

The way we trigger the bug is quite involved:

- init opens the port via os.OpenFile. This sets O_NONBLOCK on the fd,
  and registers the os.File with the poller.
- The port is passed to the child process via exec.Cmd.Stdout. This
  internally calls os.File.Fd(), which clears O_NONBLOCK but doesn't
  remove the file from the poller.
- The child process receives a blocking stdout. Writing to it will
  issue a blocking write to the virtio-console port, specifically
  port_fops_write() in virtio_console.c.
- port_fops_write() calls wait_port_writable(). This puts the calling
  thread to sleep if the virtqueue is full, by waiting on
  port->waitqueue. We now enter the race window.
- The host processes the guest's write, frees up some space in the
  virtqueue and issues an interrupt to the guest.
- This interrupt races with a call to port_fops_poll() issued by the
  init process's Go runtime. That function invokes will_write_block(),
  which consumes all used buffers from the virtqueue.
- The interrupt handler vring_interrupt() checks via more_used()
  whether the virtqueue has any used buffers left to consume. Since
  all of them have just been consumed by port_fops_poll(), the
  interrupt is dropped.

At this point we still have a writer stuck in port_fops_write(), waiting
for a wakeup that never comes.

The workaround for this issue is to close the stdio file in init,
thereby removing it from the runtime poller.

Fixes: https://github.com/lmb/vimto/issues/29
---
 init.go | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/init.go b/init.go
index b626e5e..a79e011 100644
--- a/init.go
+++ b/init.go
@@ -307,7 +307,19 @@ func minimalInit(sys syscaller, args []string) error {
 		fmt.Fprintf(stdio, "%s is not readable, execution might fail with %q\n", cmd.Path, err)
 	}
 
-	result := proc.Run()
+	result := proc.Start()
+	if result == nil {
+		// Work around a bug in virtio-console which doesn't deal well with concurrent
+		// access to a single serial port.
+		//
+		// It is crucial that we don't hold on to stdio, otherwise the runtime will
+		// opportunistically poll it, which causes writes to hang.
+		//
+		// See https://github.com/lmb/vimto/issues/29.
+		_ = stdio.Close()
+
+		result = proc.Wait()
+	}
 
 	if err := executeSimpleCommands(cmd.Teardown, cmd.Dir, cmd.Env); err != nil {
 		return fmt.Errorf("teardown: %w", err)