init: work around the mother of all race conditions #30
Merged
ebpf-go CI has been plagued by sporadic hangs, where tests simply time out while trying to write status information to stdout.

The bug manifests when issuing blocking writes to a virtio console while also polling it. The way we trigger the bug is quite involved:

- init opens the port via os.OpenFile. This sets O_NONBLOCK on the fd and registers the os.File with the poller.
- The port is passed to the child process via exec.Cmd.Stdout. This internally calls os.File.Fd(), which clears O_NONBLOCK but doesn't remove the file from the poller.
- The child process receives a blocking stdout. Writing to it issues a blocking write to the virtio-console port, specifically port_fops_write() in virtio_console.c.
- port_fops_write() calls wait_port_writable(). If the virtqueue is full, this puts the calling thread to sleep on port->waitqueue. We now enter the race window.
- The host processes the guest's write, frees up some space in the virtqueue and issues an interrupt to the guest.
- This interrupt races with a call to port_fops_poll() issued by the init process's Go runtime. That function invokes will_write_block(), which consumes all used buffers from the virtqueue.
- The interrupt handler vring_interrupt() uses more_used() to check whether the virtqueue has any used buffers left to process. Since port_fops_poll() has just consumed all of them, the interrupt is dropped.
At this point we still have a writer stuck in port_fops_write() waiting for a wakeup that never comes.
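The lost wakeup is easier to see with the kernel stripped away. The following is a deliberately simplified, single-threaded model of the last three steps, not kernel code: vq, hostCompletes, pollReclaim, interrupt and writerWoken are hypothetical names, and the two orderings are replayed deterministically instead of actually racing.

```go
package main

import "fmt"

// vq is a toy model of a virtqueue: usedPending counts buffers the host
// has returned to the used ring that the guest has not reclaimed yet.
type vq struct {
	usedPending int
}

// hostCompletes models the host processing the guest's write and
// returning a buffer. The interrupt is delivered separately.
func (q *vq) hostCompletes() { q.usedPending++ }

// pollReclaim models port_fops_poll() -> will_write_block(): it consumes
// every used buffer but does not wake the sleeping writer.
func (q *vq) pollReclaim() { q.usedPending = 0 }

// interrupt models vring_interrupt(): if the more_used() check finds
// nothing pending, the interrupt is dropped and the callback that would
// wake port->waitqueue never runs. It reports whether a wakeup happens.
func (q *vq) interrupt() bool {
	return q.usedPending > 0
}

// writerWoken replays one ordering of the race and reports whether the
// writer asleep in wait_port_writable() is woken up.
func writerWoken(pollRunsBeforeIRQ bool) bool {
	q := &vq{}
	// The writer is already asleep because the virtqueue was full.
	q.hostCompletes()
	if pollRunsBeforeIRQ {
		// init's Go runtime polls the port before the interrupt is handled.
		q.pollReclaim()
	}
	return q.interrupt()
}

func main() {
	fmt.Println("interrupt handled first, writer woken:", writerWoken(false))
	fmt.Println("poll runs first, writer woken:", writerWoken(true))
}
```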
The workaround for this issue is to close the stdio file in init, thereby removing it from the runtime poller.
Fixes: #29