Skip to content

Fix bug in Semaphore::timed_wait on POSIX #795

@gavv

Description

@gavv

deadline parameter of Semaphore::timed_wait is supposed to be absolute time in CLOCK_MONOTONIC domain. The code that uses it, as well as your example, pass monotonic timestamps to it.

However in roc_core/target_posix_ext/roc_core/semaphore.cpp we pass deadline to sem_timedwait, which expects time in CLOCK_REALTIME domain (i.e. since epoch).

Note that roc_core/target_darwin/roc_core/semaphore.cpp doesn't have this problem.

Suggested fix:

  1. When sem_clockwait system call is available (we need a compile-time check), then use it instead of sem_timedwait with CLOCK_MONOTONIC clock id.

    This is the most efficient approach.

  2. Otherwise, modify wait loop: each iteration, before calling sem_timedwait, compute wallclock deadline based on current time and monotonic deadline.


Another concern with sem_timedwait is what happens if system time changes.

Seems that on recent linux / glibc it's not affected by time changes, i.e. if we asked it to wait until now + 100 seconds, it will sleep for 100 seconds even if time changed to past or future.

Just verified it using this small program:

Click to expand
#include <errno.h>
#include <semaphore.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

int64_t timestamp(clockid_t clock) {
    timespec ts;
    clock_gettime(clock, &ts);
    return (int64_t)ts.tv_sec * 1000000000 + (int64_t)ts.tv_nsec;
}

int main() {
    sem_t sem;
    sem_init(&sem, 0, 0);

    int64_t deadline = timestamp(CLOCK_REALTIME) + 100000000000; // 100 seconds

    timespec ts;
    ts.tv_sec = long(deadline / 1000000000);
    ts.tv_nsec = long(deadline % 1000000000);

    int ret = sem_timedwait(&sem, &ts);

    printf("ret = %d\n", ret);
    perror("errno");
}

It will sleep 100 seconds even if I run sudo date --set="2000-01-01 00:00:00" or sudo date --set="2027-01-01 00:00:00" from another terminal.

However it's not necessary to be true on other POSIX platforms. In worst case, sem_timedwait may hang, but if we're more lucky, it may return a error code.

Currently we panic if error is not ETIMEOUT (= return) and EINTR (= repeat loop). I think we should also allow EINVAL and EAGAIN (and repeat the loop), I suspect they are good candidates for this situation, though I haven't found anything in Internet.


Ideally we should also add a few tests for Semaphore::timed_wait.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions