Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Protect against real time skips #412

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

milloni
Copy link

@milloni milloni commented Jan 17, 2019

POSIX _timedwait API is affected by changes to system time (that is, a sysadmin/service has changed the datetime on the system). This pull request attempts to mitigate that by additionally measuring elapsed time with CLOCK_MONOTONIC, and restarting the _timedwait call if a time skip had occurred.

I've run into this problem in a client's in-vehicle infotainment system. Note there's also an effort to create a GNU extension API that will function in a similar way as _timedwait, but will let the user specify the clock [1]. By choosing CLOCK_MONOTONIC, the user can avoid this issue.

[1] http://sourceware-org.1504.n7.nabble.com/PATCH-Semaphores-add-sem-timedwait-monotonic-GNU-extension-BZ-14717-td550206.html#a550467

It isn't used anywhere and therefore I have no way or need to test
subsequent changes in it.

Signed-off-by: Maciej Wolny <[email protected]>
@falkTX
Copy link
Member

falkTX commented Jan 18, 2019

PR looks good overall.
I thought there was a non-portable call to setup a semaphore clock..
(I mean, why did you not use it directly?)

also note that posix-semaphore c++ code is not used under linux, that is why CI passes even though the code itself should not build.

@falkTX
Copy link
Member

falkTX commented Jan 18, 2019

the asserts against clocktime return value seem dangerous to me.
on optimized builds they are ignored and they just crash on all others cases.
better to handle these errors.

@milloni
Copy link
Author

milloni commented Jan 21, 2019

the asserts against clocktime return value seem dangerous to me.
on optimized builds they are ignored and they just crash on all others cases.
better to handle these errors.

Ah, yes - I was planning to do something about this, and then forgot. The only ways clock_gettime() can fail are 1) programmer error, 2) requested clock not supported. It makes sense to handle 1) with an assertion since these should never happen in the program. For 2), my plan was to check if CLOCK_MONOTONIC is available once, and then, if it's not, forgo this check-for-skips dance altogether. I'll do that in the next revision of this PR.

@milloni
Copy link
Author

milloni commented Jan 21, 2019

I thought there was a non-portable call to setup a semaphore clock..
(I mean, why did you not use it directly?)

Are you talking about http://sourceware-org.1504.n7.nabble.com/PATCH-Semaphores-add-sem-timedwait-monotonic-GNU-extension-BZ-14717-td550206.html#a550467 ?
This is a proposed patch to glibc but it's not merged or even ready yet.

also note that posix-semaphore c++ code is not used under linux, that is why CI passes even though the code itself should not build.

Are you sure? On my Ubuntu 16.04 LTS:

$ ./waf configure
<snip>
$ ./waf build
<snip>
[ 29/215] cxx: posix/JackPosixSemaphore.cpp -> build/posix/JackPosixSemaphore.cpp.1.o
[ 30/215] cxx: posix/JackPosixProcessSync.cpp -> build/posix/JackPosixProcessSync.cpp.1.
<snip>

EDIT: Oh, sorry, that was 1.9.9.5 - you're right

EDIT: So what's used instead now?

Maciej Wolny added 2 commits January 21, 2019 15:47
POSIX functions with names ending with _timedwait use the real time
clock and as such are sensitive to changes in system time during the
call. This is described in: [1].

This patch implements a wrapper for sem_timedwait() that, if the
function call fails, will check if the failure is due to a time warp,
and if so, will restart the call.

Based on idea and implementation by Ben Hutchings <[email protected]>

[1] https://sourceware.org/ml/libc-alpha/2018-12/msg00512.html

v2: Check if CLOCK_MONOTONIC is available.

Signed-off-by: Maciej Wolny <[email protected]>
POSIX functions with names ending with _timedwait use the real time
clock and as such as sensitive to changes in system time during the
call. This is described in: [1].

This patch implements a wrapper for pthread_cond_timedwait() that, if
the function call fails, will check if the failure is due to a time warp
and if so, will restart the call.

Based on idea and implementation by Ben Hutchings <[email protected]>

[1] https://sourceware.org/ml/libc-alpha/2018-12/msg00512.html

v2: Check if CLOCK_MONOTONIC is available.

Signed-off-by: Maciej Wolny <[email protected]>
@milloni
Copy link
Author

milloni commented Jan 21, 2019

Ok, I pushed the v2 revision. I think the CI is broken

@milloni
Copy link
Author

milloni commented Jan 21, 2019

I tested this on a fork of 1.9.9.5. I tested legacy paths (i.e CLOCK_MONOTONIC not supported) by changing if(res != 0) to if(true) where necessary. I don't have a way of testing on MacOS/other Unix and I'm not really sure how I would go about testing against current master (it probably won't work on the system that I'm testing on).

@7890
Copy link
Contributor

7890 commented Jan 21, 2019

The travis build fails because:

E: Unable to locate package gcc-6
E: Unable to locate package g++-6
E: Couldn't find any package by regex 'g++-6'
The command "./.ci/install-deps.sh" failed and exited with 100 during .
Your build has been stopped.

This is something unrelated to this PR.

@milloni
Copy link
Author

milloni commented Oct 10, 2019

Hey I just remembered this. Any chance you could accept it?

@falkTX
Copy link
Member

falkTX commented Oct 11, 2019

tbh I do not like this approach very much. the issue should be fixed elsewhere, jack is not the only software that has this problem.
also, we should use CLOCK_MONOTONIC_RAW if possible

@milloni
Copy link
Author

milloni commented May 12, 2020

Sorry, I missed the last message

@falkTX I think you're right, although glibc folks have already had this discussion and they're not going to fix it, because doing so would supposedely break posix API. They are working on a different API that doesn't have this problem but it's an extension (not in posix) and ultimately it's downstream software that will have to make the switch.

Honestly I'm a bit surprised at this counter-intuitive behavior but apparently it's a "feature" not a bug

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants