Introduce flag for overall FFI calls safe/unsafe behaviour #140

iustin · 2019-11-20T13:20:20Z

Start of changes for addressing #34. The flag is not used yet, but
will be in future patches.

The idea is that all functions will be changed from ccall unsafe to
ccall FFI_*, and then this flag denotes whether FFI defines map to the
actual values or all point to old-style/"fast" unsafe calls.

Sending this first to see if we agree on approach for converting the
code.

Signed-off-by: Iustin Pop [email protected]

Start of changes for addressing #34. The flag is not used yet, but will be in future patches. The idea is that all functions will be changed from `ccall unsafe` to `ccall FFI_*`, and then this flag denotes whether the FFI defines map to the actual values or all point to old-style/"fast" unsafe calls. Sending this first to see if we agree on approach for converting the code. Signed-off-by: Iustin Pop <[email protected]>

simonmar · 2019-11-22T07:54:01Z

I'm not sure if we want all FFI calls to be declared safe. There are probably some that never block and could be declared unsafe, e.g. getpid.

But otherwise sure, this looks reasonable to me.

iustin · 2019-11-22T09:11:40Z

Not sure I understand. The patch introduces three CPP defines, which will be used as appropriate for each ffi function. If flag is set to true, then all these will map to unsafe, like in previous behaviour.

Bodigrim · 2020-11-14T15:24:01Z

It does not make much sense to merge this PR without any actual application of the flag. If a release comes out before such applications are implemented, we'll find ourselves in an awkward situation.

iustin · 2020-11-14T15:54:51Z

It does not make much sense to merge this PR without any actual application of the flag. If a release comes out before such applications are implemented, we'll find ourselves in an awkward situation.

Of course, but at least I wanted to get agreement that this way (using a configure-time flag) is the way to go.

For some reason, there seem to be disagreement, although I don't understand exactly why.

Bodigrim · 2020-11-14T17:08:46Z

Well, it is hard to come to an agreement about the flag, unless you demonstrate how you intend to use it. If unsafe = False, are all FFI calls safe?

Bodigrim · 2020-11-15T12:32:52Z

@iustin sorry, I might have sounded unwelcoming yesterday. I think that a compile-time flag is a way to go, it's just that its precise semantics is not clear from this PR alone. If you wish to flesh out #34 one module at a time, we can setup a separate branch to merge correspondent PRs, so that they do not appear in master until everything is done. Would this work for you?

@vdukhovni what do you think about the compile-time flag approach?

iustin · 2020-11-15T21:05:16Z

@Bodigrim - no worries. The PR was made long ago, without looking I don't remeber the semantics. Checking…

Ah yes, this is very basic. I had a longer patch (with actual functions annotated) way way back, but it stopped being mergeable years ago.

Basically we need to a flag to allow falling back to the current behaviour. Whether it's called unsafe = True or safe_calls = False, it doesn't really matter. With the current minimal PR, basically:

unsafe = True (default), then the new macros FFI_SAFE and FFI_INTR will just translate to "unsafe"
unsafe = False, then the macros will translate to their name (FFI_SAFE→safe, FFI_INTR→intr).

The idea is to change all (literal) "unsafe" annotation into either FFI_UNSAFE, FFI_SAFE, FFI_INTR, as appropriate for the respective call. However, doing this across entire library is a large effort, so I wanted to start piecewiese, in order to prevent the previous issue - a half-complete patch that was no longer mergeable. Your idea of doing this module-by-module makes sense, thanks.

Naming-related: I think I'd like to rename the cabal flag to 'unsafe_calls', to be a bit more explicit.

Usage: as said above, each actual C call needs to be investigated and annotated as appropriate

is the function a "trivial" call, that in theory at least can never block (e.g. most/all vsdo-related calls)? replace "unsafe" with FFI_UNSAFE; for example, the "time" import in unix/System/Posix/Time.hs;
is the function a slow call that involves IO? then replace "unsafe" with either "FFI_SAFE" or "FFI_INTR" (if the function can be interrupted by signals)

And that's all there is to it. Longer term (a few releases), probably the flag default should flip so that by default we get correctness not speed, but that's for later.

hs-viktor · 2020-11-15T22:33:40Z

@iustin sorry, I might have sounded unwelcoming yesterday. I think that a compile-time flag is a way to go, it's just that its precise semantics is not clear from this PR alone. If you wish to flesh out #34 one module at a time, we can setup a separate branch to merge correspondent PRs, so that they do not appear in master until everything is done. Would this work for you?

@vdukhovni what do you think about the compile-time flag approach?

I am not convinced that the compile-time flags meet reasonable usability requirements. However the library built, somebody ends up unhappy, because either performance or "safety" suffers. My gut instinct is, that for calls where it is sometimes appropriate to use the "safe" version, and sometimes the "unsafe", we should provide both functions and the user choose which one to use.

So, when we e.g. expose the raw read(2) system call, a caller who wants a fast non-blocking read on a local disk file can call the "unsafe" version, while a caller who may want (for some reason) to directly block on a read of a socket, without going through the IO manager, can call the unsafe version.

Otherwise, when it is clear either "safe", "unsafe" or "into" is simply correct in all cases, then do that with no macros. A package whose semantics are unknowable to its caller (because built separately) sounds unwise to me. The API should have clear predictable behaviour.

hs-viktor · 2020-11-15T22:35:34Z

By the way, in case you're wondering, @hs-viktor and @vdukhovni are (alleged by @hs-viktor) one and the same person.

vdukhovni · 2020-11-15T22:36:11Z

By the way, in case you're wondering, @hs-viktor and @vdukhovni are (alleged by @hs-viktor) one and the same person.

And @vdukhovni agrees.

iustin · 2020-11-16T00:30:46Z

Let me offer a counter point of view: there are no "fast" calls when dealing with IO that could be annotated with unsafe. A "local" open is fast, until you hit an actual hardware issue, or a need to flush page cache/buffers, or actually what you thought is local is actually an iSCSI device.

Thus, the only correct implementation is to mark all such calls as safe or intr, to prevent heisenbugs that are even more tricky to debug and will leave users to think that the Haskell runtime is flaky in the sense of unpredictable delays or other strange behaviour (hangs in the single threaded runtime, for example).

Providing both variants might work, but would lead to significant increase of the API, and would lead to amplification in libraries-that-use-this library, since they would have to do the same.

I don't think there is a good solution here, so the question is which is the best compromise. The idea of compile time behaviour was that all distributions would default to the safe and slow approach, and only custom stacks would use unsafe (at their risk). Whether that makes sense or not, I have no idea.

So, the question is, which "evil" to accept? Difference in behaviour based on compilation flags (under the control of the build process), significantly increased API (under to control of the programmer, somewhat), or slowness in all IO paths (no choice)?

hs-viktor · 2020-11-16T00:50:33Z

Let me offer a counter point of view: there are no "fast" calls when dealing with IO that could be annotated with unsafe. A "local" open is fast, until you hit an actual hardware issue, or a need to flush page cache/buffers, or actually what you thought is local is actually an iSCSI device.

Thus, the only correct implementation is to mark all such calls as safe or intr, to prevent heisenbugs that are even more tricky to debug and will leave users to think that the Haskell runtime is flaky in the sense of unpredictable delays or other strange behaviour (hangs in the single threaded runtime, for example).

Providing both variants might work, but would lead to significant increase of the API, and would lead to amplification in libraries-that-use-this library, since they would have to do the same.

I don't think there is a good solution here, so the question is which is the best compromise. The idea of compile time behaviour was that all distributions would default to the safe and slow approach, and only custom stacks would use unsafe (at their risk). Whether that makes sense or not, I have no idea.

So, the question is, which "evil" to accept? Difference in behaviour based on compilation flags (under the control of the build process), significantly increased API (under to control of the programmer, somewhat), or slowness in all IO paths (no choice)?

This is why the RTS provides an IO manager, and high-level APIs called by users generally try a non-blocking operation and if that does not immediately succeed, ask the I/O manager to wake them up when the resource is ready.

Thus many users os the APIs invoke them on non-blocking handles, and expect fast unsafe semantics. Users who directly use "unix" can be shown examples of correct usage with potentially blocking calls. Callers who want to block can do so in a bound thread via forkOS, or can (if really compelling) define their own "foreign safe" binding for one or two functions.

Of all the options, having unpredictable compile-time feature selection feels the worst. Some dependent modules may expect "safe" others "unsafe", they all get linked together, now what?

I think the current "unsafe" bindings have in fact worked quite well for a long time for most users, and there are safe interfaces built around them. Since there are potential traps for naîve users, my suggestion is to focus on documentation and examples of "safe" usage.

hs-viktor · 2020-11-16T08:07:28Z

I'm not sure if we want all FFI calls to be declared safe. There are probably some that never block and could be declared unsafe, e.g. getpid.

But otherwise sure, this looks reasonable to me.

I am not sure I agree. Given the "unsafe" interfaces, one can generally construct safe ways to use them, or use the "unsafe" variants directly when appropriate. If instead only the "safe" interfaces are available, then there's no way to avoid the costs other than to copy the code into one's own project with "unsafe" restored.

So I'd like to see a case-by-case approach that justifies "safe" for particular functions, after looking into whether safer interfaces are already available via the IO manager (that may even internally rely on the "unsafe" functions in a safe way).

Interfaces that supporting polling and non-blocking operations should IMHO remain "unsafe", with safety implemented at a higher layer by making use of appropriate event notification mechanisms. Interfaces that are generally expected non-blocking, but can block on network filesystems, ... can be documented as potentially requiring forkOS when that's a concern, or in some cases made "safe" if we have good reason to expect that applications will not make heavy use of the interface or critically depend on its performance.

Thus I would be reluctant to make the brea'n'butter read(2), write(2), open(2), openat(2), ... functions "safe", but would rather document the tradeoffs. It is easy enough for the few users who want "safe" versions to just declare their own versions. With PatternSynonyms we can provide portable access to all the requisite system constants so that all the user needs to do is just make the "safe" call instead of the one provided by the library, and does not have to reimplement the associated glue.

iustin · 2020-11-16T11:20:10Z

To be very clear, I do not care exactly how this is implemented, but I think there's a bit more to it, hence the discussions. Also, I'm just a Haskell user (and not doing very much Haskell recently, as compared to when I first opened the bug), so I might be wrong in my understanding of things.

I agree that most "data" operations - the ones you quote, read/write/open/etc - already support non-blocking mode and thus can be used better via higher-level abstractions. And I think these are what most of the high-performace use-cases target (I guess - no data here).

But if I understand correctly—please correct me if I'm wrong—these all rely on the use of O_NONBLOCK, and it's actually few of the calls that support this. Not even close() or fsync()/fdatasync() support this (and these are data-related operations). And, of course, all the "metadata" to call them so operations - e.g. rename, mkdir, unlink, etc. are missing this support as well.

What I'd like to achieve though, is that for simple cases, the API is simple, and doesn't require a forkOS just to safely create a directory (or a temporary file, or or…). Not sure exactly how PatternSynonyms would work here, care to give an example?

iustin · 2020-11-22T16:48:34Z

I'm going to close this pull request, since it seems the best solution is not compile-time behaviour. It would be good though to summarize the main points in the issue itself - @hs-viktor, could you do that please, since you have deeper knowledge of what the best solution is?

vdukhovni · 2020-11-22T21:35:00Z

My sense is:

We should not have compile-time switches for the FFI safety of the various interfaces. Whatever choice is made for a given function should be consistent across all builds
For a number of functions the arguments can supplied a non-blocking handle, in which case the function is guaranteed to either perform the requested operation immediately, or fail with EWOULDBLOCK or EAGAIN. These need to be "unsafe", to facilitate efficient support for non-blocking I/O. Documentation can encourage users to use non-blocking I/O or if need be forkOS if blocking other threads is a concern.
System calls like chmod(2), fstat(2), rename(2), unlink(2), creat(2), ... that manipulate the filesystem, rather than perform I/O on an open descriptor should I think remain "unsafe". While it is true that given an unreachable NFS fileserver or similar these can block indefinitely, the same is then true of paging in the running application if executed from such a filesystem, and indeed with high probability all the available capabilities will block as more requests for the non-responsive resource are processed.
System calls like poll(2), flock(2), ... that block waiting for an event, and are expected to block for some time even under "normal" conditions should be "safe" (or "intr" if that also implies "safe").
Documentation can highlight which "unsafe" calls might block under adverse conditions, and suggest work-arounds for applications where this is a concern.
I some (rare?) cases we can offer both the "unsafe" and the "safe" functions (under different names), and let the caller choose.

iustin closed this Nov 22, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Introduce flag for overall FFI calls safe/unsafe behaviour #140

Introduce flag for overall FFI calls safe/unsafe behaviour #140

iustin commented Nov 20, 2019 •

edited

Loading

simonmar commented Nov 22, 2019 •

edited

Loading

iustin commented Nov 22, 2019

Bodigrim commented Nov 14, 2020

iustin commented Nov 14, 2020

Bodigrim commented Nov 14, 2020

Bodigrim commented Nov 15, 2020

iustin commented Nov 15, 2020

hs-viktor commented Nov 15, 2020

hs-viktor commented Nov 15, 2020

vdukhovni commented Nov 15, 2020

iustin commented Nov 16, 2020

hs-viktor commented Nov 16, 2020 •

edited

Loading

hs-viktor commented Nov 16, 2020

iustin commented Nov 16, 2020

iustin commented Nov 22, 2020

vdukhovni commented Nov 22, 2020

Introduce flag for overall FFI calls safe/unsafe behaviour #140

Introduce flag for overall FFI calls safe/unsafe behaviour #140

Conversation

iustin commented Nov 20, 2019 • edited Loading

simonmar commented Nov 22, 2019 • edited Loading

iustin commented Nov 22, 2019

Bodigrim commented Nov 14, 2020

iustin commented Nov 14, 2020

Bodigrim commented Nov 14, 2020

Bodigrim commented Nov 15, 2020

iustin commented Nov 15, 2020

hs-viktor commented Nov 15, 2020

hs-viktor commented Nov 15, 2020

vdukhovni commented Nov 15, 2020

iustin commented Nov 16, 2020

hs-viktor commented Nov 16, 2020 • edited Loading

hs-viktor commented Nov 16, 2020

iustin commented Nov 16, 2020

iustin commented Nov 22, 2020

vdukhovni commented Nov 22, 2020

iustin commented Nov 20, 2019 •

edited

Loading

simonmar commented Nov 22, 2019 •

edited

Loading

hs-viktor commented Nov 16, 2020 •

edited

Loading