-
Notifications
You must be signed in to change notification settings - Fork 92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce flag for overall FFI calls safe/unsafe behaviour #140
Conversation
Start of changes for addressing #34. The flag is not used yet, but will be in future patches. The idea is that all functions will be changed from `ccall unsafe` to `ccall FFI_*`, and then this flag denotes whether the FFI defines map to the actual values or all point to old-style/"fast" unsafe calls. Sending this first to see if we agree on approach for converting the code. Signed-off-by: Iustin Pop <[email protected]>
I'm not sure if we want all FFI calls to be declared safe. There are probably some that never block and could be declared But otherwise sure, this looks reasonable to me. |
Not sure I understand. The patch introduces three CPP defines, which will be used as appropriate for each ffi function. If flag is set to true, then all these will map to unsafe, like in previous behaviour. |
It does not make much sense to merge this PR without any actual application of the flag. If a release comes out before such applications are implemented, we'll find ourselves in an awkward situation. |
Of course, but at least I wanted to get agreement that this way (using a configure-time flag) is the way to go. For some reason, there seem to be disagreement, although I don't understand exactly why. |
Well, it is hard to come to an agreement about the flag, unless you demonstrate how you intend to use it. If |
@iustin sorry, I might have sounded unwelcoming yesterday. I think that a compile-time flag is a way to go, it's just that its precise semantics is not clear from this PR alone. If you wish to flesh out #34 one module at a time, we can setup a separate branch to merge correspondent PRs, so that they do not appear in @vdukhovni what do you think about the compile-time flag approach? |
@Bodigrim - no worries. The PR was made long ago, without looking I don't remeber the semantics. Checking… Ah yes, this is very basic. I had a longer patch (with actual functions annotated) way way back, but it stopped being mergeable years ago. Basically we need to a flag to allow falling back to the current behaviour. Whether it's called
The idea is to change all (literal) "unsafe" annotation into either FFI_UNSAFE, FFI_SAFE, FFI_INTR, as appropriate for the respective call. However, doing this across entire library is a large effort, so I wanted to start piecewiese, in order to prevent the previous issue - a half-complete patch that was no longer mergeable. Your idea of doing this module-by-module makes sense, thanks. Naming-related: I think I'd like to rename the cabal flag to 'unsafe_calls', to be a bit more explicit. Usage: as said above, each actual C call needs to be investigated and annotated as appropriate
And that's all there is to it. Longer term (a few releases), probably the flag default should flip so that by default we get correctness not speed, but that's for later. |
I am not convinced that the compile-time flags meet reasonable usability requirements. However the library built, somebody ends up unhappy, because either performance or "safety" suffers. My gut instinct is, that for calls where it is sometimes appropriate to use the "safe" version, and sometimes the "unsafe", we should provide both functions and the user choose which one to use. So, when we e.g. expose the raw read(2) system call, a caller who wants a fast non-blocking read on a local disk file can call the "unsafe" version, while a caller who may want (for some reason) to directly block on a read of a socket, without going through the IO manager, can call the unsafe version. Otherwise, when it is clear either "safe", "unsafe" or "into" is simply correct in all cases, then do that with no macros. A package whose semantics are unknowable to its caller (because built separately) sounds unwise to me. The API should have clear predictable behaviour. |
By the way, in case you're wondering, @hs-viktor and @vdukhovni are (alleged by @hs-viktor) one and the same person. |
And @vdukhovni agrees. |
Let me offer a counter point of view: there are no "fast" calls when dealing with IO that could be annotated with Thus, the only correct implementation is to mark all such calls as safe or intr, to prevent heisenbugs that are even more tricky to debug and will leave users to think that the Haskell runtime is flaky in the sense of unpredictable delays or other strange behaviour (hangs in the single threaded runtime, for example). Providing both variants might work, but would lead to significant increase of the API, and would lead to amplification in libraries-that-use-this library, since they would have to do the same. I don't think there is a good solution here, so the question is which is the best compromise. The idea of compile time behaviour was that all distributions would default to the safe and slow approach, and only custom stacks would use unsafe (at their risk). Whether that makes sense or not, I have no idea. So, the question is, which "evil" to accept? Difference in behaviour based on compilation flags (under the control of the build process), significantly increased API (under to control of the programmer, somewhat), or slowness in all IO paths (no choice)? |
This is why the RTS provides an IO manager, and high-level APIs called by users generally try a non-blocking operation and if that does not immediately succeed, ask the I/O manager to wake them up when the resource is ready. Thus many users os the APIs invoke them on non-blocking handles, and expect fast unsafe semantics. Users who directly use "unix" can be shown examples of correct usage with potentially blocking calls. Callers who want to block can do so in a bound thread via forkOS, or can (if really compelling) define their own "foreign safe" binding for one or two functions. Of all the options, having unpredictable compile-time feature selection feels the worst. Some dependent modules may expect "safe" others "unsafe", they all get linked together, now what? I think the current "unsafe" bindings have in fact worked quite well for a long time for most users, and there are safe interfaces built around them. Since there are potential traps for naîve users, my suggestion is to focus on documentation and examples of "safe" usage. |
I am not sure I agree. Given the "unsafe" interfaces, one can generally construct safe ways to use them, or use the "unsafe" variants directly when appropriate. If instead only the "safe" interfaces are available, then there's no way to avoid the costs other than to copy the code into one's own project with "unsafe" restored. So I'd like to see a case-by-case approach that justifies "safe" for particular functions, after looking into whether safer interfaces are already available via the IO manager (that may even internally rely on the "unsafe" functions in a safe way). Interfaces that supporting polling and non-blocking operations should IMHO remain "unsafe", with safety implemented at a higher layer by making use of appropriate event notification mechanisms. Interfaces that are generally expected non-blocking, but can block on network filesystems, ... can be documented as potentially requiring Thus I would be reluctant to make the brea'n'butter read(2), write(2), open(2), openat(2), ... functions "safe", but would rather document the tradeoffs. It is easy enough for the few users who want "safe" versions to just declare their own versions. With PatternSynonyms we can provide portable access to all the requisite system constants so that all the user needs to do is just make the "safe" call instead of the one provided by the library, and does not have to reimplement the associated glue. |
To be very clear, I do not care exactly how this is implemented, but I think there's a bit more to it, hence the discussions. Also, I'm just a Haskell user (and not doing very much Haskell recently, as compared to when I first opened the bug), so I might be wrong in my understanding of things. I agree that most "data" operations - the ones you quote, read/write/open/etc - already support non-blocking mode and thus can be used better via higher-level abstractions. And I think these are what most of the high-performace use-cases target (I guess - no data here). But if I understand correctly—please correct me if I'm wrong—these all rely on the use of What I'd like to achieve though, is that for simple cases, the API is simple, and doesn't require a forkOS just to safely create a directory (or a temporary file, or or…). Not sure exactly how PatternSynonyms would work here, care to give an example? |
I'm going to close this pull request, since it seems the best solution is not compile-time behaviour. It would be good though to summarize the main points in the issue itself - @hs-viktor, could you do that please, since you have deeper knowledge of what the best solution is? |
My sense is:
|
Start of changes for addressing #34. The flag is not used yet, but
will be in future patches.
The idea is that all functions will be changed from
ccall unsafe
toccall FFI_*
, and then this flag denotes whether FFI defines map to theactual values or all point to old-style/"fast" unsafe calls.
Sending this first to see if we agree on approach for converting the
code.
Signed-off-by: Iustin Pop [email protected]