-
Notifications
You must be signed in to change notification settings - Fork 92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expose dirent pointer and added readDirStreamWith #251
Conversation
a2e6261
to
3f5c348
Compare
Why not conditionally provide a new high-level Linux/glibc-specific API that returns a directory entry and the file type. The conditional test here would be the availability of the I think this would be cleaner/safer than exposing the proposed API. As for avoiding allocation of a dirent structure, with all the attendant syscall overhead, the memory GC cost should be fairly modest. So I am not entirely enthused by this. Which is not a hard no, but significant scepticism. |
It was my understanding that the unix package is about POSIX-compliance and additional features should be implemented in upstream libraries like hpath-path. This patch aims to make it easier for upstream libraries to do that as they don't have to copy/replicate the work done here.
Ok, that's fair. I consider |
I am aware of |
I'm not opposed to this. Can we vote on this @hs-viktor @vdukhovni @Bodigrim 👍 from me |
Which |
The changes in this PR. I'm fine with these low-level interfaces. Alternatively, we can also put them into Internal module and expose it to signal "you're on your own". |
If no other maintainer chimes in, then I'll merge this in 2 weeks. |
@hs-viktor @vdukhovni just a gentle reminder. |
I still mostly stand by my original comment, and perhaps expose this only as an Internal API with no stability guarantees. Maintainers of dependent packages would have to be willing to adapt to future changes... Why not just support the higher-level (system-specific) API? |
I'm fine with exposing this as Internal. |
haskell/ghcup-hs#415 (comment) One more reason to merge this and add tests. |
This is way out of my area of expertise, but I'm inclined to agree with Viktor.
Practically
I'd accept such contribution. |
@mmhat are you interested to work on a high-level API here? |
I don't think it's appropriate for this PR to provide the high-level API. We're going to burn out the contributor this way. We should decide whether we merge this PR mostly as-is and expose it as internal API. This has a 👍 from me. |
That's a very sensible concern, but @mmhat originally said that "If the conditional high-level API you proposed would be accepted as a contribution I'd be more than happy to implement those". It seems a win-win to me. |
@Bodigrim Luckly I have a bit more time to work on this until the end of the year so I could do that. |
@Bodigrim @hasufell I added some high-level API as outlined in the comment by @vdukhovni. I also moved |
I just stumbled over this: So... |
Another reason to not expose the low-level API. |
In ghcup I've written a portable version of This is similar to what other projects have been doing (falling back to 'stat' on 'DT_UNKNOWN'): ggreer/the_silver_searcher#37 |
I kindly disagree: IMHO, when it comes to directory tree traversal, the functions we provide in this package are already low-level. The user needs to understand concepts like a directory stream and we throw a bunch of low-level technical terms at them in the Haddocks (like To resolve this issue with unsupported filesystems I see the following options:
If we go with (2) or (3) I'd like to add some documentation pointing out what the limitations of Personally, I'd prefer option (3) with |
When would someone pass No need for a Bool option imo. |
Good point; I always thought of the There is however a corner case I am not sure how to deal with in a fallback implementation: Suppose the underlying platform does not properly support |
We should be able to check for DT_UNKNOWN or -1 in the stat-fallback, no? |
Not surprising, MacOS unix is a *BSD descendent. There is no statx(2) in FreeBSD or NetBSD either, and I expect ditto for OpenBSD. |
My vote is still 1. Not fond of exposing FFI to low-level storable structures with platform-dependent fields. |
A user could approximate this by checking whether |
#include "HsUnix.h"
module Main where
main :: IO ()
main = do
#ifdef HAVE_STATX
putStrLn "STATX"
#else
putStrLn "Hello, Haskell!"
#endif This seems to work. |
Sure, but I was thinking that it be friendly to our users to do that for them, and expose a boolean value they can test without any of This is not a "must have", just something that could be a useful addition, unless you see reasons not to... |
I'm a bit embarrassed that this didn't move forward in time. My suggestion is to move the new API to modules that end in Any objections? |
@hasufell Would work for me. But I think it would be good to keep the issue open, I still hope that we expose some of the functionality in the public API in the future... |
Alright. If you adjust the PR to expose via To be clear: internal modules are still under PVP, but I see no obligation to do e.g. deprecation periods for them. They may break at random. |
This commit exposes the dirent pointer used in the directory stream functions wrapped in a newtype called `DirEnt`. It also adds a new function `readDirStreamWith` that takes a callback that is used to obtain the result from the pointer to the directory entry.
This version of `readDirStreamWith` takes a pre-allocated pointer as an additional argument that is used to store the pointer to the dirent. It is useful when you want to read a lot of entries and want to safe a few allocations since you can reuse the pointer for each call to `readDirStreamWithPtr`.
* Added `readDirStream`/`readDirStream` for ByteString, PosixPath * Removed `DirEnt`, `readDirStreamWith`, `readDirStreamWithPtr` from public API
* Changed the assignment of values for missing DT_* constants. Since the 'd_type' field is of type 'unsigned char' we assign a negative number for each missing DT_* constant so they are distinguishable but do not collide with the values from the libc implementation. * Added a new 'DirStreamWithPath' that contains the path of the directory the directory stream belongs to. * 'readDirStreamWithType' falls back to a 'stat' if the 'd_type' is unknown or undetermined. * Added some tests for 'readDirStreamWithType'.
@hasufell The stuff from the |
@hasufell @vdukhovni Is there anything left to do for me or can we finally merge this? |
This has been released: https://hackage.haskell.org/package/unix-2.8.6.0 |
This PR adds two new functions
readDirStreamWith
andreadDirStreamWithPtr
:readDirStreamWith
is a version ofreadDirStreamMaybe
that takes a callback that is used to obtain the result from the pointer to the directory entry. This directory pointer is wrapped in a newtype calledDirEnt
. The function is intended to be used in cases where we know about the features of the installed libc: For example, glibc provides the information about the type of the directory entry directly in the dirent struct. Therefore it can be used directly without calling further functions and additional calls to stat(2).Since those extensions to dirent are not covered by POSIX but widely used, we provide only the possibility to make use of those and do not add this functionality to
unix
.readDirStreamWithPtr
takes a pre-allocated pointer in addition to the other arguments ofreadDirStreamWith
. This pointer is used to store the pointer of the dirent struct (if there is any) which can safe some allocations when you want to read a lot of directory entries as this pre-allocated pointer can be shared between the calls to the function.In addition to that a small CPP bug was fixed which prevented compilation on my system. Also, some tests for the readDirStream functions were added.