Add portable support for file open with data caching suppressed/eliminated. #322
Comments
Sounds reasonable to me. Were you preparing a PR?
Yes, we intend to prepare a PR.
New information: it turns out that the platforms that support direct I/O or an equivalent can all set it via fcntl; it does not have to be set at file open time. In particular, Linux, FreeBSD and NetBSD all support setting So this may well be the better way to go, to add something like this to the
Again, it would be a no-op on platforms that do not support such hints (e.g. Solaris, OpenBSD).
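A minimal C sketch of the fcntl-based approach described above, assuming that macOS exposes F_NOCACHE and that Linux, FreeBSD and NetBSD accept O_DIRECT through F_SETFL. The name set_no_cache is hypothetical and not part of any existing library; the sketch only illustrates the "set it, or do nothing" shape, not the eventual Haskell API.

```c
/* Sketch only: set_no_cache() is a hypothetical name, not an existing API. */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* glibc only exposes O_DIRECT when this is defined */
#endif
#include <fcntl.h>

/* Ask the OS to bypass (on != 0) or use (on == 0) the page cache for fd.
 * Returns 0 on success, and also 0 where no mechanism is known (no-op).
 * Returns -1 with errno set if the underlying fcntl call fails. */
int set_no_cache(int fd, int on)
{
#if defined(F_NOCACHE)
    /* macOS: a dedicated fcntl command; there is no matching "get". */
    return fcntl(fd, F_NOCACHE, on ? 1 : 0) == -1 ? -1 : 0;
#elif defined(O_DIRECT)
    /* Linux, FreeBSD, NetBSD: toggle O_DIRECT in the file status flags. */
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    flags = on ? (flags | O_DIRECT) : (flags & ~O_DIRECT);
    return fcntl(fd, F_SETFL, flags) == -1 ? -1 : 0;
#else
    /* No known mechanism on this platform: deliberately do nothing. */
    (void)fd;
    (void)on;
    return 0;
#endif
}
```

Note that on Linux the F_SETFL call can still fail with EINVAL when the file's filesystem does not support direct I/O, so a caller may want to treat that error as "hint not applied" rather than as fatal.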
Or maybe The general pattern of Advice and opinions welcome.
Further update: OSX actually does not support a way to get the caching mode, only a way to set it. So a portable API would be just And the CI test would be just: does the call complete without throwing an exception (that is, the syscall does not return -1). So there is no (portable) ability to test that what we get is the value we set. This arguably makes sense for a portable API anyway, given that it is supposed to be a no-op on platforms where it is not supported, and on non-supporting platforms there is no such state to get.
We do have a number of APIs that are sort of platform specific. The pattern is: Lines 93 to 99 in d740ca9
So I don't see a problem with adding
What is the status?
Is your feature request related to a problem? Please describe.
The problem is trying to use modern SSDs to their maximum performance for random I/O (particularly random reads) on normal files (not raw block devices), across multiple cores/capabilities. To do this one needs two things: good async I/O APIs and opening files in a mode that bypasses the page cache. Bypassing the page cache is needed to achieve the maximum IOPS, especially when submitting I/O operations from many OS threads at once (so from many RTS capabilities). Good async I/O APIs are out of scope for this feature request.
A similar problem is wanting to do lots of random I/O while optimising the memory of the host system by not polluting the page cache with disk pages that will only be used once (to make best use of the page cache for other files that are used). Again for this use case one wants to open a file in a mode that bypasses or suppresses the page cache.
Another similar problem is wanting to do disk I/O performance benchmarking, and one needs to work around the caching that the OS does: either by dropping caches before a run and avoiding re-reading the same page twice, or avoiding caching altogether.
Describe the solution you'd like
The solution is to allow opening a file in a mode that attempts to suppress or eliminate the use of disk/page caching for this use of this file. This is a feature that all widely used Unix-like OSs support, but it is not standardised by POSIX:
- Linux: O_DIRECT flag to open(2).
- FreeBSD: O_DIRECT flag to open(2).
- OSX: F_NOCACHE to fcntl(2) (link here is to the iPhoneOS man page version because Apple removed the online rendered version of the desktop man pages).
For platforms that do not support any of these methods, the fallback should simply be to do nothing. The semantics of continuing to do caching is contained within the semantics of no caching (but with different performance characteristics).
Note also that, since we will document the semantics as trying to do less or no caching, we need not worry about the slight differences in page cache behaviour between OSX, FreeBSD and Linux. (OSX will use cached pages for the file if they are already present, while Linux will ignore cached pages even if they are present. This difference is only relevant for I/O benchmarks, and such programs need to be aware of a lot of platform-specific details already.)
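As a rough illustration of the open-time variant with a do-nothing fallback, here is a C sketch. The helper name open_no_cache is hypothetical, and the sketch assumes O_DIRECT is the open flag on Linux and FreeBSD while macOS needs a follow-up fcntl with F_NOCACHE; it is not the proposed Haskell implementation.

```c
/* Sketch only: open_no_cache() is a hypothetical helper name. */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* glibc only exposes O_DIRECT when this is defined */
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Open path, asking for reduced or no page caching where the platform
 * offers a way to do so. Returns a file descriptor, or -1 with errno set. */
int open_no_cache(const char *path, int flags, mode_t mode)
{
#if defined(O_DIRECT)
    /* Linux, FreeBSD: request direct I/O as part of the open call. */
    return open(path, flags | O_DIRECT, mode);
#elif defined(F_NOCACHE)
    /* macOS: no open flag exists, so open normally and then set the hint. */
    int fd = open(path, flags, mode);
    if (fd != -1 && fcntl(fd, F_NOCACHE, 1) == -1) {
        int saved = errno; /* keep the fcntl error across close() */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
#else
    /* No known mechanism: fall back to an ordinary, cached open. */
    return open(path, flags, mode);
#endif
}
```

On Linux, open with O_DIRECT can fail with EINVAL on filesystems that do not support it, so a real implementation would have to decide whether to surface that error or retry without the flag.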
The feature should be implemented as an extra boolean flag in the OpenFileFlags. The name of this field should be descriptive since there is no POSIX name to follow (and different platforms call it different things, so e.g. direct would be inappropriate). Suggestions include noCache :: Bool, since that's simply descriptive (though it happens to be what OSX uses too).
Additionally (and this is a matter of API design taste where reasonable people may differ) one may wish to provide some feature flag that one can test to see if support is present (since no exception will be thrown if it is not present).
The documentation for the feature should also clearly describe that when using this feature, some platforms impose additional constraints on the alignment of file reads/writes and the memory buffers used for reads/writes. Optionally it may also make sense to provide some constants to give the most portable values for disk and memory alignment, or an action to obtain these alignment hints. Feedback on this aspect of the API is welcome.
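To make the alignment constraint concrete, here is a C sketch that rounds a requested byte range out to aligned boundaries and reads it into an aligned buffer. The value 4096 is an assumed conservative alignment, not a figure from this issue; real requirements depend on the filesystem and device. The names IO_ALIGN, align_down, align_up and read_aligned are all hypothetical.

```c
/* Sketch only: all names below are hypothetical. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for pread and posix_memalign */
#endif
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed alignment for file offsets, lengths and buffer addresses. */
#define IO_ALIGN 4096

/* Round a non-negative offset down or up to a multiple of IO_ALIGN. */
static off_t align_down(off_t x) { return x - (x % IO_ALIGN); }
static off_t align_up(off_t x) { return align_down(x + IO_ALIGN - 1); }

/* Read the aligned range covering [offset, offset + len) from fd.
 * On success returns the number of bytes read, stores the aligned buffer
 * in *out (the caller frees it) and stores in *skip how many leading
 * bytes of the buffer precede the requested offset.
 * Returns -1 if allocation or the read fails. */
ssize_t read_aligned(int fd, off_t offset, size_t len, void **out, size_t *skip)
{
    off_t start = align_down(offset);
    off_t end = align_up(offset + (off_t)len);
    size_t span = (size_t)(end - start);
    void *buf = NULL;

    /* The memory buffer itself must also be aligned. */
    if (posix_memalign(&buf, IO_ALIGN, span) != 0)
        return -1;

    ssize_t n = pread(fd, buf, span, start);
    if (n < 0) {
        free(buf);
        return -1;
    }
    *out = buf;
    *skip = (size_t)(offset - start);
    return n;
}
```

For example, a request for 10 bytes at offset 5000 becomes a 4096-byte read at offset 4096, with the wanted data starting 904 bytes into the buffer.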
Describe alternatives you've considered
The alternative is an extension package, unix-odirect or something, with just the file open support and nothing else.
Additional context
My colleagues and I are happy to implement this feature, including docs etc and shepherd it through PR review.
Related older tickets: #48 and #6. But these propose just using and exposing the non-portable O_DIRECT rather than trying to provide portable support.
API breaking changes
It would be an extra member of the OpenFileFlags record, with a default (normal caching behaviour) in the defaultFileFlags value. So this should not break most existing library users, which create the OpenFileFlags record value by overriding defaultFileFlags rather than using the raw constructor.
POSIX compliance
This is a feature available in all major POSIX-compatible OSs (even Windows) but it is not standardised by POSIX.
Relevant excerpts from man pages (linked above):
open O_DIRECT:
open O_DIRECT:
fcntl F_NOCACHE: