Allow local queries without write access to archive #258
Comments
My question is: if I have no way to synchronize using the file system, how can I prevent a repack on the dataset, which can potentially rewrite any part of an existing data segment, from turning a running query into garbage?

I think the answer depends on organizational structures and processes. For example, if queries are tied to specific times, one could add a dataset configuration defining query times and maintenance times on a daily schedule, such as saying that one cannot query between 00:00 and 04:00, and one cannot do repacks between 04:00 and 24:00. Or could we assume that when a dataset is down for maintenance, it gets unmounted/unexported from the read-only part of the filesystem where the queries happen, that is, taken offline for maintenance?

It's ok to assume that messages are not removed. How about messages overwriting old ones (like datasets with

Note also that a repack can reorder data in a dataset without deleting anything. For example, if data is not imported in strict reftime order, a repack reorders it so that a query, which returns data sorted by reftime, can read the segment sequentially as much as possible rather than jumping back and forth. I don't know how significant the impact of that optimization is, and I guess it depends on what kind of data is in the dataset. I'd expect skipping it to hurt more for BUFR and VM2, and less for big GRIBs and HDF5 files. It's ok not to do that reordering if the performance change is understood not to be a big deal.

I feel like there are many options and none universally good, and I'd like to work through some scenarios in detail in order to identify specific sets of tradeoffs.
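To make the daily-schedule idea concrete, here is a minimal sketch in Python. It assumes the example windows from the comment above (maintenance from 00:00 to 04:00, queries the rest of the day). The names `MAINTENANCE_START`, `MAINTENANCE_END`, `may_query` and `may_repack` are hypothetical and are not existing arkimet configuration keys or functions.

```python
from datetime import time

# Hypothetical daily schedule, taken from the example in the comment:
# repacks are allowed from 00:00 (inclusive) to 04:00 (exclusive),
# queries are allowed during the rest of the day.
MAINTENANCE_START = time(0, 0)
MAINTENANCE_END = time(4, 0)


def in_maintenance_window(now: time) -> bool:
    """Return True if `now` falls inside the daily maintenance window."""
    return MAINTENANCE_START <= now < MAINTENANCE_END


def may_query(now: time) -> bool:
    """Queries are only allowed outside the maintenance window."""
    return not in_maintenance_window(now)


def may_repack(now: time) -> bool:
    """Repacks are only allowed inside the maintenance window."""
    return in_maintenance_window(now)
```

A check like this only guards the moment a query or repack starts. A query that begins at 23:59 and is still running at 00:01 would overlap with maintenance, so a real setup would probably also need a safety margin or a maximum query duration.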
Actually, a low-profile implementation that performs the query on a best-effort basis (possibly returning an error code if it can detect that some of the relevant metadata changed in the middle of the query) would be enough. Of course, this behavior should be enabled by an option that acts as a disclaimer, warning users that they may receive rubbish. If such a behavior can be implemented without a big effort we could go on; otherwise just close as WONTDO.
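One possible shape for such a best-effort check, sketched in Python: record the file's inode, size and modification time before reading, compare them afterwards, and report an error if anything differs. This is only an illustration of the idea, not how arkimet reads segments. `SegmentChanged`, `fingerprint` and `read_best_effort` are hypothetical names, and the check cannot catch every race (for instance, a rewrite that finishes within the timestamp granularity of the filesystem).

```python
import os


class SegmentChanged(Exception):
    """Raised when a segment appears to have changed during a read."""


def fingerprint(path):
    """Return (inode, size, mtime in ns) as a cheap change detector."""
    st = os.stat(path)
    return (st.st_ino, st.st_size, st.st_mtime_ns)


def read_best_effort(path, offset, size):
    """Read `size` bytes at `offset` without taking any lock.

    Raises SegmentChanged if the file was modified or replaced while
    reading, or if fewer bytes than requested could be read.
    """
    before = fingerprint(path)
    with open(path, "rb") as fd:
        fd.seek(offset)
        data = fd.read(size)
    if fingerprint(path) != before or len(data) != size:
        raise SegmentChanged(path)
    return data
```

Note that this check is conservative: a harmless append to the segment also changes its size and modification time, so it would be reported as a change even though the bytes that were read are still valid.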
It is sometimes useful, especially in a shared HPC environment where it is difficult to keep a daemon running, to be able to perform a local query on the filesystem without having write permission on the archive directories. It is acceptable to assume that no messages are removed from the archived files, i.e. changes to the archives only add messages to files and indices.
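Under the append-only assumption, a lock-free reader could take the segment size at the start of the query and only trust index entries whose data lies entirely within that size. The Python sketch below illustrates this. The `(offset, size)` entry layout and the name `visible_entries` are assumptions made for the example, not arkimet's actual index format or API.

```python
import os


def visible_entries(segment_path, entries):
    """Keep only index entries whose data is fully present in the segment.

    `entries` is an iterable of (offset, size) tuples (a hypothetical
    index layout). The segment size is sampled once, so data appended
    later, or only partially written so far, is ignored.
    """
    limit = os.path.getsize(segment_path)
    return [(off, size) for (off, size) in entries if off + size <= limit]
```

This only helps while the assumption holds: if a repack rewrites or reorders the segment, offsets recorded earlier may point at different data, and a filter like this cannot detect that.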