Fast area-subsetting when loading dfs #741
Thanks for raising this. It is a known problem, caused by the underlying library MIKE Core and its binary dependencies: it is currently not possible to read just part of the data, which is a pity. We unfortunately depend on others to fix this.
One option would be to keep the global dataset in a format that supports spatial subsetting (e.g. NetCDF, Zarr) and create the dfs2 needed for the simulation on demand, instead of storing the global dataset in dfs2.
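The reason formats like NetCDF and Zarr help here is that a spatial window can be read without touching the rest of the file. A toy illustration of that principle with `numpy.memmap` (file name and grid shape are invented for the example; real chunked formats add compression and metadata on top of the same idea):

```python
import os
import tempfile
import numpy as np

# Write a fake "global" 2D field to disk (1000 x 2000 float32 grid).
path = os.path.join(tempfile.mkdtemp(), "global_field.dat")
global_field = np.arange(1000 * 2000, dtype=np.float32).reshape(1000, 2000)
global_field.tofile(path)

# Memory-map the file: nothing is loaded into RAM yet.
mm = np.memmap(path, dtype=np.float32, mode="r", shape=(1000, 2000))

# Reading a small window only pulls the bytes that overlap the slice,
# which is what chunked formats like Zarr/NetCDF give you portably.
window = np.asarray(mm[100:110, 500:520])
print(window.shape)  # (10, 20)
```

The dfs2 needed for a given simulation would then be produced on demand from such a window, rather than subsetting the 110 GB dfs2 each time.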
I had a similar problem with dfs.read(), and I solved it by moving the file from a slow drive to an SSD. The difference is huge: the read initially took 450 seconds, and on the SSD it took only 17 seconds.
I am not saying that it cannot be improved algorithmically, but until that happens this could be a temporary solution.
Bogdan
(Sent by email in reply to Henrik Andersson's comment of Tue, Nov 5, 2024.)
...and I was about to open an issue for implementing area subsetting for reading dfs3 files (for dfs3, the find_index method in _grid_geometry.py is not implemented yet).
Jesper, do you know whether this is a feature that can be implemented relatively easily in MIKE Core, or whether there are larger issues with it (file format or similar)?
@JesperGr can comment on how easy it is, but I am pretty sure it requires changes to the C library called on this line: https://github.com/DHI/mikecore-python/blob/6ebbb1a1abebb88cd2060346611fa628eac57a82/mikecore/DfsFile.py#L935
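For context, the index computation itself is straightforward for an equidistant grid; the hard part is getting the C library to read only those indices. A hypothetical find_index-style helper (names and grid parameters invented for the sketch, not the actual _grid_geometry.py code) might look like:

```python
import numpy as np

def bbox_to_index(x0, dx, nx, y0, dy, ny, bbox):
    """Map a bounding box (left, bottom, right, top) in world coordinates
    to inclusive index ranges (i0, i1, j0, j1) on an equidistant grid.
    Hypothetical sketch of what a dfs3 find_index would need to do."""
    left, bottom, right, top = bbox
    xc = x0 + dx * np.arange(nx)  # x coordinates of grid points
    yc = y0 + dy * np.arange(ny)  # y coordinates of grid points
    ii = np.where((xc >= left) & (xc <= right))[0]
    jj = np.where((yc >= bottom) & (yc <= top))[0]
    if ii.size == 0 or jj.size == 0:
        raise ValueError("bounding box does not overlap the grid")
    return ii[0], ii[-1], jj[0], jj[-1]

# Example on a 10 x 10 unit grid:
print(bbox_to_index(0.0, 1.0, 10, 0.0, 1.0, 10, (2.5, 3.5, 6.5, 7.5)))  # (3, 6, 4, 7)
```

With such index ranges in hand, the remaining work is entirely on the MIKE Core side: reading only that slab per timestep instead of the full layer.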
I have a request for fast loading of a subset of dfs data.
Background: I have a dfs2 file for a global dataset, and I wish to read all timesteps of this data but only a sub-area. The way I do it now is to pass a bounding box to the mikeio.read() method:
ds = mikeio.read(fn, area=bbox_tuple)
The problem is that this is really slow, even though the area is very small. It seems that MIKE IO needs to load the entire domain under the hood before subsetting. For reference: it takes 16 minutes to load the data, even though the resulting dfs2 file is only 1900 KB (the original global file is around 110 GB).
Ideally, it would be possible to use the mikeio.generic.extract() method with an 'area' argument instead of only subsetting in time. I imagine something along the lines of:
mikeio.generic.extract(fn, fn_out, area=bbox_tuple, start=0, end=-1, step=1, items=None)
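Until something like that exists, the subsetting logic can at least be expressed as a thin wrapper. A hypothetical sketch (extract_area and read_step are invented names; the per-timestep reader is injected as a callable so the logic is independent of the mikeio internals, and assuming the underlying library still has to read each full timestep from disk):

```python
import numpy as np

def extract_area(read_step, n_steps, i_range, j_range, start=0, end=-1, step=1):
    """Hypothetical area-aware extract: read one timestep at a time via
    read_step(t) -> 2D array, keep only the (i_range, j_range) window
    (inclusive index bounds), and stack the windows along the time axis.
    This bounds memory use, but not I/O: each full timestep is still read."""
    if end == -1:
        end = n_steps - 1
    i0, i1 = i_range
    j0, j1 = j_range
    frames = [read_step(t)[i0:i1 + 1, j0:j1 + 1] for t in range(start, end + 1, step)]
    return np.stack(frames, axis=0)

# Usage with a fake reader standing in for a real per-timestep dfs2 read:
fake = lambda t: np.full((100, 200), float(t))
out = extract_area(fake, n_steps=5, i_range=(10, 19), j_range=(50, 69))
print(out.shape)  # (5, 10, 20)
```

A real fix would push the window down into the C read call itself, so that only the needed bytes leave the disk.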