Fast area-subsetting when loading dfs #741

Open
FrejaTerpPetersen opened this issue Nov 5, 2024 · 5 comments

Comments

@FrejaTerpPetersen

I have a request for fast loading of a subset of dfs data.

Background: I have a dfs2 file containing a global dataset, and I want to read all timesteps but only a sub-area. Currently I pass a bounding box to the mikeio.read() method:
ds = mikeio.read(fn, area=bbox_tuple)
The problem is that this is really slow, even though the area is very small. It seems that MIKE IO has to load the entire area under the hood before subsetting. For reference: it takes 16 minutes to load the data, even though the resulting dfs2 file is only 1900 kB (the original global file is around 110 GB).

Ideally, it would be possible to use the mikeio.generic.extract() method with an 'area' argument instead of only subsetting in time. I imagine something along the lines of:
mikeio.generic.extract(fn, fn_out, area=bbox_tuple, start=0, end=-1, step=1, items=None)

@jsmariegaard
Member

Thanks for raising this. It is a known problem which is due to the underlying library MIKE core and its binary dependencies. It is currently not possible to read just part of the data which is a pity. We unfortunately depend on others to fix this.

@ecomodeller
Member

One option would be to keep the global dataset in a format that supports spatial subsetting (e.g. NetCDF, Zarr) and create the dfs2 needed for the simulation on demand, instead of storing the global dataset in dfs2.
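A rough sketch of that workflow, in case it helps. It assumes the global data is available as a NetCDF file with 1D, ascending, equidistant coordinates named `lon` and `lat` and a `time` dimension. Those names, the function names, and the `var_name` argument are illustrative assumptions, not something taken from this thread. The MIKE IO calls (`Grid2D`, `DataArray`, `to_dfs`) are used as I understand them from the MIKE IO documentation and should be checked against the installed version.

```python
def check_bbox(bbox):
    """Validate a (left, bottom, right, top) bounding box and return it as floats."""
    left, bottom, right, top = (float(v) for v in bbox)
    if left >= right or bottom >= top:
        raise ValueError(f"Invalid bbox (left, bottom, right, top): {bbox}")
    return left, bottom, right, top


def netcdf_subset_to_dfs2(nc_path, dfs2_path, bbox, var_name):
    """Cut a bounding box out of a NetCDF variable and write it as dfs2.

    Hypothetical helper: assumes coordinates named "lon", "lat" and "time".
    """
    # Imported here so check_bbox stays usable without these packages.
    import mikeio
    import xarray as xr

    left, bottom, right, top = check_bbox(bbox)

    # open_dataset is lazy: the data values are not read at this point.
    with xr.open_dataset(nc_path) as ds:
        # Label-based slicing; requires ascending lon/lat coordinates.
        sub = ds[var_name].sel(lon=slice(left, right), lat=slice(bottom, top))
        # load() reads the selected window into memory.
        sub = sub.transpose("time", "lat", "lon").load()

    # Grid2D expects equidistant x and y axes.
    grid = mikeio.Grid2D(
        x=sub["lon"].values, y=sub["lat"].values, projection="LONG/LAT"
    )
    da = mikeio.DataArray(
        data=sub.values,
        time=sub.indexes["time"],
        geometry=grid,
        item=mikeio.ItemInfo(var_name),
    )
    da.to_dfs(dfs2_path)
```

With a layout like this, the cost of producing the small dfs2 should scale with the size of the requested window rather than with the global file, subject to how the NetCDF file is chunked.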

@bhlevca

bhlevca commented Nov 5, 2024 via email

@Snowthe

Snowthe commented Nov 18, 2024

...and I was about to open an issue for implementing the area subsetting for reading dfs3 files (for dfs3, the find_index method in _grid_geometry.py isn't implemented yet).
But that of course mainly makes sense if only the requested subset is actually read from the file, rather than the full grid being loaded and then cropped.

> Thanks for raising this. It is a known problem which is due to the underlying library MIKE core and its binary dependencies. It is currently not possible to read just part of the data which is a pity. We unfortunately depend on others to fix this.

Jesper, do you know whether this feature could be implemented fairly easily in MIKE core, or whether there are larger obstacles (file format or similar)?
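For context, the index lookup itself (mapping a bounding box to grid index ranges) is the cheap part; the cost discussed in this thread comes from reading the data. Below is a minimal sketch of such a lookup for a regular, axis-aligned grid. The function name and argument layout are hypothetical and this is not MIKE IO's `find_index`. It assumes the bounding box is given as (left, bottom, right, top) and that `x0`, `y0` are the coordinates of the first grid node.

```python
from math import ceil, floor


def bbox_to_index_ranges(x0, dx, nx, y0, dy, ny, bbox):
    """Return half-open index ranges (i_start, i_stop, j_start, j_stop)
    of the grid nodes that lie inside bbox = (left, bottom, right, top)."""
    left, bottom, right, top = bbox
    if left > right or bottom > top:
        raise ValueError(f"Invalid bbox: {bbox}")

    # First node at or right of `left`, last node at or left of `right`,
    # clipped to the grid extent. Same logic along y.
    i_start = max(0, ceil((left - x0) / dx))
    i_stop = min(nx, floor((right - x0) / dx) + 1)
    j_start = max(0, ceil((bottom - y0) / dy))
    j_stop = min(ny, floor((top - y0) / dy) + 1)

    if i_start >= i_stop or j_start >= j_stop:
        raise ValueError("bbox does not overlap the grid")
    return i_start, i_stop, j_start, j_stop


# Example: a global 0.5-degree grid and a small box.
i0, i1, j0, j1 = bbox_to_index_ranges(
    -180.0, 0.5, 720, -90.0, 0.5, 361, (8.0, 54.0, 13.0, 58.0)
)
selected = (i1 - i0) * (j1 - j0)
total = 720 * 361
print(f"{selected} of {total} nodes per timestep fall inside the box")
```

The same pattern would extend to dfs3 by adding a third axis, but as noted above it only pays off if the reader can then fetch just those index ranges from the file.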

@ecomodeller
Member

@JesperGr can comment on how easy it is, but I am pretty sure it requires changes to the C library called in this line https://github.com/DHI/mikecore-python/blob/6ebbb1a1abebb88cd2060346611fa628eac57a82/mikecore/DfsFile.py#L935
