Fast area-subsetting when loading dfs #741
Thanks for raising this. It is a known problem, caused by the underlying library MIKE Core and its binary dependencies: it is currently not possible to read just part of the data, which is a pity. We unfortunately depend on others to fix this.
One option would be to keep the global dataset in a format that supports spatial subsetting (e.g. NetCDF, Zarr) and create the dfs2 needed for the simulation on demand, instead of storing the global dataset in dfs2.
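The reason formats like NetCDF and Zarr help here is that a spatial window can be read without touching the rest of the file. A toy illustration of that principle with `numpy.memmap` (file name and grid shape are invented for the example; real chunked formats add compression and metadata on top of the same idea):

```python
import os
import tempfile
import numpy as np

# Write a fake "global" 2D field to disk (1000 x 2000 float32 grid).
path = os.path.join(tempfile.mkdtemp(), "global_field.dat")
global_field = np.arange(1000 * 2000, dtype=np.float32).reshape(1000, 2000)
global_field.tofile(path)

# Memory-map the file: nothing is loaded into RAM yet.
mm = np.memmap(path, dtype=np.float32, mode="r", shape=(1000, 2000))

# Reading a small window only pulls the bytes that overlap the slice,
# which is what chunked formats like Zarr/NetCDF give you portably.
window = np.asarray(mm[100:110, 500:520])
print(window.shape)  # (10, 20)
```

The dfs2 needed for a given simulation would then be produced on demand from such a window, rather than subsetting the 110 GB dfs2 each time.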
I had a similar problem with dfs.read(), and I solved it by moving the file from a slow drive to an SSD. The difference is huge: the read initially took 450 seconds, and on the SSD it took only 17 seconds.
I am not saying that it cannot be improved algorithmically, but until that happens this could be a temporary solution.
Bogdan
(Sent by email in reply to Henrik Andersson's comment of Tue, Nov 5, 2024.)
...and I was about to open an issue for implementing area subsetting for reading dfs3 files (for dfs3, the find_index method in _grid_geometry.py is not implemented yet).
Jesper, do you know whether this is a feature that can be implemented relatively easily in MIKE Core, or whether there are larger issues with it (file format or similar)?
@JesperGr can comment on how easy it is, but I am pretty sure it requires changes to the C library called on this line: https://github.com/DHI/mikecore-python/blob/6ebbb1a1abebb88cd2060346611fa628eac57a82/mikecore/DfsFile.py#L935
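For context, the index computation itself is straightforward for an equidistant grid; the hard part is getting the C library to read only those indices. A hypothetical find_index-style helper (names and grid parameters invented for the sketch, not the actual _grid_geometry.py code) might look like:

```python
import numpy as np

def bbox_to_index(x0, dx, nx, y0, dy, ny, bbox):
    """Map a bounding box (left, bottom, right, top) in world coordinates
    to inclusive index ranges (i0, i1, j0, j1) on an equidistant grid.
    Hypothetical sketch of what a dfs3 find_index would need to do."""
    left, bottom, right, top = bbox
    xc = x0 + dx * np.arange(nx)  # x coordinates of grid points
    yc = y0 + dy * np.arange(ny)  # y coordinates of grid points
    ii = np.where((xc >= left) & (xc <= right))[0]
    jj = np.where((yc >= bottom) & (yc <= top))[0]
    if ii.size == 0 or jj.size == 0:
        raise ValueError("bounding box does not overlap the grid")
    return ii[0], ii[-1], jj[0], jj[-1]

# Example on a 10 x 10 unit grid:
print(bbox_to_index(0.0, 1.0, 10, 0.0, 1.0, 10, (2.5, 3.5, 6.5, 7.5)))  # (3, 6, 4, 7)
```

With such index ranges in hand, the remaining work is entirely on the MIKE Core side: reading only that slab per timestep instead of the full layer.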
I have a request for fast loading of a subset of dfs data.
Background: I have a dfs2 file for a global dataset, and I wish to read all timesteps of this data but only a sub-area. The way I do it now is to pass a bounding box to the mikeio.read() method:
ds = mikeio.read(fn, area=bbox_tuple)
The problem is that this is really slow, even though the area is very small. It seems that MIKE IO needs to load the entire domain under the hood before subsetting. For reference: it takes 16 minutes to load the data, even though the resulting dfs2 file is only 1900 KB (the original global file is around 110 GB).
Ideally, it would be possible to use the mikeio.generic.extract() method with an 'area' argument instead of only subsetting in time. I imagine something along the lines of:
mikeio.generic.extract(fn, fn_out, area=bbox_tuple, start=0, end=-1, step=1, items=None)
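Until something like that exists, the subsetting logic can at least be expressed as a thin wrapper. A hypothetical sketch (extract_area and read_step are invented names; the per-timestep reader is injected as a callable so the logic is independent of the mikeio internals, and assuming the underlying library still has to read each full timestep from disk):

```python
import numpy as np

def extract_area(read_step, n_steps, i_range, j_range, start=0, end=-1, step=1):
    """Hypothetical area-aware extract: read one timestep at a time via
    read_step(t) -> 2D array, keep only the (i_range, j_range) window
    (inclusive index bounds), and stack the windows along the time axis.
    This bounds memory use, but not I/O: each full timestep is still read."""
    if end == -1:
        end = n_steps - 1
    i0, i1 = i_range
    j0, j1 = j_range
    frames = [read_step(t)[i0:i1 + 1, j0:j1 + 1] for t in range(start, end + 1, step)]
    return np.stack(frames, axis=0)

# Usage with a fake reader standing in for a real per-timestep dfs2 read:
fake = lambda t: np.full((100, 200), float(t))
out = extract_area(fake, n_steps=5, i_range=(10, 19), j_range=(50, 69))
print(out.shape)  # (5, 10, 20)
```

A real fix would push the window down into the C read call itself, so that only the needed bytes leave the disk.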