Multiband rasters saved incorrectly in kedro_datasets_experimental.rioxarray.GeoTIFFDataset #980
Comments
Thank you for your query. I am looping in the creator of this dataset, @tgoelles, to assist with the above.
Happy to see that geospatial folks are using it. I had a reason to make a separate _save_multiband, but I don't remember what it was 🫣. I think something was missing in rio.to_raster, or it did not work back then. It would be great if we could eliminate the need for _save_multiband. I guess your issue 1 is the most critical one, and a bit surprising; maybe you can elaborate a bit on it. I think number 2 can easily be implemented. Number 3 is an opinionated choice, I guess: I added a default NaN value to be sure there is always one. I had a horrible bug down the line, related to PyTorch and TorchGeo, which was hard to find because of that.
Thanks, for sure. On why you might have written this: I see in the documentation for rioxarray that … For 1, see kedro-plugins/kedro-datasets/kedro_datasets_experimental/rioxarray/geotiff_dataset.py, lines 163 to 170 in 159e0a3, where the transform is built from the coordinate bounds.
You can compare:

```
In [19]: rio.transform.from_bounds(arr.x.min(), arr.y.min(), arr.x.max(), arr.y.max(), arr[0].shape[1], arr[0].shape[0])
Out[19]:
Affine(<xarray.DataArray 'x' ()> Size: 8B
array(9.9609375)
Coordinates:
    spatial_ref int64 8B 0, <xarray.DataArray 'y' ()> Size: 8B
array(0.)
Coordinates:
    spatial_ref int64 8B 0, <xarray.DataArray 'x' ()> Size: 8B
array(250162.63166085)
Coordinates:
    spatial_ref int64 8B 0,
<xarray.DataArray 'x' ()> Size: 8B
array(0.)
Coordinates:
    spatial_ref int64 8B 0, <xarray.DataArray 'y' ()> Size: 8B
array(-9.9609375)
Coordinates:
    spatial_ref int64 8B 0, <xarray.DataArray 'y' ()> Size: 8B
array(4317235.02328553)
Coordinates:
    spatial_ref int64 8B 0)

In [20]: arr.rio.transform()
Out[20]:
Affine(10.0, 0.0, 250157.63166085022,
       0.0, -10.0, 4317240.023285527)
```
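The two outputs differ in the way one would expect if pixel-center coordinates were passed to from_bounds as though they were the outer raster edges: the pixel size shrinks to 255 × 10 / 256 = 9.9609375 and the origin moves half a pixel inward. The pure-Python sketch below reproduces that arithmetic without rasterio. The from_bounds helper here only mimics the coefficient formula of rasterio.transform.from_bounds, and the 256 × 256 grid at 10 m is an assumption inferred from the numbers above. Padding the bounds by half a pixel is shown as one possible correction, not necessarily how the dataset should be fixed; reusing arr.rio.transform() avoids the recomputation altogether.

```python
# Pure-Python sketch of the coefficient arithmetic in
# rasterio.transform.from_bounds (rasterio itself is not needed here).


def from_bounds(west, south, east, north, width, height):
    """Return affine coefficients (a, b, c, d, e, f) for a north-up raster,
    treating (west, south, east, north) as the outer pixel *edges*."""
    return (
        (east - west) / width,     # a: pixel width
        0.0,                       # b: no rotation
        west,                      # c: x of the upper-left corner
        0.0,                       # d: no rotation
        (south - north) / height,  # e: pixel height (negative, north-up)
        north,                     # f: y of the upper-left corner
    )


# Assumed grid: 256 x 256 pixels at 10 m, upper-left corner taken from Out[20].
res, width, height = 10.0, 256, 256
x_edge, y_edge = 250157.63166085022, 4317240.023285527

# xarray/rioxarray coordinates are pixel *centers*, half a pixel inside the edges.
xs = [x_edge + res * (i + 0.5) for i in range(width)]
ys = [y_edge - res * (j + 0.5) for j in range(height)]

# Naive: extreme centers passed as if they were edges (what In [19] does).
naive = from_bounds(min(xs), min(ys), max(xs), max(ys), width, height)

# Padded: extend the center bounds by half a pixel on every side.
half = res / 2
padded = from_bounds(min(xs) - half, min(ys) - half,
                     max(xs) + half, max(ys) + half, width, height)

print(naive)   # a close to 9.9609375, c and f offset half a pixel inward
print(padded)  # approximately (10.0, 0.0, x_edge, 0.0, -10.0, y_edge)
```

The naive tuple reproduces Out[19] (9.9609375, 250162.63, -9.9609375, 4317235.02), while the padded one matches Out[20].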
Description
Hi, thanks for creating this dataset plugin for us geospatial folks.
The GeoTIFFDataset uses rioxarray to open and write datasets, except for multi-band rasters. In that case, a custom _save_multiband function uses rasterio to write the data; see kedro-plugins/kedro-datasets/kedro_datasets_experimental/rioxarray/geotiff_dataset.py, lines 137 to 138 in 159e0a3.
There are three issues with this implementation that cause the saved dataset not to match the loaded dataset.
In my pipelines, I've worked around the issue by writing a simple custom dataset that always uses the standard dataarray.rio.to_raster(), which appears to work as expected. However, given that effort was put into writing _save_multiband in the first place, I feel like I'm missing something.
Context
This bug affects processing of all raster datasets with a band dimension, e.g. Landsat/Sentinel data.
Steps to Reproduce
Load and save a GeoTIFF dataset with bands.
Expected Result
The output should be identical to the input.