Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

XarrayReader produces upside down images for some datasets with xarray backend #34

Open
hrodmn opened this issue Oct 23, 2024 · 13 comments

Comments

@hrodmn
Copy link
Contributor

hrodmn commented Oct 23, 2024

I'm not sure if this is a problem in titiler-cmr or somewhere upstream (rio-tiler) but when reading images from some NetCDF files without doing a reprojection operation, the images are returned upside down!

This request to one NetCDF dataset returns the expected result:
https://9ox7r6pi8c.execute-api.us-west-2.amazonaws.com/bbox/-99.524,19.043,-78.647,31.126.png?concept_id=C2036881735-POCLOUD&datetime=2024-09-20T00:00:00Z/2024-09-21T00:00:00Z&variable=analysed_sst&backend=xarray&rescale=273,325&colormap_name=thermal&return_mask=true
image

A request to a different NetCDF dataset (MUR SST) with all of the same parameters yields an upside down image!
https://9ox7r6pi8c.execute-api.us-west-2.amazonaws.com/bbox/-99.524,19.043,-78.647,31.126.png?concept_id=C1996881146-POCLOUD&datetime=2024-09-20T00:00:00Z/2024-09-21T00:00:00Z&variable=analysed_sst&backend=xarray&rescale=273,325&colormap_name=thermal&return_mask=true
image

/tile requests to the second dataset return the expected result:
https://9ox7r6pi8c.execute-api.us-west-2.amazonaws.com/tiles/WebMercatorQuad/5/8/13.png?concept_id=C1996881146-POCLOUD&datetime=2024-09-20T00:00:00Z/2024-09-21T00:00:00Z&variable=analysed_sst&backend=xarray&rescale=273,325&colormap_name=thermal&return_mask=true
image

When I specify a dst_crs that is not the dataset's native CRS, /bbox produces a correct image:
https://9ox7r6pi8c.execute-api.us-west-2.amazonaws.com/bbox/-99.524,19.043,-78.647,31.126.png?concept_id=C1996881146-POCLOUD&datetime=2024-09-20T00:00:00Z/2024-09-21T00:00:00Z&variable=analysed_sst&backend=xarray&rescale=273,325&colormap_name=thermal&return_mask=true&dst_crs=epsg:3857
image

@hrodmn hrodmn changed the title /bbox produces upside down images for some datasets with xarray backend XarrayReader produces upside down images for some datasets with xarray backend Oct 23, 2024
@hrodmn
Copy link
Contributor Author

hrodmn commented Oct 23, 2024

After digging into the issue some more I think the problem is in the default CRS definition for non-georeferenced datasets in titiler-cmr:

crs = da.rio.crs or "epsg:4326"
da.rio.write_crs(crs, inplace=True)

The dataset that works as expected (GAMSSA) contains a georeference while the dataset that is not working correctly (MUR SST) does not contain a georeference and has it manually set to "epsg:4326". If I set the CRS using the CRS WKT from the GAMSSA dataset (instead of "epsg:4326", the MUR SST dataset behaves correctly.

diff --git a/titiler/cmr/reader.py b/titiler/cmr/reader.py
index 979629b..5e0af7d 100644
--- a/titiler/cmr/reader.py
+++ b/titiler/cmr/reader.py
@@ -197,7 +197,10 @@ def get_variable(
         da = da.sortby(da.x)

     # Make sure we have a valid CRS
-    crs = da.rio.crs or "epsg:4326"
+    crs = (
+        da.rio.crs
+        or 'GEOGCS["undefined",DATUM["undefined",SPHEROID["undefined",6379137,298.257223563]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AXIS["Longitude",EAST],AXIS["Latitude",NORTH]]'
+    )
     da.rio.write_crs(crs, inplace=True)

This does not explain why everything works when a reprojection operation is performed without any changes to the default CRS, but it suggests that defaulting to epsg:4326 is probably not the right thing.

@maxrjones
Copy link
Member

maxrjones commented Oct 23, 2024

rioxarray will set the Affine transformation via the crs attribute and coordinates if it is not already defined in the dataset, which means the data gets "flipped" properly to match the destination transform in the reprojection step. rio_tiler's XarrayReader skips the reprojection entirely if the source and destination crs are the same - https://github.com/cogeotiff/rio-tiler/blob/abec15162c1d8cf9dd67cb5945d8532e2b6f562c/rio_tiler/io/xarray.py#L249-L263. If the bbox endpoint should always return an image that's oriented with y coordinates positive upwards, I'd guess this should be implemented at the rio-tiler level 🤔

I think the automatic assignment of a CRS is a broader issue. I don't agree with assuming that it's "epsg:4326" if it's not in the metadata. I guess there are a lot of datasets that are missing that metadata, so there should be a way for users to get around that still. Maybe a query parameter in the titiler-cmr endpoints?

@hrodmn
Copy link
Contributor Author

hrodmn commented Oct 23, 2024

After more investigation with @maxrjones and others, the problem is actually pretty simple. Neither of these datasets follow the convention of y axis coordinates positive upwards in the arrays.

The GAMSSA dataset has a georeference attached to it that rioxarray can find, and when rio-tiler checks to see if the CRS matches the dest_crs, it sees that they are not equal and performs a reprojection. This reprojection operation yields an array that does follow the conventional y-coordinate direction.

The MUR SST dataset does not have a georeference attached to it and gets the "epsg:4326" value hard-coded. In the part method, rio-tiler checks to see if self.input.rio.crs matches dest_crs (default: "epsg:4326") and they do match so no reprojection operation is performed. This means the array is never transformed to match conventional expectation of y-coordinates.

If the bbox endpoint should always return an image that's oriented with y coordinates positive upwards, I'd guess this should be implemented at the rio-tiler level 🤔

We should propose a fix in rio-tiler to ensure the array that is passed to ImageData matches expectation of y axis coordinates positive upwards.

I think the automatic assignment of a CRS is a broader issue. I don't agree with assuming that it's "epsg:4326" if it's not in the metadata. I guess there are a lot of datasets that are missing that metadata, so there should be a way for users to get around that still. Maybe a query parameter in the titiler-cmr endpoints?

Yes I agree. We should add a query parameter for src_crs that would allow an application to provide a CRS definition for datasets that do not have one attached, then raise an exception if the dataset is missing a CRS and src_crs is not supplied.

@maxrjones
Copy link
Member

Here's the roughly equivalent rio_tiler code showing the inconsistent behavior between the Rasterio and Xarray readers:

import earthaccess
import rio_tiler
import xarray as xr
import matplotlib.pyplot as plt
from odc.geo.geom import Geometry
from shapely.geometry import box

# Download dataset
earthaccess.login()
variable = "analysed_sst"
results = earthaccess.search_data(
    concept_id="C2036881735-POCLOUD", count=1, temporal=("2024-09-20", "2024-09-20")
)
fp = earthaccess.download(results, "earthaccess_data")[0]


# Set up GeoJSON for reader
bbox = box(-99.524, 19.043, -78.647, 31.126)
gbox = Geometry(bbox, "EPSG:4326")
shape = gbox.geojson()

# Use Xarray reader
ds = xr.open_dataset(fp)
ds = ds.rio.write_crs("EPSG:4326")
r = rio_tiler.io.xarray.XarrayReader(ds[variable])
im = r.feature(shape)
plt.imshow(im.data.squeeze())
plt.colorbar()
plt.show()

# Use rasterio reader
r = rio_tiler.io.rasterio.Reader(f"NETCDF:{str(fp)}:analysed_sst")
im = r.feature(shape)
plt.imshow(im.data.squeeze())
plt.colorbar()
plt.show()

I'm not sure where mask_and_scale and masking gets handled with using the rasterio reader, which is the reason for the difference in values. I could open a PR tomorrow suggesting a change to the Xarray reader to return consistent array structure.

image

image

@vincentsarago
Copy link
Member

I'm not sure where mask_and_scale and masking gets handled with using the rasterio reader, which is the reason for the difference in values. I could open a PR tomorrow suggesting a change to the Xarray reader to return consistent array structure.

@maxrjones can you try plt.imshow(im.data_as_image()) instead of plt.imshow(im.data.squeeze())

import earthaccess
import rio_tiler
import xarray as xr
import matplotlib.pyplot as plt
from geojson_pydantic import Polygon

# Download dataset
earthaccess.login()
variable = "analysed_sst"
results = earthaccess.search_data(
    concept_id="C2036881735-POCLOUD", count=1, temporal=("2024-09-20", "2024-09-20")
)
fp = earthaccess.download(results, "earthaccess_data")[0]

# Set up GeoJSON for reader
bbox = (-99.524, 19.043, -78.647, 31.126)
shape = Polygon.from_bounds(*bbox).model_dump(exclude_none=True)

# Use Xarray reader
print("xarray")
ds = xr.open_dataset(fp)
ds = ds.rio.write_crs("EPSG:4326")
r = rio_tiler.io.xarray.XarrayReader(ds[variable])
im = r.feature(shape)

plt.imshow(im.data_as_image())
plt.colorbar()
plt.show()

print("rasterio")
# Use rasterio reader
r = rio_tiler.io.rasterio.Reader(f"NETCDF:{str(fp)}:analysed_sst")
im = r.feature(shape)

plt.imshow(im.data_as_image())
plt.colorbar()
plt.show()
Screenshot 2024-10-24 at 9 30 54 AM

@vincentsarago
Copy link
Member

Thank you both for digging on this issue 🙏

I think the automatic assignment of a CRS is a broader issue. I don't agree with assuming that it's "epsg:4326" if it's not in the metadata. I guess there are a lot of datasets that are missing that metadata, so there should be a way for users to get around that still. Maybe a query parameter in the titiler-cmr endpoints?

I totally agree 🙏

Yes I agree. We should add a query parameter for src_crs that would allow an application to provide a CRS definition for datasets that do not have one attached, then raise an exception if the dataset is missing a CRS and src_crs is not supplied.

I'm a bit worry about passing use WKT crs via query parameter, but I don't see better solution

@vincentsarago
Copy link
Member

if we agree that not setting a CRS automatically in the XarrayReader, then might get issue when using titiler-cmr backend because we won't be able to easily forward the Error message up to the users 😬

@vincentsarago
Copy link
Member

ok I think I may have a solution

import xarray

ds = xarray.open_dataset("20241012090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc", decode_coords="all")
da = ds["analysed_sst"]
print(da.rio.bounds())
print(da.rio.crs)
print(da.rio.transform())
print(da.rio._unordered_bounds())

(-179.99500549324037, -89.99499786365084, 180.0050000000763, 89.99499786365084)
None
| 0.01, 0.00,-180.00|
| 0.00, 0.01,-89.99|
| 0.00, 0.00, 1.00|
(-179.99500549324037, 89.99499786365084, 180.0050000000763, -89.99499786365084)

if you look at the tranform the y resolution is positive, while the bounds (form rio.bounds()) indicate that the transform Y resolution should be negative

rioxarray reorder the bounds, so if you look at the difference between da.rio._unordered_bounds() and da.rio.bounds() this might indicate that the transform should have a -Y resolution

@hrodmn
Copy link
Contributor Author

hrodmn commented Oct 24, 2024

This is another case where the vrt:// URI syntax shines. What if we could just use the rasterio backend for all of these NetCDF datasets? I don't know how to specify a vrt:// URI in a 4-D (lon, lat, time, variable) where you need to select a time slice AND a variable, but it might be possible (I haven't found an example dataset with that format yet).

This /vsicurl operation takes ~23 seconds on my laptop.

import earthaccess
import matplotlib.pyplot as plt
import rio_tiler

# query MUR SST from CMR
earthaccess.login()
results = earthaccess.search_data(
    concept_id="C1996881146-POCLOUD", count=1, temporal=("2024-10-22", "2024-10-22")
)

url = results[0].data_links()[0]

# Use rasterio reader to read a bbox
bbox = (-99.524, 19.043, -78.647, 31.126)
variable = "analysed_sst"

r = rio_tiler.io.rasterio.Reader(f"vrt:///vsicurl/{url}?sd_name={variable}")
im = r.part(bbox)
plt.imshow(im.data_as_image())
plt.show()

image

Does the XarrayReader have some performance advantage over the GDAL reader in this case?

@hrodmn
Copy link
Contributor Author

hrodmn commented Oct 24, 2024

I don't know how to specify a vrt:// URI in a 4-D (lon, lat, time, variable) where you need to select a time slice AND a variable, but it might be possible (I haven't found an example dataset with that format yet).

This example file has four dimensions: https://www.unidata.ucar.edu/software/netcdf/examples/tos_O1_2001-2002.nc

You can pick the variable with sd_name={variable} and a time slice with bands={i}.

gdalinfo "vrt:///vsicurl/https://www.unidata.ucar.edu/software/netcdf/examples/tos_O1_2001-2002.nc?sd_name=tos&bands=1"

@maxrjones
Copy link
Member

I don't know how to specify a vrt:// URI in a 4-D (lon, lat, time, variable) where you need to select a time slice AND a variable, but it might be possible (I haven't found an example dataset with that format yet).

This example file has four dimensions: unidata.ucar.edu/software/netcdf/examples/tos_O1_2001-2002.nc

You can pick the variable with sd_name={variable} and a time slice with bands={i}.

gdalinfo "vrt:///vsicurl/https://www.unidata.ucar.edu/software/netcdf/examples/tos_O1_2001-2002.nc?sd_name=tos&bands=1"

How would this work with rio-tiler?

I tried:

import matplotlib.pyplot as plt
import rio_tiler

# Use rasterio reader to read a bbox
bbox = (-99.524, 19.043, -78.647, 31.126)
src = "vrt:///vsicurl/https://www.unidata.ucar.edu/software/netcdf/examples/tos_O1_2001-2002.nc?sd_name=tos&bands=1"

r = rio_tiler.io.rasterio.Reader(src)
im = r.part(bbox)
plt.imshow(im.data_as_image())
plt.show()

@vincentsarago
Copy link
Member

You shouldn't need the /vsicurl/ prefix with rasterio

@maxrjones
Copy link
Member

You shouldn't need the /vsicurl/ prefix with rasterio

Good to know, thanks! The bands= trick via VRT wasn't working for me, but it was possible to use indexes=1 with Reader.part(). Still, I'm not sure how you would go from an application/user specified time stamp to an index without a tool like xarray / rioxarray.

import matplotlib.pyplot as plt
import rio_tiler

# Use rasterio reader to read a bbox
bbox = (190, 19.043, 240, 31.126)
src = "https://www.unidata.ucar.edu/software/netcdf/examples/tos_O1_2001-2002.nc?sd_name=tos&bands=1"

r = rio_tiler.io.rasterio.Reader(src)
im = r.part(bbox, indexes=1)
plt.imshow(im.data_as_image())

@hrodmn FYI while that dataset has 4 dimensions, the tos variable only has three. concept_ID="G1991303139-POCLOUD" in an example of a dataset with 4 dimensional data variables (time: 1, Z: 50, latitude: 360, longitude: 720).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants