Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Supporting a flexible dataset schema #37

Open
jhamman opened this issue Jul 16, 2024 · 6 comments
Open

Supporting a flexible dataset schema #37

jhamman opened this issue Jul 16, 2024 · 6 comments

Comments

@jhamman
Copy link
Contributor

jhamman commented Jul 16, 2024

Xpublish-EDR currently support a very narrow dataset schema. For example, coordinates must be named X, Y, and T corresponding to longitude, latitude, and time. When datasets do not conform to this, there is a somewhat confusing error raised about CF conventions:

try:
ds = dataset.cf.sel(X=query.point.x, Y=query.point.y, method="nearest")
except KeyError:
raise HTTPException(
status_code=404,
detail="Dataset does not have CF Convention compliant metadata",
)

Of course, there are many ways a dataset can be CF complaint (with or without these coordinates present). My understanding of the EDR spec is that it is agnostic to the names of these coordinates in the underlying data source. So Xpublish-EDR should be able to find a way to bridge between coordinate names provided they are real CF.

Can we work on making Xpublish-EDR more flexible here? CF-Xarray could offer a shortcut here.

@mpiannucci
Copy link
Contributor

@abkfenris and I have talked about this before, because a lot of the (not super thought out) abstractions for xpublish-wms overlap a ton with EDR functionality, and same with our newer subsetting prototype plugin.

Both of these are built on top of cf-xarray, which as you mentioned is probably the most important piece here.

@abkfenris
Copy link
Member

Xpublish-EDR should be using CF-Xarray's coordinate criteria to figure out which coordinates correspond to X, Y, and T, and not require datasets to have coordinates be named exactly that.

Here's the attributes on one of my datasets that is playing nice with Xpublish-EDR/CF-Xarray (though my Xpublish server is currently being cranky, so I gotta go figure that out):

image

@jhamman
Copy link
Contributor Author

jhamman commented Jul 16, 2024

Thanks both for your input here. I'm realizing now that I jumped to a conclusion about CF (and cf-xarray) that may not have been correct. I'll do a bit more digging and return with a more concrete request.

@abkfenris
Copy link
Member

I think there is probably a place for configuration on both the plugin and dataset sides.

We could enable setting up the ERD plugin with some custom coordinate criteria so folks could enable matching on patterns that are common in their datasets.

On the dataset side, it would probably make sense to have a standard attribute pattern that can prompt plugins but doesn't get sent to users by plugins that would otherwise share attributes.

ds['funky_time'].attrs['_xpublish']['edr']['axis'] = 'T'

@jhamman
Copy link
Contributor Author

jhamman commented Jul 16, 2024

After looking at this in more detail, I understand the problem -- our dataset is simply missing the axis (or similar) attributes for X, Y, and T.

The feature request I would offer here then is to have some sort of verification tool that runs when registering a dataset with a router. If an error or warning had been issued at startup, this would have been much easier to debug.

@abkfenris
Copy link
Member

Hmm, I don't think we'd want to make it the default, since with dataset providers Xpublish doesn't necessarily know of all datasets at startup. We also don't have a hook currently to call plugins on startup, but after datasets may be available.

How about an endpoint to check if the dataset has the right attributes?

There is an existing issue in CF-Xarray for some sort of checker xarray-contrib/cf-xarray#366 but I don't think we need to be blocked by that. I think we can do a simpler query to see if CF-Xarray has identified the axes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants