Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cf: if crs key not present, check keys like proj4, wkt, epsg, esri #736

Open
asinghvi17 opened this issue Sep 11, 2024 · 6 comments
Open

Comments

@asinghvi17
Copy link
Collaborator

asinghvi17 commented Sep 11, 2024

This is on the cf branch.

I tried to load a raster where grid_mapping => "crs", but there's no crs field in the dataset, only a proj4 field. It also looks like the current code tries to load a variable in the dataset? somehow? Maybe that's a typo.

Ended up hacking around it by simply replacing

function _layermetadata(ds::AbstractDataset; layers)
map(layers.attrs) do attr
md = _metadatadict(_sourcetrait(ds), attr)
if haskey(attr, "grid_mapping")
md["grid_mapping"] = Dict(CDM.attribs(ds[attr["grid_mapping"]]))
end
md
end
end

with a set of checks on possible crs strings, and it doesn't error if it can't find it, only warns.

function _layermetadata(ds::AbstractDataset; layers)
    map(layers.attrs) do attr
        md = _metadatadict(_sourcetrait(ds), attr)
        if haskey(attr, "grid_mapping")
            gm_str = attr["grid_mapping"]
            md["grid_mapping"] = if haskey(ds, gm_str)
                Dict(CDM.attribs(ds[gm_str]))
            elseif haskey(CDM.attribs(ds), gm_str)
                Dict(gm_str => CDM.attribs(ds)[gm_str])
            elseif gm_str == "crs"
                if haskey(ds, "crs")
                    Dict("crs" => CDM.attribs(ds)["crs"])
                elseif haskey(ds, "proj4")
                    Dict("crs" => CDM.attribs(ds)["proj4"])
                end
            else
                @warn "Could not find crs/."
            end
        end
        md
    end
end

Is this a solution that makes sense, or should we just not do it? It doesn't actually read the CRS still.

@rafaqz
Copy link
Owner

rafaqz commented Sep 12, 2024

What's in the spec? I thought the grid_mapping variable was a key that linked to a shared crs string for the whole dataset.

@asinghvi17
Copy link
Collaborator Author

asinghvi17 commented Sep 12, 2024

Edited after reading the spec

What happens if that crs variable doesn't exist?

The spec does say it's supposed to be a variable:
https://cfconventions.org/cf-conventions/cf-conventions.html#grid-mappings-and-projections

but this particular dataset does not have that variable (though it does have a global proj4 field). It shouldn't error on load at least.

@asinghvi17
Copy link
Collaborator Author

asinghvi17 commented Sep 12, 2024

Here's an overview of the dataset, for reference:

julia> zg = zopen("reference://nwm_reanalysis.json", "r")
ZarrGroup at ReferenceStore with 2939545 references
 and path 
Variables: time elevation qBtmVertRunoff q_lateral qSfcLatRunoff streamflow feature_id velocity latitude qBucket order longitude 

julia> zg["velocity"].attrs
Dict{String, Any} with 10 entries:
  "missing_value"     => -999900
  "coordinates"       => "longitude latitude"
  "units"             => "m s-1"
  "add_offset"        => 0.0
  "_ARRAY_DIMENSIONS" => Any["time", "feature_id"]
  "long_name"         => "River Velocity"
  "grid_mapping"      => "crs"
  "scale_factor"      => 0.01
  "_Netcdf4Dimid"     => 0
  "valid_range"       => Any[0, 5000000]

julia> zg.attrs
Dict{String, Any} with 19 entries:
  "TITLE"                     => "OUTPUT FROM WRF-Hydro v5.2.0-beta2"
  "code_version"              => "v5.2.0-beta2"
  "stream_order_output"       => 1
  "proj4"                     => "+proj=lcc +units=m +a=6370000.0 +b=6370000.0 +lat_1=30.0 +lat_2=60.0 +lat_0=40.0 +lon_0=-97.0 +x_0=0 +y_0=0 +k_0=1.0 +…
  "cdm_datatype"              => "Station"
  "model_initialization_time" => "1979-02-01_00:00:00"
  "model_output_valid_time"   => "1979-02-01_01:00:00"
  "dev_channelBucket_only"    => 0
  "model_configuration"       => "retrospective"
  "featureType"               => "timeSeries"
  "dev_channel_only"          => 0
  "Conventions"               => "CF-1.6"
  ⋮                           => ⋮

julia> zg.attrs |> println
Dict{String, Any}("TITLE" => "OUTPUT FROM WRF-Hydro v5.2.0-beta2", "code_version" => "v5.2.0-beta2", "stream_order_output" => 1, "proj4" => "+proj=lcc +units=m +a=6370000.0 +b=6370000.0 +lat_1=30.0 +lat_2=60.0 +lat_0=40.0 +lon_0=-97.0 +x_0=0 +y_0=0 +k_0=1.0 +nadgrids=@", "cdm_datatype" => "Station", "model_initialization_time" => "1979-02-01_00:00:00", "model_output_valid_time" => "1979-02-01_01:00:00", "dev_channelBucket_only" => 0, "model_configuration" => "retrospective", "featureType" => "timeSeries", "dev_channel_only" => 0, "Conventions" => "CF-1.6", "model_total_valid_times" => 1416, "station_dimension" => "feature_id", "dev_NOAH_TIMESTEP" => 3600, "dev_OVRTSWCRT" => 1, "_NCProperties" => "version=2,netcdf=4.7.4,hdf5=1.10.7,", "model_output_type" => "channel_rt", "dev" => "dev_ prefix indicates development/internal meta data")

and here's an overview from ZarrDatasets:


julia> zd = ZarrDataset(zg)
Dataset: 
Group: root

Dimensions
   time = 367439
   feature_id = 2776738

Variables
  time   (367439)
    Datatype:    Dates.DateTime (Int32)
    Dimensions:  time
    Attributes:
     units                = minutes since 1970-01-01 00:00:00 UTC
     NAME                 = time
     long_name            = valid output time
     _Netcdf4Dimid        = 1
     standard_name        = time
     valid_min            = 4777980
     valid_max            = 4862880

  elevation   (2776738 × 367439)
    Datatype:    Union{Missing, Float32} (Float32)
    Dimensions:  feature_id × time
    Attributes:
     units                = meters
     coordinates          = longitude latitude
     long_name            = Feature Elevation
     _Netcdf4Dimid        = 0
     standard_name        = Elevation
     _FillValue           = 0.0

  qBtmVertRunoff   (2776738 × 367439)
    Datatype:    Union{Missing, Float64} (Int32)
    Dimensions:  feature_id × time
    Attributes:
     missing_value        = -9999000
     coordinates          = longitude latitude
     units                = m3
     add_offset           = 0.0
     long_name            = Runoff from bottom of soil to bucket
     grid_mapping         = crs
     scale_factor         = 0.0010000000474974513
     _Netcdf4Dimid        = 0
     _FillValue           = 0
     valid_range          = Any[0, 20000000]

  q_lateral   (2776738 × 367439)
    Datatype:    Union{Missing, Float64} (Int32)
    Dimensions:  feature_id × time
    Attributes:
     missing_value        = -99990
     coordinates          = longitude latitude
     units                = m3 s-1
     add_offset           = 0.0
     long_name            = Runoff into channel reach
     grid_mapping         = crs
     scale_factor         = 0.10000000149011612
     _Netcdf4Dimid        = 0
     _FillValue           = 0
     valid_range          = Any[0, 500000]

  qSfcLatRunoff   (2776738 × 367439)
    Datatype:    Union{Missing, Float64} (Int32)
    Dimensions:  feature_id × time
    Attributes:
     missing_value        = -999900000
     coordinates          = longitude latitude
     units                = m3 s-1
     add_offset           = 0.0
     long_name            = Runoff from terrain routing
     grid_mapping         = crs
     scale_factor         = 9.999999747378752e-6
     _Netcdf4Dimid        = 0
     _FillValue           = 0
     valid_range          = Any[0, 2000000000]

  streamflow   (2776738 × 367439)
    Datatype:    Union{Missing, Float64} (Int32)
    Dimensions:  feature_id × time
    Attributes:
     missing_value        = -999900
     coordinates          = longitude latitude
     units                = m3 s-1
     add_offset           = 0.0
     long_name            = River Flow
     grid_mapping         = crs
     scale_factor         = 0.009999999776482582
     _Netcdf4Dimid        = 0
     _FillValue           = 0
     valid_range          = Any[0, 5000000]

  feature_id   (2776738)
    Datatype:    Int32 (Int32)
    Dimensions:  feature_id
    Attributes:
     cf_role              = timeseries_id
     long_name            = Reach ID
     _Netcdf4Dimid        = 0
     comment              = NHDPlusv2 ComIDs within CONUS, arbitrary Reach IDs outside of CONUS
     NAME                 = feature_id

  velocity   (2776738 × 367439)
    Datatype:    Union{Missing, Float64} (Int32)
    Dimensions:  feature_id × time
    Attributes:
     missing_value        = -999900
     coordinates          = longitude latitude
     units                = m s-1
     add_offset           = 0.0
     long_name            = River Velocity
     grid_mapping         = crs
     scale_factor         = 0.009999999776482582
     _Netcdf4Dimid        = 0
     _FillValue           = 0
     valid_range          = Any[0, 5000000]

  latitude   (2776738)
    Datatype:    Union{Missing, Float32} (Float32)
    Dimensions:  feature_id
    Attributes:
     units                = degrees_north
     long_name            = Feature latitude
     _Netcdf4Dimid        = 0
     standard_name        = latitude
     _FillValue           = 0.0

  qBucket   (2776738 × 367439)
    Datatype:    Union{Missing, Float64} (Int32)
    Dimensions:  feature_id × time
    Attributes:
     missing_value        = -999900000
     coordinates          = longitude latitude
     units                = m3 s-1
     add_offset           = 0.0
     long_name            = Flux from gw bucket
     grid_mapping         = crs
     scale_factor         = 9.999999747378752e-6
     _Netcdf4Dimid        = 0
     _FillValue           = 0
     valid_range          = Any[0, 2000000000]

  order   (2776738 × 367439)
    Datatype:    Union{Missing, Int32} (Int32)
    Dimensions:  feature_id × time
    Attributes:
     coordinates          = longitude latitude
     long_name            = Streamflow Order
     _Netcdf4Dimid        = 0
     standard_name        = order
     _FillValue           = 0

  longitude   (2776738)
    Datatype:    Union{Missing, Float32} (Float32)
    Dimensions:  feature_id
    Attributes:
     units                = degrees_east
     long_name            = Feature longitude
     _Netcdf4Dimid        = 0
     standard_name        = longitude
     _FillValue           = 0.0

Global attributes
  TITLE                = OUTPUT FROM WRF-Hydro v5.2.0-beta2
  code_version         = v5.2.0-beta2
  stream_order_output  = 1
  proj4                = +proj=lcc +units=m +a=6370000.0 +b=6370000.0 +lat_1=30.0 +lat_2=60.0 +lat_0=40.0 +lon_0=-97.0 +x_0=0 +y_0=0 +k_0=1.0 +nadgrids=@
  cdm_datatype         = Station
  model_initialization_time = 1979-02-01_00:00:00
  model_output_valid_time = 1979-02-01_01:00:00
  dev_channelBucket_only = 0
  model_configuration  = retrospective
  featureType          = timeSeries
  dev_channel_only     = 0
  Conventions          = CF-1.6
  model_total_valid_times = 1416
  station_dimension    = feature_id
  dev_NOAH_TIMESTEP    = 3600
  dev_OVRTSWCRT        = 1
  _NCProperties        = version=2,netcdf=4.7.4,hdf5=1.10.7,
  model_output_type    = channel_rt
  dev                  = dev_ prefix indicates development/internal meta data

@rafaqz
Copy link
Owner

rafaqz commented Sep 12, 2024

Hm I thought grid_mapping = crs meant there was a field called "crs" that held the crs.

Maybe there's a convention to name them e.g. proj4 etc to identify the type? I'm not sure this stuff is fully official. We need somewhere in writing what this should be.

@asinghvi17
Copy link
Collaborator Author

So I just found out that the data structure I encountered is very much not official.

That being said it still works in Python so we should probably look at that.

@rafaqz
Copy link
Owner

rafaqz commented Oct 10, 2024

Yeah, there are lots of hacks to get around the netcdf CF.

I'm happy to check for those keys, but would be good to see the python code that's generating them to get it right

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants