Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HIGHLY DESIRED: python cmor example for variables on native ocean tripolar grid #766

Open
ilaflott opened this issue Dec 17, 2024 · 30 comments
Assignees

Comments

@ilaflott
Copy link

I've been doing some work for NOAA-GFDL trying to "CMORize" certain types of data produced by our models and post-processing software. Data on the native tripolar ocean grid is one of the last test cases i need to facilitate.

I've been attempting to be a good user about figuring this out- i.e., doing my homework. But at this point I need a human to talk to, even if just to confirm i'm on the right track.

I've studied the CF conventions... I've found similar auxiliary-coordinate examples elsewhere (e.g. Project Pythia)... I found the aux-coordinate example on a normal lat/lon grid and all the other ones in the CMOR API doc... I found all the coordinate information I need from my people and how to get _bnds from it properly. I found the two small super-important sentences in the CMOR API doc.

I also studied a closed/stale issue (#472 from 2019) quite extensively, finding a new idea to try. I haven't gone through with it yet (about to), but this was far too difficult a scenario to find information on.

@taylor13 suggested it would be nice to add an example handling exactly this case to the CMOR API doc, and I completely concur. A concrete example would be of high utility to me, and others that are attempting to CMORize their tripolar ocean data for MIP-publishing workflows.

@mauzey1 mauzey1 self-assigned this Dec 17, 2024
@taylor13
Copy link
Collaborator

As a suggestion (but only for consideration) that perhaps Chris @mauzey1 might be able to work with Ian @ilaflott to stand up an example of how to store tripolar ocean grid data with CMOR.

@ilaflott
Copy link
Author

I am absolutely thrilled to continue to be a good user about things.

I'm pondering a minimal case showing my current problems with the dummy-index approach. I'm bumping into monotonicity concerns, or min/max val checks. when i try to avoid one, i run smack-dab into the other.

@durack1
Copy link
Contributor

durack1 commented Dec 18, 2024

@ilaflott this is MOM6 output right?

I wonder whether we could liberate an example from the CSIRO folks, who have been using (MOM5, I think) in their ACCESS configs.. It might be a good idea to start with a working example, and then tweak.. @chloemackallah any tips you have for us, or examples you can point to in https://github.com/ACCESS-Community-Hub/APP4?

@mauzey1
Copy link
Collaborator

mauzey1 commented Dec 18, 2024

@ilaflott

I'm not sure if this is exactly what you are looking for but there is a Python test in CMOR that demonstrates the use of lat-lon grids with vertices.

https://github.com/PCMDI/cmor/blob/main/Test/test_python_grid_and_ocn_sigma.py

@ilaflott
Copy link
Author

ilaflott commented Dec 19, 2024

Thanks for that- studying your test now too! I hope that upon doing so, it "clicks" on my end, and I can make my case work.

But also, consider it: the only case on my end that i could not figure out, is the ones NOT covered by what's super-easily-searchable in https://cmor.llnl.gov/mydoc_cmor3_python/.

What I truly desire as an outcome, is for the aforementioned link to be updated with an example fully blessed by the primary cmor caretakers. I will personally by hand test your example and help you debug it.

@hot007
Copy link

hot007 commented Dec 19, 2024

@ilaflott this is MOM6 output right?

I wonder whether we could liberate an example from the CSIRO folks, who have been using (MOM5, I think) in their ACCESS configs.. It might be a good idea to start with a working example, and then tweak.. @chloemackallah any tips you have for us, or examples you can point to in https://github.com/ACCESS-Community-Hub/APP4?

Hey @durack1 , just stepping in here... so @chloemackallah handed APP4 over to ACCESS-NRI but it was basically used only for CMIP6. @paolap (formerly) at the Centre of Excellence for Climate Extremes recently released ACCESS-MOPPeR (https://access-mopper.readthedocs.io/en/latest/) which builds on it and can extend beyond CMOR use to other experiments via custom tables, and is probably what you want to base anything on if you want to go poking in ACCESS post-processing, as it's what we're hoping will form the basis for the CMIP7 postprocessor :) (it cleans up some of the residual 'mess' in APP4 too which was not Chloe's fault but you can always do better with hindsight!)
Specifically for (tripolar) ocean maybe this is of interest? https://github.com/ACCESS-Community-Hub/ACCESS-MOPPeR/blob/9b619e408c6d309f7e8f492fc9c1593f8d276ed7/src/mopper/calc_ocean.py

@durack1
Copy link
Contributor

durack1 commented Dec 19, 2024

marvellous, thanks @hot007! This is a great start, @ilaflott the CMOR user base comes to your rescue?

If you guys have test/example cases etc that you'd like to contribute to the repo, I think we'd definitely welcome them :)

@taylor13
Copy link
Collaborator

I would note that we have had a similar request from an obs4MIPs contributor who would like to store data on a hemispheric EASE2 grid. I've copied below some email exchanges about this:

2/1/24
Hi,

We're planning on preparing one or more of our sea-ice concentration
datasets for obs4MIPS and was wondering if you had some pointers?

The dataset will be of a smilar format to
https://catalogue.ceda.ac.uk/uuid/eade27004395466aaa006135e1b2ad1a/

so a on a 12.5 km resolution EASE2 grid split into NH and SH files, one
per day, and the relevant variable being
"""
int ice_conc(time, yc, xc) ;
ice_conc:_FillValue = -32767 ;
ice_conc:long_name = "fully filtered concentration of sea ice
using atmospheric correction of brightness temperatures and open water
filters" ;
ice_conc:standard_name = "sea_ice_area_fraction" ;
ice_conc:units = "%" ;
ice_conc:valid_min = 0 ;
ice_conc:valid_max = 10000 ;
ice_conc:grid_mapping = "Lambert_Azimuthal_Grid" ;
ice_conc:ancillary_variables = "total_standard_uncertainty
status_flag" ;
ice_conc:scale_factor = 0.01 ;
ice_conc:comment = "this field is the primary sea ice
concentration estimate for this climate data record" ;
ice_conc:coordinates = "time lat lon" ;
"""

I've started experimenting a bit with CMOR, and it looks to me like all
the other inputs are on a lat/lon grid, so I'm wondering if that's a
requirement and our data would need to be resampled? Trying to use the
lat/lon as they are only give errors like
!!!!!!!!!!!!!!!!!!!!!!!!!
!
! Error: axis latitude (table: obs4MIPs_SImon) has non monotonic values:
! 16.723207, 16.623869 and 16.723438
!
!!!!!!!!!!!!!!!!!!!!!!!!!
And I haven't see grid coordinates like xc/yc anywhere in the json files in
https://github.com/PCMDI/obs4MIPs-cmor-tables/

Also, would the NH/SH files need to be combined for a "global" files, or
could they be separate? Our NH files have "geospatial_lat_min =
16.62393" and SH has "geospatial_lat_max = -16.62393", so we have no
data in a belt around the equator.

Anyway, if you have some advice I'd greatly appreciate it!

A.


2/1/24
Hi A,

There may be a file naming problem if data from the two hemispheres is separated into 2 files. Under the currently proposed obs4MIPs file name template, the file names would be the same. I have been advocating for CMIP7 that we include in the file name a "region" identifier. If we had that in CMIP6, we would have needed the following identifiers:

glb: global domain
ant: Antarctica
grn: Greenland
nh: Northern Hemisphere
sh: Southern Hemisphere

Similarly, CORDEX also needs a "region" identifier in its filenames.

We are currently finalizing the global attribute list and file naming convention for obs4MIPs. Perhaps Peter can suggest how to best proceed in the mean time.

CMOR's is able to write data on grids other than those described by a rectangular grid of longitudes and latitudes, but its capabilities are quite limited and it doesn't make use of recent augmentation of the CF conventions to include more grid-defining information. If you must avoid regridding, we would have to think about how to best proceed.

You could avoid the both of the above issues by regridding the data in each hemisphere to a regular latitude x longitude grid and combining the two hemispheres into a single file. You could fill the tropical region with either 0 (presuming no sea ice there) or with the missing value indicator. This would make it easier for most users to ingest the data, although it would result in a larger file size and imply more resolution in the polar regions than you really have.

Happy to iterate in finding a way forward,
Karl

P.S. To be clear, I think it would be idea if for obs4MIPs we also included a "region" identifier in the filenames, but that is inconsistent with the current template.


2/2/24
Hi A,

Thank you for your query. We certainly want to help you advance your planned contribution to obs4MIPs. As you have noticed, for obs4MIPs to date we have only been using CMOR with regular lon x lat grids, and additionally there are several important issues raised by Karl that we need to consider for this special case. The good news is the timing is good for us to be thinking through this. We will discuss some options and get back to you next week.

All the best,
Peter

Hi A,

Just a quick update to let you know that efforts are underway to construct an example use of CMOR for your application. Members of the WIP are helping us expedite this. We have identified solutions to the other questions Karl Taylor raised below. We will keep you posted on our progress with this. Many thanks for your patience.
PG


11/25/24
Hi again,

Now that I should start working on this again for the CCI+ project I was wondering if there was any update about the best way to go forward with hemispheric data on equal-area (ie non-regular lat/lon) projection?

Best regards,
A


12/19/24
Dear Obs4MIPs colleagues,

Can we please conclude on this? The AR7 FastTrack process is rolling, and there is an incentive to get the reference observation datasets into Obs4MIPS.

As far as I can see, all we need is a decision / clarification from Obs4MIPS about the handling of polar gridded data: Can they be added to Obs4MIPS in their polar-focused grids (better for the representativity of polar regions), or must they be re-projected to a regular lat/lon grid first (losing detail in the polar regions).

Please refer to the threads of emails below: there has been some progress towards handling polar-gridded data, but no firm decision as far as we can see.

Understandably, we have a preference for the polar grids. But will also accept going for the lat/lon version if it can speed ingestion in Obs4MIPS up.

What do you think?
T


12/19/24
Hi T,

Thanks for bringing this important issue back to our attention. It is not a matter of decision making but rather being able to point you to an example of how to use CMOR with polar grids. We have consulted with the WIP on this and were hopeful we were on a path to have an example soon, so we are now revisiting that possibility. We know it is not difficult – many modeling groups provide their ocean model output on native grid – we just need to get the attention of somebody with the right experience and availability to help.

There is of course the possibility of providing a regridded product, but this would limit the value of your data for model evaluation, so like you I hope that can be avoided. There is also the possibility that you attempt to follow the ODS2.5 data specification without CMOR, but that too is not ideal as would likely require a lot more work to ensure the standards are met which is important to how the data is managed, delivered and used.

So, apologies for this situation but maybe you can give us another chance to provide an example soon? At least you have our attention!

Sincerely,
PG


@ilaflott
Copy link
Author

Given certain responses, I feel I should clarify- I am not a climate scientist, and I am not an oceanographer. I'm an ex-physicist working with NOAA-GFDL in a non-scientific capacity as a software engineer. My Ph.D. was in particle/nuclear physics, and overlap with this field was minimal at best.

The point I'm trying to drive home is this: as a non-scientific contributor, and a non-expert, there is no replacement for well-thought out documentation, as blessed by the experts of the field. Again, this is the only CMORization case of mine that has literally kept me awake at night.

For my other 8 cases, four went down without a problem, the next three took understanding netCDF better, and one required me understanding what in the world olevel was getting at. There was an example in your doc that covered each of these 8 cases. For the tripolar ocean file- I have not succeeded, in two weeks.

PS: I will also bump what @taylor13 is highlighting (i think): ice-realm models are coming, and many will use a tripolar grid (in my current best non-scientific-expert understanding).

PPS: follow up code coming, last 24 hours have been weird. I'm wearing more hats all of a sudden than I am used to. patience appreciated.

PPPS: thank you for your time and energy, really! thank you!!! please help with doc :-)

@ilaflott
Copy link
Author

@taylor13 has created #767 to focus on discussion, referencing here for ease

@mauzey1
Copy link
Collaborator

mauzey1 commented Dec 19, 2024

@ilaflott Did the code that I posted provided at least some hints?

I don't know much about how "tripolar grids" are made (I'm not a climate scientist, either). I just looked at netCDF files for tripolar data like the ones here and then searched for code that produced similar files.

@ilaflott
Copy link
Author

@ilaflott Did the code that I posted provided at least some hints?

I don't know much about how "tripolar grids" are made (I'm not a climate scientist, either).
I just looked at netCDF files for tripolar data like the ones here and then searched for code that produced similar files.

Yeah, the ESGF meta grid i've been exploring. I do NOT need to make a tripolar grid. the grid is "given" to me. I do NOT expect cmor to create a tripolar grid for me at all!!!

For the sake of our discussion in here, consider cmip6-cmor-tables and specifically, table CMIP6_Omon.json, variable sos. I need to do some file-wrangling given internal conventions to get to what I'm showing you, but that's my responsibility :-).

So, assume in my file, if i ncdump -h it, and cut out some irrelevant things, it looks like this:

dimensions:
	time = UNLIMITED ; // (60 currently)
	nv = 2 ;
	yh = 1161 ;
	xh = 1440 ;
	vertex = 4 ;
variables:

	double nv(nv) ;
		nv:long_name = "vertex number" ;
	float sos(time, yh, xh) ;
		sos:_FillValue = 1.e+20f ;
		sos:missing_value = 1.e+20f ;
		sos:units = "psu" ;
		sos:long_name = "Sea Surface Salinity" ;
		sos:cell_methods = "area:mean yh:mean xh:mean time: mean" ;
		sos:cell_measures = "area: areacello" ;
		sos:time_avg_info = "average_T1,average_T2,average_DT" ;
		sos:standard_name = "sea_surface_salinity" ;
	double time(time) ;
		time:units = "days since 0001-01-01 00:00:00" ;
		time:long_name = "time" ;
		time:axis = "T" ;
		time:calendar_type = "NOLEAP" ;
		time:calendar = "noleap" ;
		time:bounds = "time_bnds" ;
	double time_bnds(time, nv) ;
		time_bnds:units = "days since 0001-01-01 00:00:00" ;
		time_bnds:long_name = "time axis boundaries" ;
	double xh(xh) ;
		xh:units = "degrees_east" ;
		xh:long_name = "h point nominal longitude" ;
		xh:axis = "X" ;
	double yh(yh) ;
		yh:units = "degrees_north" ;
		yh:long_name = "h point nominal latitude" ;
		yh:axis = "Y" ;
...
...
...
	float lat(yh, xh) ;
	float lon(yh, xh) ;
	float lat_bnds(yh, xh, vertex) ;
	float lon_bnds(yh, xh, vertex) ;

Now, when i try to define cmor.axis using e.g. lat and or lon, i'm going to get a "expecting 1D array" error. I cannot use xh or yh because those are not the regridded quantities.

OK- so now we try a dummy-index: add in the following to the code to create the dummy indices (only j shown), note the commented-out region:

        j_ind = ds.createVariable('j_index', np.int32, ('yh') )
        print(f'                          np.arange...')
        #j_ind[:] = np.zeros(len(yh), dtype=int )
        j_ind[:] = np.arange(0, len(yh), dtype=np.int32 ) 

then I do:

        cmor.axis("j_index", vales=j_ind[:], units="1")

and this will error out depending on the commented out line. If I use np.zeros, I get a monotonicity error. If I use np.arange to assign the index 0 1 2 3 ..., I will get a max_val error.

Additionally, there's an ambiguity in how I should use CMIP6_Omon.json- an entry for j_index only exists in CMIP6_grids.json, from my understanding. OK... so should I call cmor.setup_table? or cmor.load_table? when should I call that? how does the meta-data between the two tables talk-to each other? does one have precedence over the other when redundant field values are present?

I don't think I've even gotten to the part where I can start feeling out cmor.grid and cmor.set_grid_mapping from the "outside", which is suggested as necessary by current documentation. But i'm almost certainly sure- there's no simple, single formula (without step functions) that projects from the one coordinate system to spherical lat-lon. And I'm relatively sure set_grid_mapping doesn't facillitate step-functions.

@taylor13
Copy link
Collaborator

My recall of this part of CMOR is vague, but I've reviewed the documentation for cmor.grid here, and I would try the following:

I think you do need to invoke cmor.grid to define your grid, but I don't think you need to define grid mapping parameters, which apply to "grid mappings" which I think do not apply to the tripolar grid.

My guess is that your "sos" data array has 3 logical dimensions, 1 for time and 2 (say, yh and xh) to spatially locate each data value. You need to first define simple "index" dimensions by calling cmor.axis and referencing table CMIP6_grids.json, where the axes, i_index, j_index, k_index, ... are defined, each representing one of the dimensions of a multi-dimensional array. If xh is considered the 1st dimension, then invoke cmor.axis with table_entry="i_index" and length set to the length of the xh dimension. Similarly, for yh, but with the entry set to "j_index".

Then, you're ready to execute cmor.grid following the documentation here.

And then you can define your "sos" variable with cmor.variable as described here .

Perhaps others can check whether this guidance is rubbish.

@hot007
Copy link

hot007 commented Dec 20, 2024

hi @ilaflott
I don't know CMOR specifics because clever people have written wrappers as honestly, even for people with climate science backgrounds, CMOR can be hard to deal with! (and at least half of us also just land here out of astronomy/physics and have to learn it on the job!). From my limited understanding of what MOPPeR/APP do though for ACCESS (which uses MOM as its ocean component) is that @taylor13 is on the money with the ^^ described use of CMO6_grids.json to define the tripolar axes, and from there things should be able to be made to work, with appropriate swearing and ritual sacrifice. I feel like I'm the wrong person to talk to as I'm not currently involved in CMIP modelling and so haven't actually run MOPPeR myself but we're in a bit of a flux period here at the moment or I'd point you to someone more relevant!

@ilaflott
Copy link
Author

thank you all for your help, i need to rescope my PR on my end, and spin off the effort to work on this case into a separate pull/issue. So it's gonna be a tad longer on my end.

But mark my words I'm coming back to this. My processing this file flawlessly has become a personal vendetta, and it will vastly improve things in our workflow on our end, WHEN i make it work. I'll pay it forward when I get it working.

@mauzey1 I'm looking at that code again, and I have some things to try.

@taylor13 thank you very much for the detailed reply. that also gives me some things to try.

@hot007 sorry I didn't mean to seem like i was ignoring, I just didn't want to reply until I gave that package/doc an honest peek and digested it a little (slow reader). TBH I am still poking through the doc, and the CMOR notes there are notably detailed, thank you! I'm not sure ACCESS-MOPPeR has what I need, but I'll keep reading and keep an open mind.

@taylor13
Copy link
Collaborator

Hi @ilaflott,

I missed that Chris had already pointed you to a example test code that I think could easily be modified to write your tripolar field.

My discussion above of the "index" approach to defining axes applies to the most general unstructured grid when the native coordinates are of no interest. For the tripolar grid, the native coordinate values and bounds are probably of interest, in addition to the longitudes and latitudes and their vertices, so I would follow the example Chris provided, rather than my "index" approach.

@mauzey1
Copy link
Collaborator

mauzey1 commented Dec 20, 2024

I will make a Python CMOR demo that will write a file similar to what is happening here. We'll added it to our doc site examples to help others in the future.

@mauzey1
Copy link
Collaborator

mauzey1 commented Dec 20, 2024

@ilaflott

I've made a new version of the code that I posted earlier but more geared towards the variable you are trying to write.

https://github.com/PCMDI/cmor/blob/0d12afb7c947acfd18995c286f9d72e03e754df7/Test/test_cmor_python_omon_sos_grid.py

@hot007
Copy link

hot007 commented Dec 23, 2024

Haha no offense taken @ilaflott , ACCESS-MOPPeR is written specifically for the ACCESS model which is broadly speaking UM+MOM, it is likely of little utility to other GCMs, however just wanted to throw it in there to demonstrate there are MOM (tripolar) CMORisers out there :) You can indeed make this work :)

@mauzey1
Copy link
Collaborator

mauzey1 commented Jan 7, 2025

@ilaflott

Does this new test for setting up the grid values help with CMORizing tripolar data? If so, we can start the process of making the test another Python example for CMOR here.

@durack1
Copy link
Contributor

durack1 commented Jan 7, 2025

@gleckler1 pinging you on this thread!

@gleckler1
Copy link

This looks very encouraging! I will try it out soon.

@ilaflott
Copy link
Author

ilaflott commented Jan 8, 2025

i was beginning to parse the other unit test yesterday- this is much cleaner. i'm sure this will help!

@gleckler1
Copy link

gleckler1 commented Jan 8, 2025

After collecting and appropriately modifying the required jsons for this test, I was able to successfully execute resulting in the following
sos_Omon_PCMDI-test-1-0_piControl-withism_r3i1p1f1_gn_198001-198002.txt

@durack1
Copy link
Contributor

durack1 commented Jan 9, 2025

@gleckler1 just cleaned that up into a file attachment. It looks like a too simple a case to me, as the x and y coords are just vectors, and not arrays, with spatially varying sizes - exactly what the tripolar data looks like - the x and y are indices into a lon and lat variable that are an array

@gleckler1
Copy link

These are the results from the code ilaflott pointed to 3 weeks ago (above) that generates sample data. But I agree with you it is too simple an example. I'm attempting to follow the logic with OSI-SAF.

@ilaflott
Copy link
Author

ilaflott commented Jan 9, 2025

@ilaflott

Does this new test for setting up the grid values help with CMORizing tripolar data? If so, we can start the process of making the test another Python example for CMOR here.

Thank you so much!!! i think it works! If i can comment on what in your (cleaner) example was so helpful:

  • this was essential, it made it clear that cmor.grid is sufficient, with no need for a call to cmor.set_grid_mapping. Indeed, the information is enough to define the mapping between the coordinate systems, set_grid_mapping is from what i understand, about defining a formula to write to the netcdf file.
  • this part made me realize- THIS is how one avoids the error "y_deg not in table CMIP6_Omon.json". Naturally, before i read this line closely, the axis_ids i set were for that of [y_axis_id, x_axis_id, time_axis_id]

IMO, i feel my concerns are addressed. The example pointed to in this comment could essentially be copy/pasted into the examples section. Though perhaps for consistency, you'll want to do some style-matching, but that seems to me a nice-to-have.

@durack1
Copy link
Contributor

durack1 commented Jan 10, 2025

@ilaflott if you have a nice NOAA-GFDL example (native model data to CMOR) that we could reproduce in our demos, it would be great to grab it - a notebook would be ideal. There are numerous groups using MOMx, so having that on hand could help many of them, assuming they're also using a similar tripolar grid.

Once we have that in place, we can close out this issue with a PR/fix

@ilaflott
Copy link
Author

ilaflott commented Jan 10, 2025

I guess one more question to tack on while everyone is here, if you don't mind me checking my understanding.

but lets say instead of (lat lon time) for Omon / sos, we also had a vertical direction and we needed to do ips (interpolated pressures?) processing.

Would that basically change the lines here in example 6 from:

ips = cmor.zfactor(zaxis_id=cmorLev,
                   zfactor_name='ps',
                   axis_ids=[cmorTime, cmorLat, cmorLon],
                   units='Pa')

to:

cmor.set_table(grid_table)
cmorGrid = cmor.grid(foo,bar,baz...)
...
...
...
cmor.set_table(Omon_table)
ips = cmor.zfactor(zaxis_id=cmorLev,
                   zfactor_name='ps',
                   axis_ids=[cmorTime, cmorGrid],
                   units='Pa')

@ilaflott
Copy link
Author

ilaflott commented Jan 10, 2025

@ilaflott if you have a nice NOAA-GFDL example (native model data to CMOR) that we could reproduce in our demos, it would be great to grab it - a notebook would be ideal. There are numerous groups using MOMx, so having that on hand could help many of them, assuming they're also using a similar tripolar grid

So I need to check with a few others on my side before releasing data, even if just as a sample/example. But, from what i understand, I am working with post-processed MOMx data, and not "fresh-off-the-stove" MOMx data. I.e. if i give you what I have, might not generalize to non-postprocessed MOMx data.

But, really, the randomly generated data in @mauzey1's irregular grid generation example has the right mathematical properties- a lot of the extra stuff I am doing has to do with internal-convention-wrangling.

edit: on that note, i've followed up via email regarding the sample data

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants