Refactoring the recipe into dict-object #9
…edstock into dict_obj_refactor
To close out the night with a success @cisaacstern
But of course I could not stop myself from adding a few more iids for testing!
…edstock into dict_obj_refactor
import xarray as xr
path = 'gs://leap-persistent-ro/data-library/cmip6-testing/a618127503-5790271747-1/CMIP6.ScenarioMIP.NIMS-KMA.UKESM1-0-LL.ssp245.r15i1p1f2.day.psl.gn.v20210427.zarr'
ds = xr.open_dataset(path, engine='zarr', chunks={})
ds
gives
@Timh37 the raw data for this batch is in
This is awesome, I am beyond happy @cisaacstern thank you so much for your help today.
@jbusecke this is incredible. Your gif selections had me actually lol'ing at my desk by myself. I'm so happy to see this. You and I have been iterating for a while to find the right design for this, and it's so rewarding to see that we're finally on a tractable path.
Fantastic, will do that, thank you. By the looks of it that timeseries looks too short though: 21600 daily mean timesteps would be just short of 60 years. Shouldn't a complete 2015-2100 run be longer?
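As a quick check of the arithmetic above, the sketch below converts a count of daily timesteps into years for a few calendars. The function name and the calendar table are illustrative assumptions, not part of any library used in this PR.

```python
# Days per year for a few common CF-style calendars.
# Illustrative table only (an assumption for this sketch), not exhaustive.
DAYS_PER_YEAR = {
    "360_day": 360.0,
    "noleap": 365.0,
    "standard": 365.2425,  # mean length of a Gregorian year
}


def timesteps_to_years(n_daily_steps: int, calendar: str) -> float:
    """Approximate number of years covered by n daily-mean timesteps."""
    return n_daily_steps / DAYS_PER_YEAR[calendar]


for cal in DAYS_PER_YEAR:
    print(cal, round(timesteps_to_years(21600, cal), 2))
```

If the model runs on a 360-day calendar, 21600 steps is exactly 60 years; on a 365-day calendar it is about 59.2 years. Either way it is well short of the 86 years from 2015 through 2100.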
@jbusecke Checked the UKESM1-0-LL data and found that most of these datasets are incomplete (in terms of timesteps). As far as I can see the historical & SSP2-4.5, variant r14i1p1f2 stores are complete but the rest isn't.
Thanks for catching that @Timh37. I just picked a random example here:
and it seems like the list of URLs is incomplete. The good news is that this is probably an easy fix upstream (I made these with a notebook and a dev version of pangeo-forge-esgf), and the actual methodology seems to work here. I do not expect this to necessarily fail when we get all files. I'll investigate what's going on. Thanks for helping here! I would have missed this for a while.
Ok I think this is just a short run. I looked it up on the ESGF search web site and got this:
@jbusecke Great, sounds like it's working as it should then. I don't think such an attribute exists, but in principle it's possible to work out what length a simulation should have given its calendar and its start and end dates. If incomplete files are picked up as they are on ESGF, that's fine with me, as my processing chain should be able to handle these and eventually filter them out for operations for which I require simulations to fully cover certain time periods. I guess something similar could be (or may already be) implemented in xmip? We may want to come up with a way of extending otherwise complete simulations that run to 2099-12 instead of 2100-12 by a year.
Some models seem to have f!=1 as their 'main' ScenarioMIP setting, including the UK one. Generally, I search for the 'ipf' with the most 'r' for each model, so that the 'ipf' may differ between models but never between variants or experiments of the same model. So that shouldn't be an issue.
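The idea of working out the expected length from the calendar and the start and end dates could look roughly like the sketch below. It assumes daily data covering whole years (1 January of the first year through 31 December of the last); the function names are hypothetical and this is not an existing xmip or pangeo-forge feature.

```python
from datetime import date


def expected_daily_steps(start_year: int, end_year: int, calendar: str) -> int:
    """Daily timesteps from 1 Jan of start_year through 31 Dec of end_year."""
    n_years = end_year - start_year + 1
    if calendar == "360_day":
        return 360 * n_years
    if calendar in ("noleap", "365_day"):
        return 365 * n_years
    if calendar in ("standard", "gregorian", "proleptic_gregorian"):
        # datetime.date is proleptic Gregorian; this matches the "standard"
        # calendar only for dates after the 1582 calendar switch.
        return (date(end_year, 12, 31) - date(start_year, 1, 1)).days + 1
    raise ValueError(f"Unsupported calendar: {calendar!r}")


def covers_period(n_steps: int, start_year: int, end_year: int, calendar: str) -> bool:
    """True if a store has at least as many steps as the full period needs."""
    return n_steps >= expected_daily_steps(start_year, end_year, calendar)


# The 21600-step store discussed above, checked against 2015-2100.
print(expected_daily_steps(2015, 2100, "360_day"))
print(covers_period(21600, 2015, 2100, "360_day"))
```

A check like this only compares counts; it would not catch gaps in the middle of a run, which would need an inspection of the time coordinate itself.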
Closes #30 with jbusecke/pangeo-forge-esgf@d19e800
Just diagnosed some issues with "CMIP6.ScenarioMIP.MIROC.MIROC6.ssp245.r47i1p1f1.day.psl.gn.v20210917". This was affected by jbusecke/pangeo-forge-esgf#17. I will again delete the
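The iids in this thread, such as the MIROC6 one above, are dot-separated facet strings. A minimal sketch of splitting one into facets and of the "pick the 'ipf' with the most 'r' per model" rule described earlier follows. The facet names follow the usual CMIP6 ordering but are an assumption here, and both helper functions are hypothetical, not part of pangeo-forge-esgf.

```python
import re
from collections import defaultdict

# Assumed facet order for a CMIP6 instance id (iid).
FACETS = [
    "mip_era", "activity_id", "institution_id", "source_id", "experiment_id",
    "member_id", "table_id", "variable_id", "grid_label", "version",
]

# Splits a member id like "r47i1p1f1" into its "r" index and "ipf" suffix.
MEMBER_RE = re.compile(r"^r(\d+)(i\d+p\d+f\d+)$")


def parse_iid(iid: str) -> dict:
    """Split a dot-separated iid into a facet dictionary."""
    values = iid.split(".")
    if len(values) != len(FACETS):
        raise ValueError(f"Expected {len(FACETS)} facets, got {len(values)}")
    return dict(zip(FACETS, values))


def main_ipf_per_model(iids) -> dict:
    """For each source_id, return the 'ipf' suffix with the most distinct 'r'."""
    r_seen = defaultdict(lambda: defaultdict(set))
    for iid in iids:
        facets = parse_iid(iid)
        match = MEMBER_RE.match(facets["member_id"])
        if match is None:
            continue  # skip member ids that do not follow the r/i/p/f pattern
        r_seen[facets["source_id"]][match.group(2)].add(int(match.group(1)))
    # Ties are resolved by whichever suffix was encountered first.
    return {
        model: max(ipfs, key=lambda suffix: len(ipfs[suffix]))
        for model, ipfs in r_seen.items()
    }


iid = "CMIP6.ScenarioMIP.MIROC.MIROC6.ssp245.r47i1p1f1.day.psl.gn.v20210917"
print(parse_iid(iid)["member_id"])
print(main_ipf_per_model([iid]))
```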
As discussed with @cisaacstern I have moved the requirements in the first comments to separate issues and will merge this to make reviewing changes easier.
This PR is the result of another great pair-programming session with @cisaacstern just now.
I think it would be beneficial to wait for some upstream changes to be cleaned and merged, and merge this PR in a cleaner form.
StoreToZarr to run Tests as separate stage PR 🚧 There still seems to be an issue, waiting on "Target store returned from StoreToZarr without waiting on completion of StoreDatasetFragments" (pangeo-forge/pangeo-forge-recipes#564). Fix implemented in "In StoreToZarr, wait for StoreDatasetFragments before returning target_store" (pangeo-forge/pangeo-forge-recipes#574).
Dynamic chunking logic in pangeo-forge-recipes. EDIT: This will not ultimately be implemented in -recipes, but instead needs to be refactored as a plugin. --> Implement dynamic chunking as plug in #35
Upstream changes to pangeo-forge-runner that allow execution of dict-objects with unique jobnames. --> Rebase on newest -runner main #36
Finish the beam-refactor of pangeo-forge-esgf --> Refactor on pangeo-forge-esgf main #38
Nice to haves (but not blocking this PR directly):
Cleaner recipe by enabling module imports (Support multiple file dependencies with --setup_file option pangeo-forge/pangeo-forge-runner#92) --> Slim down recipes once module imports are supported #39
Closes Label based runs not compatible with CMIP6 iids #5, Not able to trigger deploy-recipe-action without labels #6, Log successful writes and do not retry #12, Logging successful writes that do not pass the test. #15, Use Dataflow Prime to avoid memory issues? #16, Dynamically generate urls at recipe runtime #17, Avoid rerunning iids that are in the old catalog #18, Outputting a catalog of the stores written #20, DetermineSchema throws UnicodeEncodeError if en-dash in ds.attrs #29
Won't close #11 anymore. See pangeo-forge/pangeo-forge-recipes#559 for upstream implementation suggestion.