<version> #267
Comments
@wachsylon an option here would be to hardcode the version identifier, and in that case the line above would become:
You'd have to check to make sure that this correctly parses the path (that "v20171114" is a new subdir beneath <grid_label>). When calling CMOR, you could generate your json input to replace and update the tag each time.
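A minimal sketch of generating the json input with a hard-coded version, assuming the user input is held as a Python dict before being written out. The helper name `pin_version` is hypothetical and not part of CMOR, and whether CMOR treats the literal as a new subdirectory beneath <grid_label> is exactly what would still have to be checked:

```python
import json


def pin_version(user_input, version):
    """Return a copy of the user-input dict with <version> replaced by a literal.

    Hypothetical helper, not part of CMOR. The original dict is left untouched.
    """
    updated = dict(user_input)
    template = updated["output_path_template"]
    updated["output_path_template"] = template.replace("<version>", version)
    return updated


user_input = {
    "output_path_template": (
        "<mip_era><activity_id><institution_id><source_id><experiment_id>"
        "<_member_id><table><variable_id><grid_label><version>"
    )
}

# Assumption: "v20171114" is the version chosen for this atomic dataset.
pinned = pin_version(user_input, "v20171114")
print(json.dumps(pinned, indent=2))
```

Regenerating the json for each CMOR call keeps the hard-coded tag in one place, so it can be updated deliberately rather than changing with the calendar date.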
@wachsylon -- For CMIP6 the decision was made to independently assign versions at the granularity of individual "atomic datasets" (i.e., a single version number gets assigned to the full time-series of a single variable resulting from a single simulation). Thus, multiple version numbers will usually be associated with a single simulation. There is no plan to assign a single version number at a coarser granularity (e.g., all output from a single simulation). One reason for this is that if a mistake is found in a single variable, it can be withdrawn/updated without requiring republication of the rest of the output from that simulation. ESGF requires, however, that for an individual atomic dataset, all files (making up that dataset) be put in a single directory (identified with the version number for that atomic dataset). By default this won't happen if you use CMOR to write some files on one day and others on another. To ensure all files in the atomic dataset are put in the same "version" subdirectory, please see the discussion at #210 (comment) (beginning at the July 25 entry, and reading through to the end). There you can see how to make sure all files in a single atomic dataset can be assigned the same version number even if they are written on different days.
Thanks for the quick, detailed and helpful responses. I think I will use the option @durack1 proposed. I will also discuss this next week, so I hope this issue can remain open for a few more days in case I have more questions.
@wachsylon the issue isn't a problem with the CMOR code base; rather, it's how users are providing inputs to CMOR, so I will close. Feel free to continue to raise queries through this issue; we'll still see your questions and respond.
@dnadeau4 @taylor13 we've hit this problem with the input4MIPs data re-writing, it may be useful to consider checking hard-coded versions (e.g. |
Some relevant discussion also is found at #210 (comment) (begin at the July 25 entry, and read the following 2 comments too). We could help users by implementing 5 changes:
[not sure the name of the input file is correct in the above warning message.]
We have received the following email:
I think we should implement the warning and error messages suggested in #267 (comment) now to help guard against these problems. Can the priority of this issue be raised?
@mauzey1 Would you please prioritize implementing these changes? Thank you.
I'm transferring an email thread to here:
And my responses: Concerning the "CMOR_PRESERVE" behavior, we should ask @dnadeau4 if he remembers what was intended. The CMOR documentation at https://cmor.llnl.gov/mydoc_cmor3_api/ says
We should either correct the documentation or change CMOR's behavior to be consistent with it. Again, we should consult @dnadeau4 because I vaguely remember someone requesting a change in behavior from the original error exiting. Concerning a user-set version number, I think it may be o.k. if CMOR allows this when the file is being written, but when the PrePARE part of the code executes after the file is written, it should raise an error if the version number is inconsistent with the CMIP6 template (‘v%Y%m%d’). This presumes that PrePARE is run following each CMOR execution. Concerning version dates being in the future, I think we should not allow this (i.e., we should raise an error).
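The two checks described here (conformance to the ‘v%Y%m%d’ template and rejection of future dates) could look roughly like the sketch below. This is not PrePARE's actual code; the function name `check_version` and the `allow_future` switch are assumptions made for illustration, the latter so that the future-date rule can be relaxed if that is what gets decided:

```python
import re
from datetime import date, datetime


def check_version(version, today=None, allow_future=False):
    """Validate a version string against the CMIP6 template 'v%Y%m%d'.

    Hypothetical helper. Returns the parsed date, or raises ValueError.
    """
    # Require exactly 'v' plus eight ASCII digits, so that short forms such
    # as 'v2017114' are rejected before strptime gets a chance to accept them.
    if not re.fullmatch(r"v[0-9]{8}", version):
        raise ValueError(f"{version!r} does not match the template 'v%Y%m%d'")

    # strptime raises ValueError for impossible dates such as month 13.
    parsed = datetime.strptime(version, "v%Y%m%d").date()

    if not allow_future and parsed > (today or date.today()):
        raise ValueError(f"version date {version!r} is in the future")
    return parsed


print(check_version("v20171114", today=date(2018, 1, 1)))
```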
Why do we need these strict rules? I understood that version is needed for the publication because, if an error is found in the published simulation, it should be possible to publish new versions of it. With this in mind, I think the only strict rule is that the version date is not beyond the date of publication. CMOR is not able to know when the files should be published. So the only required warning is that if the publication date is in the future, the user needs to be informed that the files cannot be published until then. Here it says that the version is "indicating approximate date of model output file". If there is a simulation done in 2017 which should be prepared for CMIP6 in 2018 with a version <v2017..>, why should CMOR exit for this? Also, there can be high resolution models whose simulations take a long time to be finalized. When it comes to the technical implementation, I think that the proposed checks and errors require an attribute for that should be checked. Right now, I believe that the technical implementation of such warnings in CMOR would probably be made by checking whether <output_path_template> is equal to the CMIP6 requested one but with an individual (tell me if I am wrong). If we assume that this is possible and CMOR gives errors for some cases,
When warning the user that the same file is present in another version that it is currently not writing to, should CMOR warn multiple times if the same file is found in multiple versions? Should this warning ignore version numbers that don't follow the CMIP6 template (‘v%Y%m%d’)? So CMOR should be able to use whatever version number the user specifies but PrePARE should raise an error if a version number is not in the correct format or is a future date?
@ehogan I have now read through #246, but I'm not sure what I should be paying attention to. I can see they are somewhat related. In particular the CMOR3 documentation and error messages may be inconsistent:
Can you please confirm that the documentation is wrong?
@wachsylon : I didn't follow the last part of your comment ("technical implementation"). If it is important, could you please re-explain? Regarding future dates, you've made a good case to allow future dates. I'll discuss with @doutriaux1 and @mauzey1, and then propose an algorithm.
How should But I do not want to point this out to users because the users may be misled into changing the |
The above discussion concerns adding additional files to those already written (and all having been assigned a common version number). The check is to ensure that the "additional files" have the same version number as the already-written files. I don't think CMOR should check "version" for the first file published in a series. Do you agree?
Should CMOR retain the version number generated upon writing its first file, and then use that version for subsequent files while CMOR is still running?
Not necessarily. If additional files are being added to those already written, the version number should be specified by the user (in "_myVersion"). Then CMOR should check that this version (date) is consistent with the existing files. If the user doesn't pass CMOR a version number, CMOR should set the version number to the current date and CMOR should perform the checks described in #267 (comment) above. For CMIP6, PrePARE should always check that version follows the template (‘v%Y%m%d’) (and it should do this as part of the CMOR execution as well as during publication). I now think (following above discussion) that the "within 3 days" and "within 4 weeks" rules should allow dates in the future as well as the past.
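The selection logic in this comment (use the user's version if given, otherwise the current date, then look at what is already on disk) might be sketched as follows. The function names and the assumption that version subdirectories sit directly beneath one dataset directory are illustrative only, and the "within 3 days" and "within 4 weeks" rules from the referenced comment are deliberately not implemented here:

```python
import re
from datetime import date
from pathlib import Path

VERSION_RE = re.compile(r"v[0-9]{8}")


def existing_versions(dataset_dir):
    """List version-like subdirectories (vYYYYMMDD) beneath dataset_dir."""
    dataset_dir = Path(dataset_dir)
    if not dataset_dir.is_dir():
        return []
    return sorted(
        p.name
        for p in dataset_dir.iterdir()
        if p.is_dir() and VERSION_RE.fullmatch(p.name)
    )


def choose_version(dataset_dir, user_version=None, today=None):
    """Pick the version to write to and report any other versions found.

    Hypothetical helper: returns (version, other_versions). The caller
    decides whether a non-empty other_versions is a warning or an error.
    """
    version = user_version or (today or date.today()).strftime("v%Y%m%d")
    others = [v for v in existing_versions(dataset_dir) if v != version]
    return version, others
```

Returning the other versions instead of raising inside the helper leaves the warning-versus-error policy, which is still under discussion here, to the caller.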
Let's hold off on coding these changes. I spoke with Denis and we might have an alternative approach.
I agree that CMOR should search for other versions and only tell the user if it finds some. I can not set an arbitrary version by specifying If that is, why not introducing such a keyword "my_version" which, if it is specified in the |
@taylor13 @durack1 @doutriaux1 |
We are cmorizing cmip6 data for EC-Earth and are running into the following issues:
Our approach to this is at the moment to sort out the version number after the cmorization process with scripts and for simplicity to keep the same version for all variables in one experiment. Should we need to fix some data later on, we will likely give a new version only to those variables that have been changed. For us, just being able to specify the version manually would be a great help.
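A post-processing script of the kind mentioned here could, for example, collect the files from several date-stamped subdirectories into one version directory. The sketch below is not the EC-Earth script; the function name and the directory layout (version subdirectories directly beneath one dataset directory) are assumptions, and it only rearranges files on disk without touching their contents:

```python
import re
import shutil
from pathlib import Path

VERSION_RE = re.compile(r"v[0-9]{8}")


def merge_versions(dataset_dir, target_version):
    """Move files from every vYYYYMMDD subdirectory into target_version.

    Hypothetical helper. Refuses to overwrite: a file name that already
    exists in the target raises FileExistsError (files moved before the
    clash stay moved). Returns the names of the files that were moved.
    """
    dataset_dir = Path(dataset_dir)
    target = dataset_dir / target_version
    target.mkdir(exist_ok=True)

    moved = []
    for sub in sorted(dataset_dir.iterdir()):
        if sub == target or not sub.is_dir() or not VERSION_RE.fullmatch(sub.name):
            continue
        for src in sorted(sub.iterdir()):
            dest = target / src.name
            if dest.exists():
                raise FileExistsError(f"{dest} already exists")
            shutil.move(str(src), str(dest))
            moved.append(src.name)
        sub.rmdir()  # the subdirectory is empty once all entries are moved
    return moved
```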
Hi,
in https://cmor.llnl.gov/mydoc_cmor3_c/ you recommend to set:
"output_path_template": "<mip_era><activity_id><institution_id><source_id><experiment_id><_member_id><table><variable_id><grid_label><version>"
where the version has the format "v<date>". If a user integrates CMOR into an operational workflow in which model output is produced over a period of several days, the model output will have different <version>s although it originates from the same model run. This has led to some confusion. How should a user find a file which contains a particular variable in this structure during the workflow? Furthermore, the directories need to be merged and a unique version needs to be assigned to the whole experiment output eventually.
I suggest three solutions:
<version>. This depends on how often CMOR is used for small experiments completely finished within a day.
<version> from output_path_template when performing operational experiment simulations and adding the subdirectory before publication. If the <version> build rule remains, I think this operational issue needs to be pointed out in the documentation for the users.
<version>.
What is your opinion? Did I miss something? Is there another reason for the <version> build rule?
Best regards,
Fabi