-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CMOR_MAX_STRING #530
Comments
Originally, we had different string lengths for different attributes, but that go simplified, as I recall, so that all have the same length 1024 (internally in CMOR). I think there was some issue with these strings occupying lots of storage, so we didn't make it huge. I don't think there is a fundamental reason for limiting the attributes to 1024 characters, but it would be better to not have a bunch of unused space reserved for such things as source_id and experiment_id. Which attribute needs more space? |
We discussed to specify a paper for |
Would it be difficult to double the length of the |
There have been changes made in CMIP6_CV.json in cmip6-cmor-tables that add strings with lengths that exceed CMOR's current max length of 1024. This has caused tests in that repo to fail. I think we should consider increasing @durack1 @doutriaux1 Do you know why 1024 was selected for the maximum string length? Do you think it could be increased now? |
Trouble is CMOR_MAX_LENGTH is for almost all strings created by CMOR, and the internal table is already huge with wasted empty space. I don't think we want to double it. A better solution would be to define different groups of string variables with different max. length. |
@mauzey1 I tend to agree with @taylor13 that we should think a little about memory usage when the |
@taylor13 is right the tables size is now huge we should create groups: CMOR_TINY_LENGTH=16 Or someting like that. |
most can probably fit in the first 3 and it would save memory and speed up things |
@mauzey1 as part of the fix for this, we'll need to implement a test so that the nightly changes to |
I support the @doutriaux1 suggestion above. As I recall the original FORTRAN version was implemented that way. We would need to document the max length of each string under the control of users (or in the CV's read by CMOR). |
@taylor13 with a pure python implementation string lengths would not be fixed at compilation time, so the technical challenge would be removed (python can dynamically size any string as required), but we'd need to implement some guidance so ridiculous entries are flagged/warning at a minimum, and potentially error is particularly egregious cases |
Good point. I understood that we were dropping the FORTRAN option, but didn't realize that nothing would be written in C. Will pure python be fast enough? Can it be parallelized, or aren't we going to pursue that? |
@taylor13 @mauzey1 had planned to test the performance by leveraging the latest libraries associated with xarray, so DASK, etc. From my understanding, the current performance bottlenecks relate to the netcdf library, but this will be important to benchmark as we progress. I know that @matthew-mizielinski has been monitoring the CMOR performance on their new systems, so it will be important to keep track of this as we go |
Hi,
is there a reason why CMOR_MAX_STRING is 1024 or could we increase it?
cmor/include/cmor.h
Line 11 in 829a78a
Best regards,
Fabi
The text was updated successfully, but these errors were encountered: