ncdf4-based nc writing for disaggregation.R to avoid memory spike #630

Merged: 10 commits, Feb 28, 2024

@pascal-sauer (Contributor) commented Feb 21, 2024

🐦 Description of this PR 🐦

disaggregation.R frequently crashed because of its memory requirements. There appears to be a very brief memory spike when write.magpie calls terra::writeCDF. The spike is not visible via sacct/maxrss when running with maxmem (where the script does not crash); sacct/maxrss is known to miss brief memory spikes. Running without maxmem nevertheless leads to crashes consistently, whereas the approach in this PR never crashes when run without maxmem. Adding a number of rm calls reduced maxrss from 4.9GB to 3.7GB. (For completeness: the old raster-based approach used 6.4GB and also took 2min/33% longer.)

We should also consider switching to this way of writing nc files in write.magpie in general. That would make us independent of terra in that regard and give us more control over the written nc files.
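
To illustrate the general idea of writing nc files with ncdf4 directly, here is a minimal sketch. It is not the code from this PR: the helper name `writeNcSliced`, the `getSlice` callback, the dimension units, and the fill value are all assumptions made for the example. It writes a lon x lat x time variable one time step at a time, so only a single slice has to be held in memory, and drops each slice with rm once it has been written.

```r
library(ncdf4)

# Hypothetical helper, not the PR's actual code: writes a lon x lat x time
# variable one time slice at a time, so only one slice is held in memory.
# `getSlice` is an assumed callback returning a [lon, lat] matrix for step i.
writeNcSliced <- function(path, lon, lat, years, varName, unit, getSlice) {
  dims <- list(ncdim_def("lon", "degrees_east", lon),
               ncdim_def("lat", "degrees_north", lat),
               ncdim_def("time", "years since 0", years))
  # missval = -9999 is an arbitrary fill value chosen for this sketch
  ncVar <- ncvar_def(varName, unit, dims, missval = -9999, prec = "float")
  nc <- nc_create(path, ncVar)
  on.exit(nc_close(nc))  # close the file even if a write fails
  for (i in seq_along(years)) {
    slice <- getSlice(i)
    # count = -1 means "the whole dimension"; only time step i is written
    ncvar_put(nc, ncVar, slice, start = c(1, 1, i), count = c(-1, -1, 1))
    rm(slice)  # drop the reference so the memory can be reclaimed
  }
  invisible(path)
}
```

A call could look like `writeNcSliced("out.nc", lon, lat, c(2000, 2010), "crop", "Mha", function(i) myData[, , i])`, where `myData` stands for whatever source provides the slices. Whether slice-wise writing helps in practice depends on where the data comes from; if the full array is already in memory, the main gain is avoiding the extra copies made by an intermediate raster object.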

🔧 Checklist for PR creator 🔧

  • Label pull request from the label list.

    • Low risk: Simple bugfixes (missing files, updated documentation, typos) or changes in start or output scripts
    • Medium risk: Uncritical changes in the model core (e.g. moderate modifications in non-default realizations)
    • High risk: Critical changes in model core or default settings (e.g. changing a model default or adjusting a core mechanic in the model)
  • Self-review own code

    • No hard coded numbers and cluster/country/region names.
    • The new code doesn't contain declared but unused parameters or variables.
    • magpie4 R library has been updated accordingly and backwards compatible where necessary.
    • scenario_config.csv has been updated accordingly (important if default.cfg has been updated)
  • Document changes

    • Add changes to CHANGELOG.md
    • Where relevant, add in-code documentation comments
    • Properly address updates in interfaces in the module documentations
    • run goxygen::goxygen() and verify the modified code is properly documented
  • Perform test runs

    • Low risk:
      • Run a compilation check via Rscript start.R --> "compilation check"
    • Medium risk:
      • Run test runs via Rscript start.R --> "test runs"
      • Check logs for errors/warnings
    • High risk:
      • Run test runs via Rscript start.R --> "test runs"
      • Check logs for errors/warnings
      • Default run from the PR target branch for comparison
      • Provide relevant comparison plots (land-use, emissions, food prices, land-use intensity,...)

📉 Performance changes 📈

  • Current develop branch default : ** mins
  • This PR's default : ** mins

🚨 Checklist for reviewer 🚨

  • PR is labeled correctly
  • Code changes look reasonable
    • No hard coded numbers and cluster/country/region names.
    • No unnecessary increase in module interfaces
    • model behavior/performance is satisfactory.
  • Changes are properly documented
    • CHANGELOG is updated correctly
    • Updates in interfaces have been properly addressed in the module documentations
    • In-code documentation looks appropriate
  • content review done (at least 1)
  • RSE review done (at least 1)

@pvjeetze (Contributor) left a comment

Quite nice now! And I also like that writing .nc files is less convoluted without relying on terra, which essentially also uses ncdf4. This way we also have a better understanding of what actually gets written in which way.

As discussed earlier with @pascal-sauer, I wonder, however, whether it would be more straightforward to implement this in magclass directly, especially since other scripts like ./extra/disaggregation_LUH2.R also rely on magclass::write.magpie() for writing data-rich .nc files.

From a topical perspective, the disaggregation script might not be the right place to define the writing function for .nc files. It should mostly contain processes and tools directly related to the disaggregation, particularly as we have the luxury of helper packages like magclass.

Review thread on scripts/output/extra/disaggregation.R (outdated, resolved)

@pvjeetze (Contributor) left a comment

Thanks again @pascal-sauer for this incredible and in the end quite extensive work!

Review thread on scripts/output/extra/disaggregation_LUH2.R (outdated, resolved)

@pascal-sauer pascal-sauer merged commit 16c6786 into magpiemodel:develop Feb 28, 2024
1 check passed