Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Record commit at package installation #3390

Merged

Conversation

infotroph
Copy link
Member

@infotroph infotroph commented Oct 3, 2024

Description

  • Adds a build_hash column to the pecan_version() output, reporting for each package the commit it was installed from and whether there were any uncommitted changes at install time.
  • pecan_version() output now has class "pecan_version_report", and a custom print method that summarizes versions and installation sources more compactly
  • To enable commit recording, all packages gain a new internal variable .build_hash, set at buildinstall time to the value of environment variable PECAN_GIT_REV (or to "unknown" if that is not set).
  • make install now exports PECAN_GIT_REV to each package build, including adding +mod to the commit hash for packages installed with uncommitted changes in the working tree.
  • pecan_version() looks the commit up from one of two places:
    • if the DESCRIPTION has a RemoteSha: field, we use that. This is true for packages installed from github using devtools or remotes or pak, or from R-universe using any method.
    • otherwise, we use the value of .build_hash from inside the package namespace. This is set for packages installed locally with make install.
    • Note that for packages installed locally without using Make (e.g. devtools::install("base/utils"), .build_hash will be used but will be set to "unknown" unless you manually set PECAN_GIT_REV to match the current commit before starting the build process.

Motivation and Context

Output from the existing version:

PEcAn.all::pecan_version() |> head(7)
                 package v1.8.0  installed
3              PEcAn.all  1.8.0 1.8.0.9000
4        PEcAn.allometry  1.7.3 1.7.3.9000
5      PEcAn.assim.batch  1.8.0 1.8.0.9000
7           PEcAn.BASGRA  1.8.0 1.8.0.9000
8        PEcAn.benchmark  1.7.3 1.7.3.9000
9           PEcAn.BIOCRO  1.7.3 1.7.3.9000
10           PEcAn.CABLE  1.7.3       <NA>
                                                        source
3                https://pecanproject.r-universe.dev (R 4.4.1)
4         local (/Users/chrisb/clones/pecan/modules/allometry)
5       local (/Users/chrisb/clones/pecan/modules/assim.batch)
7                                                        local
8         local (/Users/chrisb/clones/pecan/modules/benchmark)
9                https://pecanproject.r-universe.dev (R 4.4.1)
10                                                        <NA>

Output with these changes:

PEcAn.all::pecan_version() |> head(7)
 package               v1.8.0 installed               source              
 PEcAn.all             1.8.0  1.8.0.9000 (f78fd1faa2) local (/Users/chr...
 PEcAn.allometry       1.7.3  1.7.3.9000 (059a93d578) local (/Users/chr...
 PEcAn.assim.batch     1.8.0  1.8.0.9000 (059a93+mod) local (/Users/chr...
 PEcAn.BASGRA          1.8.0  1.8.0.9000 (unknown)    local
 PEcAn.benchmark       1.7.3  1.7.3.9000 (8248461346) local (/Users/chr...
 PEcAn.BIOCRO          1.7.3  1.7.3.9000 (08998d2c44) https://pecanproj...
 PEcAn.CABLE           1.7.3  NA                      NA                  

Information visible here:

  • The PEcAn packages in this R library were installed from at least 4 different commits: f78fd, 059a9, 82484, 08998
  • PEcAn.assim.batch was last installed with uncommitted changes ("+mod") on top of commit 059a93d
  • PEcAn.BASGRA was built without recording its hash (in this case by me running devtools::install without exporting PECAN_GIT_REV first, but this output can't distinguish between that and other manual build methods). If I had used make install, the hash and source path would have been recorded here.
  • PEcAn.BIOCRO is installed from R-Universe not from my on-disk repo; its build hash is taken from the RemoteSha field that R-Universe adds to DESCRIPTION at build time.
  • I do not have PEcAn.CABLE installed at all.

The truncation of source (local (/Users/chr...)and the concatenation of version and build hash into one string are only done during printing.
To see their full values, extract the column or coerce the output to a dataframe:

PEcAn.all::pecan_version() |> _$source |> head(7)
[1] "local (/Users/chrisb/clones/pecan/base/all)"           
[2] "local (/Users/chrisb/clones/pecan/modules/allometry)"  
[3] "local (/Users/chrisb/clones/pecan/modules/assim.batch)"
[4] "local"      
[5] "local (/Users/chrisb/clones/pecan/modules/benchmark)"  
[6] "https://pecanproject.r-universe.dev (R 4.4.1)"         
[7] NA

Review Time Estimate

  • Immediately
  • Within one week
  • When possible

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My change requires a change to the documentation.
  • My name is in the list of CITATION.cff
  • I have updated the CHANGELOG.md.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

On my machine this drops from ~3-4 sec for first call and ~2 after,
to ~2 sec for first call and ~1 after.
No longer feels "too slow" to me.
Probably not a big difference, but felt to me like it make the Make output a little easier to interpret
- use args provided at print time
- default to no rownames and left alignment
- truncate more
@infotroph
Copy link
Member Author

Preserving some discussion from Slack:

@mdietze asked: "[I] was wondering what the motivation was for these changes? How do you envision this information being used? Why do we need this?"

My answer: It’s for debugging issues encountered while running the development version — on systems that only update at point releases, packageVersion("PEcAn.foo") is enough to be sure what code ran, but if you’re pulling from develop then “x.y.z.9000” could be any commit since the last release and you often need more detail to track down what happened.

The motivation comes from me using these routinely in my non-PEcAn workflows: Often I need to debug (or otherwise reason about) a package I installed from GitHub, so I type sessioninfo::session_info() and immediately read off which commit it was installed from. I’ve been frequently finding myself wishing I could do the same with PEcAn packages installed from local Git.

(The reason I can read off the commit for packages installed from GitHub is that remotes::install_github and all its relatives add the commit to the package DESCRIPTION while they’re installing it, but I haven’t found a tool that automatically does the same when installing from disk. So I decided to implement it.)

It would be conceptually better to record it at build time, but R code is not parsed at that stage.
If we really want build-time recording, could consider saving it as data (data/*.R are run at build time)
@@ -0,0 +1,3 @@
# Set at package install time, used by pecan.all::pecan_version()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conceptually it would be preferable to set this at build time, so that someone could download a prebuilt package binary and still have it provide this information, but the getenv is not evaluated until the package is byte-compiled (which happens during installation).

(Scripts in data/ are evaluated at build rather than install, so I considered treating this information as an exported "dataset" rather than an internal variable. I decided against it because that would mean the hash was exported and hence visible to confuse users, but I'd be open to revisiting the idea if other folks think it'd be useful)

@infotroph infotroph merged commit 4faeac6 into PecanProject:develop Nov 2, 2024
16 of 22 checks passed
@infotroph infotroph deleted the memoize-commit-at-build-time branch November 2, 2024 01:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants