Set up global caching of Roc assets #7517

Open
smores56 opened this issue Jan 15, 2025 · 3 comments
Labels
can (Relates to the Canonicalization compiler stage), intermediate issue (Likely good for someone who has completed a few other issues)

Comments

@smores56
Collaborator

smores56 commented Jan 15, 2025

We would like to set up a single global cache for all Roc assets, including compiler versions, packages, and build artifacts. The first step is to pick the base cache folder, which is one of the following (taken from our current approach):

  • The XDG_CACHE_HOME environment variable, if it's set.
  • Otherwise, ~/.cache on UNIX and %APPDATA% on Windows.

The Roc cache directory is then a folder inside that base folder, named "roc" on UNIX systems and "Roc" on Windows systems. So ~/.cache/roc will be typical on UNIX, and %APPDATA%\Roc will be typical on Windows.
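The lookup order above can be sketched as a small pure function. This is only an illustration, not the compiler's existing code: `Platform`, `roc_cache_dir`, and the choice to pass the environment values in as parameters are all assumptions made here so the logic is easy to test. Treating an empty XDG_CACHE_HOME as unset is also an assumption.

```rust
use std::path::PathBuf;

/// Which family of defaults to fall back to when XDG_CACHE_HOME is unset.
/// (Hypothetical type, introduced only for this sketch.)
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Platform {
    Unix,
    Windows,
}

/// Resolve the Roc cache directory from already-read environment values.
/// Returns `None` when the fallback variable for the platform is missing.
pub fn roc_cache_dir(
    platform: Platform,
    xdg_cache_home: Option<&str>,
    home: Option<&str>,
    appdata: Option<&str>,
) -> Option<PathBuf> {
    // The leaf folder is "roc" on UNIX and "Roc" on Windows.
    let leaf = match platform {
        Platform::Unix => "roc",
        Platform::Windows => "Roc",
    };

    // XDG_CACHE_HOME wins on every platform when it is set.
    // Assumption: an empty value counts as "not set".
    let base = match xdg_cache_home.filter(|value| !value.is_empty()) {
        Some(xdg) => PathBuf::from(xdg),
        None => match platform {
            // ~/.cache on UNIX.
            Platform::Unix => PathBuf::from(home?).join(".cache"),
            // %APPDATA% on Windows.
            Platform::Windows => PathBuf::from(appdata?),
        },
    };

    Some(base.join(leaf))
}
```

A caller would typically read the real values with `std::env::var("XDG_CACHE_HOME").ok()` (and likewise for `HOME` and `APPDATA`), pass them in via `.as_deref()`, and pick the platform with `cfg!(windows)`.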

It will have three subdirectories:

compiler/

Note: this is a tentative plan that will be cleaned up later.

The compiler/ directory will be the simplest: it will contain a flat collection of compiler binaries named after their respective versions, e.g. 0.1.0, or a commit hash for nightly releases. There will be one more executable named simply roc, which is a symlink to the currently selected Roc compiler version. This folder will be populated by a future issue to manage compiler versions. That issue will be properly designed later, but it has been at least partially discussed in this thread on Zulip, which centered on this Google doc.

build/

For each Roc project in the user's filesystem, we will hash the main file for the project (main.roc for packages and platforms, and <app name>.roc for apps) and use that hash as the name of the root folder for that project in the global cache. The next directory level will be the Roc version (e.g. 0.1.0). The compile artifacts for the project will then be stored in a flat collection within that version-named folder.

For each *.roc source file in the user's project, when caching, we should take the base64-encoded BLAKE3 hash of the source file's contents (the same hashing scheme we use for packaging) and store all cacheable artifacts for that source file (e.g. canonicalization info, type info, etc.) in build/<project hash>/<roc version>/<file content hash>. To manage the cache size, we plan the following strategy for when to read from and write to the cache:

  • for every file in the project, find the hash of its contents
  • if there is a file with that hash's name in the project's build cache folder, use it
  • if it is not there, compile the module in isolation and cache its artifacts in the project's build cache folder
  • all other files in the project's build cache folder should be deleted
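
The lookup-and-prune strategy above can be sketched roughly as follows. Everything named here (`project_build_dir`, `CachePlan`, `reconcile_build_cache`) is hypothetical. The sketch also assumes the caller has already computed the content hash of every source file and passes the hashes in as strings, since BLAKE3 is not part of the Rust standard library.

```rust
use std::collections::HashSet;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Build `<cache root>/build/<project hash>/<roc version>`.
pub fn project_build_dir(cache_root: &Path, project_hash: &str, roc_version: &str) -> PathBuf {
    cache_root.join("build").join(project_hash).join(roc_version)
}

/// Result of comparing the cache folder with the project's current sources.
#[derive(Debug, Default)]
pub struct CachePlan {
    /// Hashes whose artifacts are already cached and can be reused.
    pub hits: Vec<String>,
    /// Hashes whose modules still need to be compiled and cached.
    pub misses: Vec<String>,
    /// Stale cache files that this pass deleted.
    pub removed: Vec<PathBuf>,
}

/// `current_hashes` holds the content hash of every source file in the project.
pub fn reconcile_build_cache(
    build_dir: &Path,
    current_hashes: &HashSet<String>,
) -> io::Result<CachePlan> {
    fs::create_dir_all(build_dir)?;
    let mut plan = CachePlan::default();

    // A file named after the hash means the artifacts can be reused;
    // otherwise the module has to be compiled and cached by the caller.
    for hash in current_hashes {
        if build_dir.join(hash).is_file() {
            plan.hits.push(hash.clone());
        } else {
            plan.misses.push(hash.clone());
        }
    }

    // Every other file in the project's build cache folder is deleted.
    for entry in fs::read_dir(build_dir)? {
        let entry = entry?;
        let name = entry.file_name().to_string_lossy().into_owned();
        if !current_hashes.contains(&name) && entry.file_type()?.is_file() {
            fs::remove_file(entry.path())?;
            plan.removed.push(entry.path());
        }
    }

    // Sort so the result does not depend on HashSet iteration order.
    plan.hits.sort();
    plan.misses.sort();
    Ok(plan)
}
```

The caller would then compile each module listed in `misses` in isolation and write its artifacts into `build_dir` under the corresponding hash.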

When writing to the cache, we should first create a randomly named file in the system's temp directory, save the build artifacts to that file, and then atomically rename the file to the intended cache file. This will prevent two compiler instances from writing to the same file and corrupting its contents.
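
A minimal sketch of the write-then-rename step, using only the Rust standard library. It deviates from the description above in one place, on purpose: the temp file is created next to the target instead of in the system temp directory, because a rename is only atomic within a single filesystem and `std::fs::rename` does not work across mount points. The function name and the `.tmp-<pid>-<nanos>` naming scheme are hypothetical stand-ins for a properly random name.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;
use std::process;
use std::time::{SystemTime, UNIX_EPOCH};

/// Write `bytes` to `target` so that readers only ever see a complete file.
pub fn write_cache_file_atomically(target: &Path, bytes: &[u8]) -> io::Result<()> {
    let dir = target.parent().ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidInput, "cache path has no parent directory")
    })?;
    fs::create_dir_all(dir)?;

    // Stand-in for a random name: process id plus a nanosecond timestamp.
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|elapsed| elapsed.as_nanos())
        .unwrap_or(0);
    let tmp = dir.join(format!(".tmp-{}-{}", process::id(), nanos));

    let result = (|| -> io::Result<()> {
        let mut file = File::create(&tmp)?;
        file.write_all(bytes)?;
        // Flush to disk before the file becomes visible under its final name.
        file.sync_all()?;
        // Replaces `target` if it already exists.
        fs::rename(&tmp, target)
    })();

    // Best-effort cleanup so a failed write does not leave a temp file behind.
    if result.is_err() {
        let _ = fs::remove_file(&tmp);
    }
    result
}
```

With the temp file living in the cache folder itself, any pass that deletes unknown files from that folder would need to skip names starting with `.tmp-`, or the temp files would need their own sibling directory on the same filesystem.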

packages/

All packages have their cache files in the packages/ subdirectory, and they follow the scheme ~/.cache/roc/packages/<repository website>/<archive hash>/.... For example, the v0.5.1 release of Weaver would go in ~/.cache/roc/packages/github.com/nqyqbOkpECWgDUMbY-rG9ug883TVbOimHZFHek-bQeI/.... This is the format we are using already.

Each package has two subdirectories, one for the package's source, and the other for its build artifacts.

src/

This directory will contain the uncompressed files in their provided directory structure from the archive downloaded from the internet.

build/

This directory works almost the same way as the primary build/ cache directory for user code. The difference is that we don't first store everything in a folder named by a hash of the project's main.roc file, since the source of the package is immutable. As with user code, we read all files, hash their contents, load the cached artifacts we have, and calculate the rest. However, there's no need to look for files to delete, given the immutability of the package. In the future, we can attempt to store some info per package per Roc version to avoid needing to read and hash all files in each package every time.
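
To make the difference from the user-code build/ layout concrete, here is a small path-building sketch. `PackageDirs`, `package_dirs`, and `package_artifact_path` are hypothetical names, and the hashes are assumed to be computed elsewhere and passed in as strings.

```rust
use std::path::{Path, PathBuf};

/// The two subdirectories kept for each cached package.
pub struct PackageDirs {
    /// Uncompressed source files from the downloaded archive.
    pub src: PathBuf,
    /// Build artifacts, grouped by Roc version.
    pub build: PathBuf,
}

/// `<cache root>/packages/<repository website>/<archive hash>/{src,build}`.
pub fn package_dirs(cache_root: &Path, website: &str, archive_hash: &str) -> PackageDirs {
    let root = cache_root.join("packages").join(website).join(archive_hash);
    PackageDirs {
        src: root.join("src"),
        build: root.join("build"),
    }
}

/// `build/<roc version>/<file content hash>`: unlike user code, there is no
/// project-hash level, because the archive hash already pins the sources.
pub fn package_artifact_path(dirs: &PackageDirs, roc_version: &str, file_hash: &str) -> PathBuf {
    dirs.build.join(roc_version).join(file_hash)
}
```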

Directory layout overview

For example, with the Weaver release mentioned above cached, the ~/.cache/roc/ directory would look like this:

~/.cache/roc/
  compiler/
    roc
    0.1.0
    <commit hash>
  build/
    <project main.roc hash>/
      <roc version>/
        <build artifacts by file hash>
  packages/
    github.com/
      nqyqbOkpECWgDUMbY-rG9ug883TVbOimHZFHek-bQeI/
        src/
          <decompressed source files>
        build/
          <roc version>/
            <build artifacts by file hash>

Some notes:

  • All hashing (besides Git commit hashes) should follow the package URL hashing scheme, which is a base64-encoded BLAKE3 hash of the given data.
  • This design is not necessarily set in stone, so whoever implements this should expect the possibility of design discussion before a PR is merged. The implementer can lower the odds of needing to rewrite their work by double-checking that their approach makes sense in the original Zulip thread.
  • The current packaging cache should be combined/reworked into a single cache as a separate crate, maybe named roc_cache in roc-lang/roc/crates/cache/.
smores56 added the "can" and "intermediate issue" labels on Jan 15, 2025
@chuckwondo

May I suggest following the XDG Base Directory Specification, so as to avoid cluttering the home directory? This might mean splitting things into multiple dirs according to what the spec recommends, or at the very least, choosing one of the spec recommended dirs and shoving everything into it, but certainly not directly in a top-level .roc dir in the user's home dir.

@lishaduck

> May I suggest following the XDG Base Directory Specification, so as to avoid cluttering the home directory?

And may I add, to that, that I'd prefer if it was also respected on macOS? I use macOS, but opt into (the much prettier) XDG spec by setting the variables. I don't care where it is otherwise, just that if XDG vars are set, it'll obey them no matter the platform.

@smores56
Collaborator Author

Updated the description:

  • Used the current XDG-compliant cache directory discovery mechanism
  • Used the current package folder format, which is less deeply nested than the prior suggestion
  • Store downloaded package source files uncompressed instead of in an archive
  • Write cache artifacts to a temp file first to avoid write conflicts and remove the need for lock files
