Set up global caching of Roc assets #7517
Labels
can
Relates to the Canonicalization compiler stage
intermediate issue
Likely good for someone who has completed a few other issues
We would like to set up a single global cache for all Roc assets, including compiler versions, packages, and build artifacts. We should first find an appropriate cache folder, one of the following (taken from our current approach):
The Roc cache directory will then be a folder named "roc" within that folder on Unix systems, and "Roc" on Windows systems. So ~/.cache/roc will be typical on Unix, and %APPDATA%\Roc will be typical on Windows.
It will have three subdirectories: compiler/, build/, and packages/.
compiler/
note: this is a tentative plan that will be cleaned up later.
The compiler/ directory will be the simplest, and it will contain a flat collection of compiler binaries named after their respective versions, e.g. 0.1.0, or a commit hash for nightly releases. There will be one more executable, named simply roc, which is a symlink to the currently selected Roc compiler version. This folder will be populated by a future issue to manage compiler versions, which will be properly designed later, but has been at least partially discussed in this thread on Zulip, which surrounded this Google doc.
build/
For each Roc project in the user's filesystem, we will hash the main file for the project (main.roc for packages and platforms, and <app name>.roc for apps) and use that hash as the name of the root folder for that project in the global cache. The next level will be the Roc version (e.g. 0.1.0). The compile artifacts for the project will then be stored in a flat collection within that version-named folder.
For each *.roc source file in the user's project, when caching, we should take the base64-encoded BLAKE3 hash of the source file's contents (the same hashing scheme we use for packaging) and store all cacheable artifacts for that source file (e.g. canonicalization info, type info) in build/<project hash>/<roc version>/<file content hash>.
To manage the cache size, we plan the following strategy for when to write to/read from the cache:
When writing to the cache, we should first generate a randomly named file in the system's temp directory, save the build artifacts to that file, and then atomically rename the file to the intended cache file. This will avoid two compiler instances writing to the same file and corrupting the contents.
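The path scheme and the write-then-rename step described above could be sketched roughly as follows. This is only an illustration, not the planned implementation: the function names `artifact_path` and `write_atomically` are hypothetical, the hashes are passed in as already-encoded strings rather than computed, and the temporary file name is built from the process id and a timestamp as a stand-in for a truly random name. The temporary directory is a parameter because a rename is only atomic when the source and destination are on the same filesystem, so the system temp directory works only when it shares a filesystem with the cache.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::{Path, PathBuf};
use std::process;
use std::time::{SystemTime, UNIX_EPOCH};

/// Builds `<cache root>/build/<project hash>/<roc version>/<file content hash>`.
/// Hypothetical helper: the hashes are assumed to be computed elsewhere.
fn artifact_path(
    cache_root: &Path,
    project_hash: &str,
    roc_version: &str,
    file_hash: &str,
) -> PathBuf {
    cache_root
        .join("build")
        .join(project_hash)
        .join(roc_version)
        .join(file_hash)
}

/// Writes `bytes` to a uniquely named file in `tmp_dir`, then renames it
/// onto `target`. Readers see either the old file or the complete new one.
/// Assumption: `tmp_dir` and `target` live on the same filesystem.
fn write_atomically(tmp_dir: &Path, target: &Path, bytes: &[u8]) -> io::Result<()> {
    // Make sure the version-named folder exists before renaming into it.
    if let Some(parent) = target.parent() {
        fs::create_dir_all(parent)?;
    }

    // Stand-in for a random name: process id plus a nanosecond timestamp.
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_nanos())
        .unwrap_or(0);
    let tmp = tmp_dir.join(format!("roc-cache-{}-{}.tmp", process::id(), nanos));

    // `create_new` fails if the name is already taken, so two writers
    // can never share a temporary file.
    let mut file = fs::OpenOptions::new()
        .write(true)
        .create_new(true)
        .open(&tmp)?;
    file.write_all(bytes)?;
    file.sync_all()?;
    drop(file);

    fs::rename(&tmp, target)
}
```

If two compiler instances race on the same artifact, each writes its own temporary file and the last rename wins, which is harmless because both files hold artifacts for the same source contents.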
packages/
All packages have their cache files in the packages/ subdirectory, and they follow this scheme: ~/.cache/roc/packages/<repository website>/<archive hash>/...
For example, the v0.5.1 release of Weaver would go in ~/.cache/roc/packages/github.com/nqyqbOkpECWgDUMbY-rG9ug883TVbOimHZFHek-bQeI/...
This is the format we are using already.
Each package has two subdirectories, one for the package's source, and the other for its build artifacts.
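As a rough sketch of the package layout described above, the two per-package subdirectories could be derived from the cache root like this. The names `PackageDirs` and `package_dirs` are hypothetical and chosen only for illustration; the path segments follow the scheme quoted in this issue.

```rust
use std::path::{Path, PathBuf};

/// The two per-package subdirectories: unpacked sources and build artifacts.
struct PackageDirs {
    src: PathBuf,
    build: PathBuf,
}

/// Builds `<cache root>/packages/<repository website>/<archive hash>/{src,build}`.
/// Hypothetical helper: `archive_hash` is assumed to be computed elsewhere.
fn package_dirs(cache_root: &Path, repo_website: &str, archive_hash: &str) -> PackageDirs {
    let root = cache_root
        .join("packages")
        .join(repo_website)
        .join(archive_hash);
    PackageDirs {
        src: root.join("src"),
        build: root.join("build"),
    }
}
```

For the Weaver example, calling `package_dirs` with the cache root, `"github.com"`, and the archive hash yields the `src` and `build` folders under that release's directory.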
src/
This directory will contain the uncompressed files in their provided directory structure from the archive downloaded from the internet.
build/
This directory works almost the same way as the primary build/ cache directory for user code. The difference is that we don't first store everything in a folder named by a hash of the project's main.roc file, since the source of the package is immutable. As with user code, we read all files, hash their contents, load the cached artifacts we have, and calculate the rest. However, there's no need to look for files to delete given the immutability of the package. In the future, we can attempt to store some info per-package per-Roc version to avoid needing to read and hash all files per package every time.
Directory layout overview
For example, the directory for the mentioned Weaver version would look like this inside of the ~/.cache/roc/ cache:
Some notes:
roc_cache in roc-lang/roc/crates/cache/.