Add instructions to run a local GitHub cache
Write up instructions for the setup described in:
https://actuated.dev/blog/faster-self-hosted-cache

Signed-off-by: Han Verstraete (OpenFaaS Ltd) <[email protected]>
welteki authored and alexellis committed Feb 16, 2024
1 parent e98f570 commit 5d2c923
Showing 2 changed files with 234 additions and 2 deletions.
231 changes: 231 additions & 0 deletions docs/tasks/local-github-cache.md
@@ -0,0 +1,231 @@
# Run a local GitHub Cache

The GitHub Actions cache can speed up CI/CD pipelines. Hosted runners are placed close to the cache, so latency is very low. Self-hosted runners can also make good use of this cache. Just like caching container images on the host with [a registry mirror](/tasks/registry-mirror/), you can get a speed boost over the hosted cache by running your own cache directly on the host.

To improve cache speeds with Actuated runners, you can run a self-hosted S3 server and switch out the official [actions/cache@v3](https://github.com/actions/cache) action for [tespkg/actions-cache@v1](https://github.com/tespkg/actions-cache), which can target any S3-compatible store instead of the proprietary GitHub cache.

You can run the cache on every actuated server for the speed of communicating over a loopback network, or you can run it on a single dedicated server that's placed in the same region as the actuated servers, which will still be very quick.

> Note: if you have multiple actuated hosts, consider running a single dedicated cache server. Subsequent jobs may be scheduled to different hosts, so there is no guarantee that a per-host cache will already be populated.

## Set up an S3 cache

There are a couple of options for running a self-hosted S3 server, most notably [SeaweedFS](https://github.com/seaweedfs/seaweedfs) and [MinIO](https://min.io/).

This guide covers the setup of SeaweedFS, but any S3-compatible service will work in much the same way.

### Install SeaweedFS

SeaweedFS is distributed as a static Go binary, so it can be installed with [arkade](https://github.com/alexellis/arkade), or from the [GitHub releases page](https://github.com/seaweedfs/seaweedfs/releases).

```sh
arkade get seaweedfs
sudo mv ~/.arkade/bin/seaweedfs /usr/local/bin
```

Define an access key and a secret key to be used by the CI jobs in the `/etc/seaweedfs/s3.conf` file.

Generate a secret key: `openssl rand -hex 16 > ~/secret_key`
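An access key can be generated in the same way. As a sketch, the length here is an arbitrary choice, not a SeaweedFS requirement; any hard-to-guess string will do:

```bash
# Generate a random access key; 24 hex characters is an arbitrary choice.
ACCESS_KEY="$(openssl rand -hex 12)"
echo "$ACCESS_KEY" > ~/access_key
echo "Access key length: ${#ACCESS_KEY}"
```

You can then populate the `ACCESS_KEY` variable in the next step with `cat ~/access_key`.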

```bash
export ACCESS_KEY="" # Replace with your access key
export SECRET_KEY="$(cat ~/secret_key)"

cat > /tmp/s3.conf <<EOF
{
  "identities": [
    {
      "name": "actuated",
      "credentials": [
        {
          "accessKey": "$ACCESS_KEY",
          "secretKey": "$SECRET_KEY"
        }
      ],
      "actions": [
        "Admin",
        "Read",
        "List",
        "Tagging",
        "Write"
      ]
    }
  ]
}
EOF

sudo mkdir -p /etc/seaweedfs
sudo mv /tmp/s3.conf /etc/seaweedfs/s3.conf
```

Install and start SeaweedFS with a systemd unit file:

```bash
cat > /tmp/seaweedfs.service <<EOF
[Unit]
Description=SeaweedFS
After=network.target

[Service]
User=root
ExecStart=/usr/local/bin/seaweedfs server -ip=192.168.128.1 -volume.max=0 -volume.fileSizeLimitMB=2048 -dir=/home/runner-cache -s3 -s3.config=/etc/seaweedfs/s3.conf
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

sudo mkdir -p /home/runner-cache
sudo mv /tmp/seaweedfs.service /etc/systemd/system/seaweedfs.service
sudo systemctl daemon-reload
sudo systemctl enable seaweedfs --now
```

We have set `-volume.max=0 -volume.fileSizeLimitMB=2048` to minimize the amount of space used and to allow large zip files of up to 2GB, but you can change this to suit your needs. See `seaweedfs server --help` for more options.

The `-ip` flag only needs to be set to `192.168.128.1` if you are running the cache directly on the actuated host. If the cache needs to be accessible from multiple actuated hosts, use the IP address of the appropriate network interface instead.

Check the status with:

```bash
sudo journalctl -u seaweedfs -f
```
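Once the service is up, you can smoke-test the S3 endpoint with any S3-compatible client. As a sketch using the AWS CLI (the endpoint IP and the `actuated-runners` bucket name match the workflow examples below; adjust both to your setup):

```bash
# Use the credentials defined in /etc/seaweedfs/s3.conf
export AWS_ACCESS_KEY_ID="<your access key>"
export AWS_SECRET_ACCESS_KEY="$(cat ~/secret_key)"

# Create the bucket used by the workflow examples, then list buckets
aws --endpoint-url http://192.168.128.1:8333 s3 mb s3://actuated-runners
aws --endpoint-url http://192.168.128.1:8333 s3 ls
```

This requires a running SeaweedFS instance and the AWS CLI installed on the machine you test from.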

## Use the self-hosted cache

To start using the local cache, replace `actions/cache@v3` with `tespkg/actions-cache@v1` in your cache steps and add the `tespkg/actions-cache`-specific properties alongside the usual `actions/cache` properties.

Some actions like [setup-node](https://github.com/actions/setup-node#caching-global-packages-data), [setup-python](https://github.com/actions/setup-python#caching-packages-dependencies), etc. come with built-in support for the GitHub Actions cache. They are not directly compatible with the self-hosted S3 cache, so you will need to configure caching manually.

This is an example of manually configured caching for Go:

```yaml
name: build

on: push

jobs:
  build:
    runs-on: actuated-4cpu-8gb
    steps:
      - name: Setup Golang
        uses: actions/setup-go@v3
        with:
          go-version: ~1.21
          cache: false
      - name: Setup Golang caches
        uses: tespkg/actions-cache@v1
        with:
          endpoint: "192.168.128.1"
          port: 8333
          insecure: true
          accessKey: ${{ secrets.ACTIONS_CACHE_ACCESS_KEY }}
          secretKey: ${{ secrets.ACTIONS_CACHE_SECRET_KEY }}
          bucket: actuated-runners
          region: local
          use-fallback: true

          # actions/cache compatible properties: https://github.com/actions/cache
          path: |
            ~/.cache/go-build
            ~/go/pkg/mod
          key: ${{ runner.os }}-go-${{ hashFiles('**/go.sum') }}
          restore-keys: |
            ${{ runner.os }}-go-
```

`tespkg/actions-cache` specific properties:

* `use-fallback` - if SeaweedFS is not installed on the host, or is unreachable, the action falls back to using the GitHub cache.
* `bucket` - the name of the bucket to use in SeaweedFS.
* `region` - the bucket region - use `local` when running your own S3 cache locally.
* `accessKey` and `secretKey` - the credentials used to access the bucket - we'd recommend using an organisation-level secret for these.
* `insecure` - use HTTP instead of HTTPS. You may want to create a self-signed certificate for the S3 service and set `insecure: false` to ensure that the connection is encrypted. If you're running builds within private repositories, tampering is unlikely.

Check out the list of `actions/cache` [examples](https://github.com/actions/cache/blob/main/examples.md) to configure caching for different languages and frameworks. Remember to replace `actions/cache@v3` with `tespkg/actions-cache@v1` and add the additional properties mentioned above.

### Caching the git checkout

Caching the git checkout can save a lot of time, especially for large repositories.

```yaml
jobs:
  build:
    runs-on: actuated-4cpu-8gb
    steps:
      - name: "Set current date as env variable"
        id: date
        shell: bash
        run: |
          echo "CHECKOUT_DATE=$(date +'%V-%Y')" >> $GITHUB_ENV
      - uses: tespkg/actions-cache@v1
        with:
          endpoint: "192.168.128.1"
          port: 8333
          insecure: true
          accessKey: ${{ secrets.ACTIONS_CACHE_ACCESS_KEY }}
          secretKey: ${{ secrets.ACTIONS_CACHE_SECRET_KEY }}
          bucket: actuated-runners
          region: local
          use-fallback: true
          path: ./.git
          key: ${{ runner.os }}-checkout-${{ env.CHECKOUT_DATE }}
          restore-keys: |
            ${{ runner.os }}-checkout-
```

The cache key uses a week-year format rather than a SHA. Why? Because a SHA would change on every build, meaning that a save and a load would be performed on every build, using up more space and slowing things down. With this format, there are only around 52 cache entries per year.
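To illustrate, `%V` is the ISO week number and `%Y` the year, so the key only rolls over once a week:

```bash
# %V gives the ISO-8601 week number (01-53), %Y the year,
# so CHECKOUT_DATE changes at most once per week.
CHECKOUT_DATE="$(date +'%V-%Y')"
echo "$CHECKOUT_DATE"
```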

### Caching node_modules with pnpm

For Node.js projects, the node_modules folder and the package manager's cache can become huge and take a long time to download. Switching to a local S3 cache can help bring that time down.

This example uses [pnpm](https://pnpm.io/), a fast, disk-space-efficient replacement for npm and yarn.

```yaml
jobs:
  build:
    runs-on: actuated-4cpu-8gb
    steps:
      - name: Install PNPM
        uses: pnpm/action-setup@v2
        with:
          run_install: |
            - args: [--global, node-gyp]
      - name: Get pnpm store directory
        id: pnpm-cache
        shell: bash
        run: |
          echo "STORE_PATH=$(pnpm store path)" >> $GITHUB_OUTPUT
      - uses: tespkg/actions-cache@v1
        with:
          endpoint: "192.168.128.1"
          port: 8333
          insecure: true
          accessKey: ${{ secrets.ACTIONS_CACHE_ACCESS_KEY }}
          secretKey: ${{ secrets.ACTIONS_CACHE_SECRET_KEY }}
          bucket: actuated-runners
          region: local
          use-fallback: true
          path: |
            ${{ steps.pnpm-cache.outputs.STORE_PATH }}
            ~/.cache
            .cache
          key: ${{ runner.os }}-pnpm-store-${{ hashFiles('**/pnpm-lock.yaml') }}
          restore-keys: |
            ${{ runner.os }}-pnpm-store-
      - name: Install dependencies
        shell: bash
        run: |
          pnpm install --frozen-lockfile --prefer-offline
```

## Further reading

* From our blog: [Fixing the cache latency for self-hosted GitHub Actions](https://actuated.dev/blog/faster-self-hosted-cache)
* A primer on using the GitHub Actions cache: [Using caching in builds](/examples/github-actions-cache/)
5 changes: 3 additions & 2 deletions mkdocs.yml
@@ -100,7 +100,8 @@ nav:
   - Tasks:
     - Setup a Registry Mirror: tasks/registry-mirror.md
     - Debug a job with SSH: tasks/debug-ssh.md
-    - Set-up the CLI: tasks/cli.md
+    - Set-up the CLI: tasks/cli.md
+    - Setup self-hosted cache: tasks/local-github-cache.md
     - Monitoring: tasks/monitoring.md
   - Troubleshooting: troubleshooting.md

@@ -115,10 +116,10 @@ nav:
     - Publish an OpenFaaS function: examples/openfaas-publish.md
     - Deploy a Helm chart: examples/openfaas-helm.md
     - Custom VM sizes: examples/custom-vm-size.md
+    - Using caching in builds: examples/github-actions-cache.md
   - Advanced examples:
     - Run a KVM guest: examples/kvm-guest.md
     - Multi-arch images with buildx: examples/multiarch-buildx.md
-    - GitHub Actions cache: examples/github-actions-cache.md

 extra:
   analytics:
