Ratelimiting error when downloading vulnerability db from ghcr.io #389

Open
HenrikDK opened this issue Sep 18, 2024 · 93 comments

Comments

@HenrikDK

Hi, we're using trivy to scan our containers. Lately we've been seeing an increasing number of rate-limiting errors when trivy downloads the vulnerability database.

"2024-09-18T10:40:44Z FATAL Fatal error init error: DB error: failed to download vulnerability DB: database download error: oci download error: failed to fetch the layer: GET https://ghcr.io/v2/aquasecurity/trivy-db/blobs/sha256:11c57f2012b2ac112256f94aa404e1feb7e1b7a5787598946b87149115cdb43d: TOOMANYREQUESTS: retry-after: 129.163µs, allowed: 44000/minute"

My guess is that this is a global rate limit, as I can't imagine our small number of devs is causing 700+ requests a second.
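
As a quick sanity check on that estimate, the limit quoted in the error message can be converted to a per-second rate. This is a minimal sketch in shell; the only input is the 44000/minute figure from the log line above, and the variable names are made up for illustration.

```shell
# Per-minute limit as reported in the TOOMANYREQUESTS error.
limit_per_minute=44000

# Integer division: 44000 / 60 = 733 (remainder dropped).
limit_per_second=$((limit_per_minute / 60))
echo "${limit_per_second} requests/second"
```

That is consistent with the "700+ requests a second" figure, which a single small team is unlikely to reach on its own.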

In the meantime I have discovered that these scans are only used for SBOM generation on our end, so we don't need to download the vulnerability database every time. I thought this issue should be raised anyway, as I can't imagine we are the only ones seeing these errors.

@simar7
Member

simar7 commented Sep 18, 2024

Thanks for the report, we will look into it.

@benglewis

I also saw this right now :/ Any ideas why?

baksetercx added a commit to 3lvia/core-github-actions-templates that referenced this issue Sep 19, 2024
@billhammond-dev

billhammond-dev commented Sep 19, 2024

I believe this is currently causing problems with anyone using the trivy action. We have had to turn it off on some workflows. I'm not sure what the long term solution might be - if GH cannot increase the global rate limit for the artifact pull then maybe it needs to be in a public AWS S3 bucket or something similar?

@billhammond-dev

billhammond-dev commented Sep 20, 2024

From my PR above, a workaround suggested by someone else:

- uses: aquasecurity/trivy-action@<version>
  with:
    ...
  env:
    TRIVY_DB_REPOSITORY: <something else than ghcr.io>
    TRIVY_JAVA_DB_REPOSITORY: <something else than ghcr.io>

@nelsonleblanc-rl

nelsonleblanc-rl commented Sep 20, 2024

Does anyone know how to get trivy-action to auth with a privately hosted trivy-db repo? I can get it working fine with normal trivy on local, but trivy-action does not work with either docker/login-action or the usual echo $GITHUB_TOKEN | docker login ghcr.io -u USERNAME --password-stdin

2024-09-20T16:39:01Z FATAL Fatal error init error: DB error: failed to download vulnerability DB: database download error: OCI repository error: 1 error occurred: * GET https://ghcr.io/token?scope=repository%3Aprivate-github-org%2Ftrivy-db%3Apull&service=ghcr.io: UNAUTHORIZED: authentication required

@billhammond-dev

I was able to get it to work with ECR, but only by using an OIDC login via the configure-aws-credentials action right before the trivy action. The action does not use docker to pull the artifact, as it is not a docker image.

@9838183063

I am poor student

@srenatus

I have no long-term tests yet, but from my understanding of GH's rate limiting, just providing a token of any sort will give you higher quotas? If that's the case, the following should help:

      - name: Run Trivy scan on image
        uses: aquasecurity/trivy-action@<version>
        with:
          [... your config ...]
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

@baksetercx

I have no long-term tests yet, but from my understanding of GH's rate limiting, just providing a token of any sort will give you higher quotas? If that's the case, the following should help:

      - name: Run Trivy scan on image
        uses: aquasecurity/trivy-action@<version>
        with:
          [... your config ...]
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

I've tried logging in to GHCR via docker/login-action before running Trivy CLI (not action), and I am still getting lots of 429 errors.

@nnellanspdl

nnellanspdl commented Sep 23, 2024

From my PR above, a workaround suggested by someone else:

- uses: aquasecurity/trivy-action@<version>
  with:
    ...
  env:
    TRIVY_DB_REPOSITORY: <something else than ghcr.io>
    TRIVY_JAVA_DB_REPOSITORY: <something else than ghcr.io>

So, if I understand this correctly:

I, as the consumer of this action, must download copies of these DBs and store them on my own registry. Then, I must pass environment variables to the action which point at my copies of the DBs. Is that correct?

How often are these DBs updated?

@ybelMekk

ybelMekk commented Sep 23, 2024

@nnellanspdl I think it's at 00:00 every day, but I'm not sure.

Either way, this workaround is a hassle: you have to host the DBs yourself and update them every day.

@NicholasFiorentini

I have no long-term tests yet, but from my understanding of GH's rate limiting, just providing a token of any sort will give you higher quotas? If that's the case, the following should help:

      - name: Run Trivy scan on image
        uses: aquasecurity/trivy-action@<version>
        with:
          [... your config ...]
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

I've tried logging in to GHCR via docker/login-action before running Trivy CLI (not action), and I am still getting lots of 429 errors.

Same for me, it doesn't seem to have significant effects.

@NicholasFiorentini

NicholasFiorentini commented Sep 23, 2024

I have no long-term tests yet, but from my understanding of GH's rate limiting, just providing a token of any sort will give you higher quotas? If that's the case, the following should help:

      - name: Run Trivy scan on image
        uses: aquasecurity/trivy-action@<version>
        with:
          [... your config ...]
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

I'm trying with:

env:
    ACTIONS_RUNTIME_TOKEN: ${{ secrets.GITHUB_TOKEN }}

I spawned multiple parallel ci/cd actions, and this seems more reliable.

@BRONSOLO

If anyone is going the route of uploading the Trivy DB to their own registry, I've had success using https://github.com/oras-project/setup-oras

Something like:

  vendor-trivy-db:
    runs-on: ubuntu-latest
    steps:
      - name: Vendor latest trivy db
        uses: oras-project/setup-oras@v1
      - run: |
          oras pull ghcr.io/aquasecurity/trivy-db:2
          oras login -u ${{ secrets.REGISTRY_USERNAME }} -p ${{ secrets.REGISTRY_TOKEN }} YOUR_REGISTRY
          oras push YOUR_REGISTRY \
            db.tar.gz:application/vnd.aquasec.trivy.db.layer.v1.tar+gzip \
            --artifact-type application/vnd.aquasec.trivy.config.v1+json

@eugentius

eugentius commented Sep 23, 2024

I set up an AWS ECR pull-through cache for trivy-db and trivy-java-db and modified the action:

      - name: Run Trivy vulnerability scanner
        uses: aquasecurity/trivy-action@<version>
        with:
          image-ref: ${{ env.DOCKER_IMAGE_TO_SCAN }}
          format: 'table'
          exit-code: '1' 
          ignore-unfixed: true
          vuln-type: 'os,library'
          severity: 'CRITICAL,HIGH'
        env:
          TRIVY_DB_REPOSITORY: <ECR_ID>.dkr.ecr.us-east-1.amazonaws.com/github/ghcr.io/aquasecurity/trivy-db
          TRIVY_JAVA_DB_REPOSITORY: <ECR_ID>.dkr.ecr.us-east-1.amazonaws.com/github/ghcr.io/aquasecurity/trivy-java-db
          TRIVY_DEBUG: true

but pulling of trivy-db fails with:

2024-09-23T16:16:12Z	INFO	Downloading DB...	repository="<ECR_ID>.dkr.ecr.us-east-1.amazonaws.com/github/ghcr.io/aquasecurity/trivy-db"
2024-09-23T16:16:12Z	DEBUG	No metadata file
2024-09-23T16:16:17Z	DEBUG	Credential error	err="failed to get authorization token: operation error ECR: GetAuthorizationToken, get identity: get credentials: failed to refresh cached credentials, no EC2 IMDS role found, operation error ec2imds: GetMetadata, canceled, context deadline exceeded"
2024-09-23T16:16:17Z	FATAL	Fatal error	init error: DB error: failed to download vulnerability DB: database download error: OCI repository error: 1 error occurred:
	* GET https://<ECR_ID>.dkr.ecr.us-east-1.amazonaws.com/v2/github/ghcr.io/aquasecurity/trivy-db/manifests/2: unexpected status code 401 Unauthorized: Not Authorized

Docker is logged in.
If I run the trivy binary locally or on the runner, it works fine:

runner@runner-set-xs-djfqb-0:/tmp$ export TRIVY_DB_REPOSITORY=<ECR_ID>.dkr.ecr.us-east-1.amazonaws.com/github/ghcr.io/aquasecurity/trivy-db
runner@runner-set-xs-djfqb-0:/tmp$ export TRIVY_JAVA_DB_REPOSITORY=<ECR_ID>.dkr.ecr.us-east-1.amazonaws.com/github/ghcr.io/aquasecurity/trivy-java-db
runner@runner-set-xs-djfqb-0:/tmp$ trivy image    --format table --exit-code  1 --ignore-unfixed --vuln-type  os,library --severity  CRITICAL,HIGH  <ECR_ID>.dkr.ecr.us-east-1.amazonaws.com/my-awesome-app:1.23.0
2024-09-23T16:35:04Z	WARN	'--vuln-type' is deprecated. Use '--pkg-types' instead.
2024-09-23T16:35:04Z	INFO	Adding schema version to the DB repository for backward compatibility	repository="<ECR_ID>.dkr.ecr.us-east-1.amazonaws.com/github/ghcr.io/aquasecurity/trivy-db:2"
2024-09-23T16:35:04Z	INFO	Adding schema version to the Java DB repository for backward compatibility	repository="<ECR_ID>.dkr.ecr.us-east-1.amazonaws.com/github/ghcr.io/aquasecurity/trivy-java-db:1"
2024-09-23T16:35:04Z	INFO	[db] Need to update DB
2024-09-23T16:35:04Z	INFO	[db] Downloading DB...	repository="<ECR_ID>.dkr.ecr.us-east-1.amazonaws.com/github/ghcr.io/aquasecurity/trivy-db:2"
53.56 MiB / 53.56 MiB [------------------------------------------------------------------------------------------------------ 

Has anybody tried pulling trivy-db from AWS ECR using the action?

@billhammond-dev

Yes, you can pull from an ECR pull-through cache, but only if you run the OIDC configure-aws-credentials action first, before the trivy action. I'm not sure yet why nothing but OIDC works, or at least I can't seem to get regular role assumption to work. Docker login doesn't help, as the container doesn't try to pull the DB using docker commands.

If you try a docker pull you will get the unsupported media type error from the post above, as the artifact isn't an 'image'.

@nbenmoody-tesouro

Ah, thanks. I was logged in under the incorrect account when I posted originally. That's what I was wondering, @billhammond-dev !

@nbenmoody-tesouro

This was my error, for anyone else who runs into it:

latest: Pulling from github/ghcr.io/aquasecurity/trivy-db
unsupported media type application/vnd.aquasec.trivy.config.v1+json

@nnellanspdl

@nnellanspdl I think it's at 00:00 every day, but I'm not sure.

Either way, this workaround is a hassle: you have to host the DBs yourself and update them every day.

Thanks. Yes, this is a lot to ask of consumers of your action.

@Nava-JoshLong

I'm guessing it would be too much work to update the pull logic so that the file can be passed in directly? We could set up a workflow to pull and stash the image every X hours, and then, in the workflow that uses the image, pull the file from the stash. That would lower the number of hits by users, and we wouldn't need to host it in AWS and pay.
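
The "every X hours" part of that idea boils down to a freshness check on the stashed file. Here is a minimal sketch in shell, assuming the stash is a plain file on disk. The function name `needs_refresh` and its arguments are hypothetical and not part of trivy or trivy-action.

```shell
# needs_refresh FILE MAX_AGE_HOURS
# Succeeds (exit 0) when FILE is missing or was last modified more than
# MAX_AGE_HOURS ago, i.e. when the stashed copy should be pulled again.
needs_refresh() {
  local file="$1" max_age_hours="$2"
  # A missing stash always needs a refresh.
  [ -e "$file" ] || return 0
  # find prints the path only if it is older than the limit (in minutes).
  [ -n "$(find "$file" -maxdepth 0 -mmin "+$((max_age_hours * 60))")" ]
}

# Example use (not executed here):
#   if needs_refresh db.tar.gz 6; then
#     oras pull ghcr.io/aquasecurity/trivy-db:2
#   fi
```

The `oras pull` line in the comment is the same command used elsewhere in this thread; the 6-hour threshold is an arbitrary example value.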

@simar7
Member

simar7 commented Sep 24, 2024

ACTIONS_RUNTIME_TOKEN

@NicholasFiorentini that's interesting, would you mind creating a PR to document this in the repo? If possible, could you also reference where this environment variable is documented?

@jpalomaki

jpalomaki commented Sep 24, 2024

FWIW, here's a sample snippet for using AWS ECR pull through cache repositories using OIDC for AWS auth.

Pull through cache ECR repositories (for hosting the cached trivy DB artifacts) must be configured prior to running this workflow, see documentation.

- name: Setup AWS credentials
  uses: aws-actions/configure-aws-credentials@v4
  with:
    aws-region: ...
    role-to-assume: <role, assumable through OIDC, that can pull from the cache ECR repositories>

- id: ecr-login
  name: Login to ECR
  uses: aws-actions/amazon-ecr-login@v2

...

- name: Run trivy scan
  uses: aquasecurity/trivy-action@<version>
  with:
    ...
  env:
    TRIVY_DB_REPOSITORY: ${{ steps.ecr-login.outputs.registry }}/github/aquasecurity/trivy-db:2
    TRIVY_JAVA_DB_REPOSITORY: ${{ steps.ecr-login.outputs.registry }}/github/aquasecurity/trivy-java-db:1

Per AWS documentation:

When a cached image is pulled through the Amazon ECR private registry URI, Amazon ECR checks the upstream repository at least once every 24 hours to verify whether the cached image is the latest version. If there is a newer image in the upstream registry, Amazon ECR attempts to update the cached image. This timer is based off the last pull of the cached image.

@ktzsolt

ktzsolt commented Oct 24, 2024

I also started to use the AWS public ECR, so I set the following env vars:

TRIVY_DB_REPOSITORY=public.ecr.aws/aquasecurity/trivy-db
TRIVY_JAVA_DB_REPOSITORY=public.ecr.aws/aquasecurity/trivy-java-db
TRIVY_CHECKS_BUNDLE_REPOSITORY=public.ecr.aws/aquasecurity/trivy-checks

However for helm chart scanning with trivy config I get the following error:

ERROR	[misconfig] Falling back to embedded checks	err="failed to download built-in policies: download error: OCI repository error: 1 error occurred:\n\t* GET https://public.ecr.aws/aquasecurity/trivy-checks/manifests/latest: NOT_FOUND: resource not found: repo awspublic/aquasecurity/trivy-checks, tag latest not found\n\n"

Because trivy-checks repository is empty on aws ecr: https://gallery.ecr.aws/aquasecurity/trivy-checks
Could you please publish trivy-checks also to aws ecr?

abottchen added a commit to puppetlabs/puppet-dev-tools that referenced this issue Oct 24, 2024
The Trivy GitHub action is currently experiencing a GHCR rate limiting
issue pulling its vulnerability DB
(aquasecurity/trivy-action#389).  Until that
issue is resolved, or we work out a way to host it ourselves, this
commit will disable the scans.

In the meantime, we will have to run these scans manually.
sscheib added a commit to sscheib/ansible-role-file_deployment that referenced this issue Oct 25, 2024
This is a workaround for aquasecurity/trivy-action#389

Signed-off-by: Steffen Scheib <[email protected]>
@Wensworking

Wensworking commented Oct 30, 2024

We are using trivy operator. After upgrading to chart 0.24.1 we came across the same rate limit error from GHCR. I wanted to pull images from AWS ECR instead, but trivy-checks doesn't have any images there right now.

Can anyone please publish trivy-checks to aws ecr?

@mvdkleijn

I find the apparent(!) lack of attention this issue is getting from Aqua security to be a little frustrating to be honest.

@herman-wong-cf

We've had good luck using this as a workaround:
https://github.com/aquasecurity/trivy-action?tab=readme-ov-file#updating-caches-in-the-default-branch

Basically it just uses oras to pull the trivy-db and java-db into the GitHub workflow cache and has trivy-action use only the cache.
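
A rough sketch of the unpack step in shell, for anyone who wants to see the shape of it. The helper name `unpack_trivy_db` is hypothetical, and the `db/` subdirectory layout is an assumption based on the README section linked above, so check it against the README before relying on it.

```shell
# unpack_trivy_db ARCHIVE CACHE_DIR
# Extracts a pulled DB archive into CACHE_DIR/db, where the cache is
# assumed to keep the vulnerability DB (assumption, see the README).
unpack_trivy_db() {
  local archive="$1" cache_dir="$2"
  mkdir -p "$cache_dir/db"
  tar -xzf "$archive" -C "$cache_dir/db"
}

# Example use inside a workflow step (not executed here):
#   oras pull ghcr.io/aquasecurity/trivy-db:2
#   unpack_trivy_db db.tar.gz "$GITHUB_WORKSPACE/.cache/trivy"
```

The cache directory then has to be saved and restored by the workflow, and the action has to be told to use it instead of downloading; the README covers both.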

@erewok

erewok commented Oct 31, 2024

We used our own container registry in Azure and "artifact caching rules". (Admittedly, not a solution for all, but I respect that this open-source solution is provided for free by Aqua and that the infrastructure must be hosted somewhere...)

@ncalteen

ncalteen commented Nov 1, 2024

In case it is useful to anyone else, I was working on a project where we encountered this issue. One workaround that was pretty straightforward (but not 100% robust) was to set up a simple GitHub Action that would pull the Trivy DB images once per day and publish them to GHCR in our organization. That way there was significantly lower chance of encountering a TOOMANYREQUESTS error. It's not 100% robust since the scheduled workflow could fail, but adding in some retry options shouldn't be too painful.

name: Cache Trivy DBs

on:
  schedule:
    - cron: '0 0 * * *'
  workflow_dispatch:

permissions:
  packages: write
  id-token: write

jobs:
  lint:
    name: Cache Trivy DBs
    runs-on: ubuntu-latest

    steps:
      - name: Install ORAS
        id: oras
        uses: oras-project/setup-oras@v1

      - name: Authenticate to GHCR
        id: ghcr
        run: |
          oras login ghcr.io \
            -u ${{ github.actor }} \
            -p ${{ github.token }}

      - name: Pull Trivy Images
        id: pull
        run: |
          oras pull ghcr.io/aquasecurity/trivy-db:2
          oras pull ghcr.io/aquasecurity/trivy-java-db:1

      - name: Push Trivy Images
        id: push
        run: |
          oras push ghcr.io/<owner>/<repo>/trivy-db:2 \
            db.tar.gz:application/vnd.aquasec.trivy.db.layer.v1.tar+gzip \
            --artifact-type application/vnd.aquasec.trivy.config.v1+json

          oras push ghcr.io/<owner>/<repo>/trivy-java-db:1 \
            javadb.tar.gz:application/vnd.aquasec.trivy.javadb.layer.v1.tar+gzip \
            --artifact-type application/vnd.aquasec.trivy.config.v1+json

From there, it's just a matter of ensuring calling workflows have packages: read permissions to the packages. This can be done with a GitHub app, but has been working fine with the built-in workflow token for us.

runs:
  steps:
    - name: Install ORAS
      id: oras
      uses: oras-project/setup-oras@v1

    - name: Authenticate to GHCR
      id: ghcr
      shell: bash
      run: |
        oras login ghcr.io \
          -u ${{ github.actor }} \
          -p ${{ github.token }}

    - name: Pull Trivy DBs from GHCR
      id: pull
      shell: bash
      run: |
        oras pull ghcr.io/<owner>/<repo>/trivy-db:2
        oras pull ghcr.io/<owner>/<repo>/trivy-java-db:1

    - name: Scan Container Image
      id: scan
      uses: aquasecurity/trivy-action@<version>
      env:
        TRIVY_DB_REPOSITORY: ghcr.io/<owner>/<repo>/trivy-db,public.ecr.aws/aquasecurity/trivy-db,ghcr.io/aquasecurity/trivy-db
        TRIVY_JAVA_DB_REPOSITORY: ghcr.io/<owner>/<repo>/trivy-java-db,public.ecr.aws/aquasecurity/trivy-java-db,ghcr.io/aquasecurity/trivy-java-db
        # Not 100% sure if these are required, but so far no issues.
        TRIVY_USERNAME: ${{ github.actor }}
        TRIVY_PASSWORD: ${{ github.token }}
      with:
        cache: true
        exit-code: 0
        format: table
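
The retry options mentioned at the top of this comment could look something like the following. It is a minimal sketch of retry with exponential backoff in shell; the function name `retry` and its argument order are made up for illustration and are not part of oras or trivy.

```shell
# retry MAX_ATTEMPTS INITIAL_DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached,
# doubling the delay between attempts.
retry() {
  local max_attempts="$1" delay="$2" attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
    delay=$((delay * 2))
  done
}

# Example use in the scheduled workflow (not executed here):
#   retry 5 10 oras pull ghcr.io/aquasecurity/trivy-db:2
#   retry 5 10 oras pull ghcr.io/aquasecurity/trivy-java-db:1
```

With 5 attempts and a 10 second initial delay, the waits would be 10, 20, 40 and 80 seconds, which should ride out a short burst of TOOMANYREQUESTS responses without hammering the registry.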

tpendragon added a commit to pulibrary/dpul-collections that referenced this issue Nov 1, 2024
nijel added a commit to WeblateOrg/docker that referenced this issue Nov 2, 2024