docs: improve databases documentation #7732

Open: wants to merge 21 commits into `main`


@itaysk (Contributor) commented Oct 14, 2024

Description

  • Documented the recently added DB fallback options
  • Added the checks DB to the DB documentation
  • Removed an unnecessary and confusing section from the air-gapped document

@itaysk requested a review from @knqyf263 as a code owner on October 14, 2024, 17:54
@itaysk (Contributor, Author) commented Oct 14, 2024

The release note says: "Databases are downloaded in priority order until one is successful". Does this mean that Trivy will keep cycling through the list "until one is successful", or will it error out if it reaches the end?

@itaysk marked this pull request as draft on October 20, 2024, 08:56
@itaysk (Contributor, Author) commented Oct 20, 2024

I wanted to incorporate the rate-limiting mitigation steps post into the documentation, but after re-reading the docs I see that there's a growing overlap between the "advanced network scenarios" (aka air-gap) doc and the "Database configuration" doc. The way I thought about it is that the configuration document only describes the available flags and their technical usage, while the air-gap doc talks more about the scenario and provides recommendations. Does this make sense to you? If so, I'll need to refactor both documents accordingly.

@itaysk marked this pull request as ready for review on October 29, 2024, 16:23
@itaysk (Contributor, Author) commented Oct 29, 2024

I ended up reorganizing the configuration/db doc and the air-gapped doc, so that the db doc now documents everything related to Trivy's databases, including purpose, requirements, and configuration options, and the air-gapped doc explains how to set up Trivy in a network-constrained environment. I also incorporated parts of the yet-to-be-published DB throttling announcement into the relevant documents.

@itaysk requested a review from @simar7 on October 29, 2024, 16:27
| Misconfiguration | |
| Secret | |
| License | |
When you install Trivy, the installed artifact contains the scanner engine but lacks the security information needed to make security detections and recommendations. These so-called "databases" are fetched and maintained by Trivy automatically as needed, so normally you shouldn't notice or worry about them. However, some situations might require your attention to Trivy's network connectivity. This section elaborates on the database management mechanism and its configuration options.
Member

> scanner engine but is lacking relevant security information needed to make security detections and recommendations.

We repeat this in the air-gap doc as well. Is this intentional?

Contributor Author

I thought it's worth reintroducing the motivation, since someone might be reading one doc and not the other. If you think it's unnecessary, I can remove the intro from the air-gap doc.

docs/docs/configuration/db.md (two outdated, resolved threads)

The following are the official locations of the Trivy databases:

| Registry | Image Address | Link
Collaborator

Shouldn't we mention which one is the default?

Contributor Author

I thought that all official repos would be used by default, in the order specified, but I realize this is not the case.


| Registry | Image Address | Link
| --- | --- | ---
| GHCR | `ghcr.io/aquasecurity/trivy-db` | <https://ghcr.io/aquasecurity/trivy-db>
Collaborator

I think it's more straightforward to group them by artifact, so users can understand which addresses they can use as secondary locations.

  • trivy-db
    • ghcr.io/aquasecurity/trivy-db (default)
    • aquasec/trivy-db
    • public.ecr.aws/aquasecurity/trivy-db
  • trivy-java-db
    • ghcr.io/aquasecurity/trivy-java-db (default)
    • ...

Contributor Author

I'm fine with making the change, but just thinking users might have a mirror preference based on their infrastructure, for example an AWS customer would probably prefer ECR for all of the images.
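To make the trade-off above concrete (mirrors grouped per artifact versus a user preferring one registry for everything), here is a minimal Go sketch. The `mirrors` map and the `orderByPreference` helper are hypothetical and not part of Trivy; only the trivy-db addresses from the comment above are listed, and the reordered list is simply what a user might then configure as their own location order.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// mirrors groups candidate locations per artifact, following the layout
// proposed in the review comment. Hypothetical; not Trivy's internal data.
var mirrors = map[string][]string{
	"trivy-db": {
		"ghcr.io/aquasecurity/trivy-db", // default
		"aquasec/trivy-db",
		"public.ecr.aws/aquasecurity/trivy-db",
	},
}

// orderByPreference returns a copy of locations with every address hosted on
// preferredHost moved to the front. A stable sort keeps the original relative
// order within the preferred and non-preferred groups.
func orderByPreference(locations []string, preferredHost string) []string {
	out := append([]string(nil), locations...) // do not mutate the input
	sort.SliceStable(out, func(i, j int) bool {
		pi := strings.HasPrefix(out[i], preferredHost+"/")
		pj := strings.HasPrefix(out[j], preferredHost+"/")
		return pi && !pj
	})
	return out
}

func main() {
	// An AWS user who prefers ECR Public moves it to the front of the list.
	for _, loc := range orderByPreference(mirrors["trivy-db"], "public.ecr.aws") {
		fmt.Println(loc)
	}
}
```

Grouping by artifact keeps the data model simple, while a preference step like this one still lets infrastructure-specific users reorder mirrors without maintaining a separate list.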

docs/docs/configuration/db.md (outdated, resolved)
Trivy will attempt to pull images from the official registries in the order specified. In case of transient errors (e.g., HTTP status 429 or 5xx), Trivy will fall back to alternative registries in the order specified.
You can specify additional alternative repositories as explained in the [configuring database locations section](#locations).

The Checks Database registry location option does not support falling back through multiple locations. This is because if pulling the trivy-checks DB fails, Trivy uses the embedded checks as a fallback.
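The registry fallback described above (try each location in order, moving on only after a transient error such as 429 or 5xx) can be sketched in Go as follows. This is a hypothetical illustration, not Trivy's implementation: `httpError`, `fetchFunc`, and `pullWithFallback` are invented for the example. The sketch also assumes two behaviors the text does not state: a non-transient error (for example 404) stops immediately, and reaching the end of the list returns an error instead of cycling through it again.

```go
package main

import (
	"errors"
	"fmt"
)

// httpError is a hypothetical error type carrying a registry's HTTP status code.
type httpError struct{ status int }

func (e *httpError) Error() string { return fmt.Sprintf("registry returned HTTP %d", e.status) }

// isTransient reports whether err is worth retrying at another location:
// HTTP 429 (rate limited) or any 5xx server error.
func isTransient(err error) bool {
	var he *httpError
	if errors.As(err, &he) {
		return he.status == 429 || (he.status >= 500 && he.status <= 599)
	}
	return false
}

// fetchFunc stands in for whatever actually pulls an artifact from a location.
type fetchFunc func(location string) error

// pullWithFallback tries each location once, in order, and returns the one
// that succeeded. It falls back only on transient errors; any other error,
// or running out of locations, is returned to the caller.
func pullWithFallback(locations []string, fetch fetchFunc) (string, error) {
	if len(locations) == 0 {
		return "", errors.New("no locations configured")
	}
	var lastErr error
	for _, loc := range locations {
		err := fetch(loc)
		if err == nil {
			return loc, nil
		}
		if !isTransient(err) {
			return "", fmt.Errorf("pulling %s: %w", loc, err)
		}
		lastErr = err
	}
	return "", fmt.Errorf("all %d locations failed; last error: %w", len(locations), lastErr)
}

func main() {
	locations := []string{
		"ghcr.io/aquasecurity/trivy-db",
		"public.ecr.aws/aquasecurity/trivy-db",
	}
	// Simulate the first registry rate limiting the request.
	fake := func(loc string) error {
		if loc == locations[0] {
			return &httpError{status: 429}
		}
		return nil
	}
	used, err := pullWithFallback(locations, fake)
	fmt.Println("pulled from:", used, "error:", err)
}
```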
Collaborator

We call it "checks bundle" in all other places. Do we want to rename it officially?

Contributor Author

I'm fine with it.

Collaborator

@simar7 I think you named it "checks bundle." What do you think?

Member

Yeah, checks bundle is good with me.

Collaborator

@itaysk is calling it "checks database" in this PR. We should be consistent in what we call it within the project. Either is fine for me.

  • Checks Bundle
  • Checks Database

Note that we already have a `--checks-bundle-repository` flag and would need to rename it to `--checks-db-repository` if we go with "checks database".

      --checks-bundle-repository string   OCI registry URL to retrieve checks bundle from (default "ghcr.io/aquasecurity/trivy-checks:1")

docs/docs/configuration/db.md (outdated, resolved)

Trivy is an open source project that relies on free public infrastructure. Under extreme load, you may encounter rate limiting when Trivy attempts to update its databases. If you are facing rate-limiting issues:

1. Consider self-hosting the databases, or implementing a proxy-cache in your organization.
Collaborator

How about adding a link to the page about self-hosting?


### DB Repository
Trivy can also download the vulnerability database from an external OCI registry using the `--db-repository` option.
The flag accepts multiple values, which can be used to specify multiple alternative repository locations. In case of transient errors (e.g., HTTP status 429 or 5xx), Trivy will fall back to alternative registries in the order specified.
Collaborator

Suggested change:
Before: The flags accepts multiple values, which can be used to specify multiple alternative repository locations. In case of a transient errors (e.g. status 429 or 5xx), Trivy will fall back to alternative registries in the order specified.
After: The flags accepts multiple values, which can be used to specify multiple alternative repository locations. See [Automatic fallback](#automatic-fallback) for details.

Contributor Author

I thought that the authoritative section is this one, which describes all the db flags, and the db locations section on top should point here, not the other way around.
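Since this section is meant to be the authoritative description of the flags, a small Go sketch may help show what "accepts multiple values" means in practice: each occurrence of the flag appends one location, and the resulting order is the fallback order. It uses only Go's standard `flag` package with a hypothetical `repoList` type; Trivy's own flag parsing may differ, so details such as whether comma-separated values are also accepted are an assumption here, not documented behavior.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// repoList is a hypothetical flag.Value that collects every occurrence of a
// repeated flag, preserving the order given on the command line.
type repoList []string

func (r *repoList) String() string { return strings.Join(*r, ",") }

// Set is called once per occurrence of the flag and appends the value.
func (r *repoList) Set(v string) error {
	*r = append(*r, v)
	return nil
}

// parseRepos parses args and returns the ordered --db-repository values.
func parseRepos(args []string) ([]string, error) {
	fs := flag.NewFlagSet("example", flag.ContinueOnError)
	var repos repoList
	fs.Var(&repos, "db-repository", "OCI repository to try, in order (repeatable)")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	return repos, nil
}

func main() {
	repos, err := parseRepos([]string{
		"--db-repository", "ghcr.io/aquasecurity/trivy-db",
		"--db-repository", "public.ecr.aws/aquasecurity/trivy-db",
	})
	if err != nil {
		panic(err)
	}
	for i, r := range repos {
		fmt.Printf("%d: %s\n", i+1, r)
	}
}
```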

docs/docs/configuration/db.md (outdated, resolved)
@knqyf263 (Collaborator) left a comment

I think we want to add a new page for "Self-Hosting Databases". It can be used for air-gapped environments, but it is also useful for dealing with rate limits or network bandwidth. We can explain media types on that page.

https://github.com/aquasecurity/trivy/pull/7732/files#diff-f67fbe532cb6f906f5e2e056c318da6d0bc245f94c8cc1afbd7e72a61e2506b8R12

@itaysk (Contributor, Author) commented Oct 31, 2024

> I think we want to add a new page for "Self-Hosting Databases".

I wasn't sure what would be left in the air-gap doc if we move most of the content to a shared page. Also, it's currently called "Air-Gapped Environments and Self-Hosting", so I moved the media types table there.

@knqyf263 (Collaborator) commented Nov 1, 2024

> I wasn't sure what would be left in the airgap doc if we move most of the content to a shared page.

The page on the air-gapped environment originally showed how to download and populate databases manually. Self-hosting is not directly related to the air-gapped environment, so we can just link to it from the air-gapped environment page.

# Air-Gapped Environments

## Manual cache population
...

## Self-Hosting
You can also use self-hosted databases. Please see [here](./self-host.md) for details.
# Self-Hosting
Self-hosting databases helps you with:

- Rate limits
- Reducing outgoing network traffic
- [Air-Gapped Environments](./air-gap.md)

## Copying databases into your registry
...

## Media types
...

4 participants