-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(fs): use git commit hash as cache key for clean repositories #8278
Conversation
When scanning a git repository that is in a clean state, use the commit hash as the cache key instead of calculating it from blob info. This allows for better cache reuse and prevents unnecessary cache deletion.
- Introduced a new fixture in `internal/gittest/testdata/fixture.go` to clone a Git repository for unit tests. - Updated the `magefiles/magefile.go` to include the new fixture in the unit test dependencies. - Added the test repository path to `.gitignore` to prevent it from being tracked.
- Added NewServerWithRepository function to create a git server with an existing repository, including cloning and fetching all branches and tags. - Updated NewTestServer to check for the existence of the test repository and provide a clear error message if not found. - Modified the cloning process in fixture.go to include all branches and tags. - Removed unused test files.
Signed-off-by: knqyf263 <[email protected]>
Add debug logging when using the latest git commit hash for calculating the artifact cache key, providing more visibility into the cache key generation process.
Signed-off-by: knqyf263 <[email protected]>
Signed-off-by: knqyf263 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
But i think we need to update docs:
fs
may save/extract cache for clean repositories- by default we don't save cache for repositories (
--cache-backend memory
is default forfs
/repo
mode) - it works by default for client/server mode
- cache only works if the repository directory is the target of Trivy scan (but maybe that's redundant)
art.logger.Debug("Using the latest commit hash for calculating cache key", log.String("commit_hash", hash)) | ||
art.commitHash = hash | ||
} else { | ||
art.logger.Debug("Random cache key will be used", log.Err(err)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we don't need to show this log unless it's a git repository directory.
Because we will show the following log for each fs
scan
[fs] Random cache key will be used err="failed to open git repository: repository does not exist"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, it might make sense to add path
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated 3802124
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like there is a bug 😄 I'll fix it.
Signed-off-by: knqyf263 <[email protected]>
Signed-off-by: knqyf263 <[email protected]>
Signed-off-by: knqyf263 <[email protected]>
Signed-off-by: knqyf263 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Can we merge this PR?
Overview
Improve cache efficiency for Git repositories by using commit hash as cache key when the repository is in a clean state.
Description
When scanning a Git repository that is in a clean state, this change uses the commit hash as the cache key instead of calculating it from UUID. This allows for better cache reuse and prevents unnecessary cache deletion.
Currently, the file system scanner uses UUID to calculate the cache key, which effectively does not work as a cache key and is deleted after scanning with DeleteBlobs. However, if the directory is a Git repository, we can use the commit hash as a cache key, eliminating the need to delete the cache after scanning.
Key changes:
Related Issues
Checklist