[feat] - cache duplicate creds during detection #2276

Draft
ahrav wants to merge 32 commits into main

Conversation

@ahrav (Collaborator) commented Jan 7, 2024

Description:

This PR introduces caching for AWS credentials in the detector to avoid redundant API calls for identical credentials. The cache is keyed on the rawV2 value and stores a bespoke cacheItem, which retains enough metadata (verified, VerificationErr, and ExtraData) to reconstruct a complete result for each cached credential.
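
As a rough sketch, the cached value might look something like this (the field names are assumptions based on the description, not the exact implementation):

// cacheItem holds enough metadata to rebuild a full detector result
// on a cache hit without calling AWS again.
type cacheItem struct {
	verified        bool              // outcome of the last verification call
	verificationErr error             // error returned by AWS, if any
	extraData       map[string]string // copied into the result's ExtraData
}

// On a cache miss the detector verifies the credential once and stores the
// outcome keyed on rawV2; on a hit it rebuilds the result from the item.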

In practical testing against the trufflehog repository, this implementation reduced AWS API calls from 707 to 256 while processing the same volume of data (chunk counts were unchanged). The same approach could be applied to our most frequently used detectors to improve efficiency and performance.

Checklist:

  • Tests passing (make test-community)?
  • Lint passing (make lint; this requires golangci-lint)?

@ahrav mentioned this pull request Jan 7, 2024
}
}

// Cache the result.
s.credsCache.Set(rawV2, cacheItem)
Contributor commented:

Should this be a pointer?

https://github.com/patrickmn/go-cache#usage (last example)
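
For reference, the pointer-storage pattern from the linked go-cache README looks roughly like this (a minimal sketch for illustration only; the detector itself uses the internal memory cache package):

import (
	"time"

	"github.com/patrickmn/go-cache"
)

func example() {
	c := cache.New(5*time.Minute, 10*time.Minute)

	// Storing a pointer avoids copying the struct on every Get and lets
	// callers mutate the cached value in place if needed.
	c.Set("key", &cacheItem{verified: true}, cache.DefaultExpiration)

	if v, found := c.Get("key"); found {
		item := v.(*cacheItem) // assert to the pointer type
		_ = item
	}
}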

@ahrav (Collaborator, Author) replied:

Indeed, using a pointer is the right approach. I'm considering adding a constructor so we consistently return a pointer. It doesn't offer much protection while everything lives in the same package, but it lays the groundwork for a future refactor that moves this into a separate package all other detectors can import.
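
A minimal sketch of such a constructor, assuming the cacheItem fields sketched above (the name is hypothetical):

// newCacheItem is intended to be the only way a cacheItem is created,
// so every value placed in the cache is consistently a pointer.
func newCacheItem(verified bool, verificationErr error, extraData map[string]string) *cacheItem {
	return &cacheItem{
		verified:        verified,
		verificationErr: verificationErr,
		extraData:       extraData,
	}
}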

@@ -47,6 +51,11 @@ func New(opts ...func(*scanner)) *scanner {
opt(scanner)
}

scanner.credsCache = memory.New(
memory.WithExpirationInterval(1*time.Hour),
@ahrav (Collaborator, Author) commented Jan 8, 2024:

These are just placeholders for now; we could consider passing them in as args to New, or using a default value that seems reasonable.
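
One way to make the interval configurable through the existing functional-options pattern on New; the option name and the credsCacheExpiration field are hypothetical:

// WithCredsCacheExpiration overrides the default credential-cache TTL.
func WithCredsCacheExpiration(d time.Duration) func(*scanner) {
	return func(s *scanner) { s.credsCacheExpiration = d }
}

// In New, after the options have been applied:
//
//	if scanner.credsCacheExpiration == 0 {
//		scanner.credsCacheExpiration = time.Hour // fallback default
//	}
//	scanner.credsCache = memory.New(
//		memory.WithExpirationInterval(scanner.credsCacheExpiration),
//	)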

Contributor replied:

How big could the cache conceivably get in a given time frame?

@ahrav (Collaborator, Author) replied:

That's a good question; it really depends on what is being scanned. We can scan some repos and record metrics to see how large the cache grows. Alternatively, we could use an LRU cache with a fixed size if needed.
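
If a hard size bound turns out to be necessary, a fixed-capacity LRU would cap memory regardless of scan size. A sketch using hashicorp/golang-lru (the library choice and capacity are assumptions):

import lru "github.com/hashicorp/golang-lru/v2"

// Capacity of 10,000 credentials; the least recently used entry is
// evicted first, so the cache cannot grow without bound.
credsCache, err := lru.New[string, *cacheItem](10_000)
if err != nil {
	// handle construction error
}

credsCache.Add(rawV2, newCacheItem(true, nil, nil))
if item, ok := credsCache.Get(rawV2); ok {
	_ = item.verified // rebuild the result from the cached item
}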

if val, ok := s.credsCache.Get(rawV2); ok {
	item, ok := val.(cacheItem)
	if !ok {
		continue
@ahrav (Collaborator, Author) commented:

Not sure if this is the right behavior; I'd like to maybe log this as an error. Not sure whether constructing our custom logcontext.AddLogger makes sense so we can do that?
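
A sketch of logging instead of silently continuing; it assumes a logging context (ctx) is available inside FromData, which is exactly the open question about logcontext.AddLogger:

item, ok := val.(*cacheItem)
if !ok {
	// An unexpected type in the cache is a programming error, so surface
	// it instead of dropping the credential silently.
	ctx.Logger().Error(
		fmt.Errorf("unexpected type %T in creds cache", val),
		"skipping cached AWS credential",
	)
	continue
}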

Base automatically changed from feat-extend-memory-cache to main January 11, 2024 16:20