
RFC: Set cachedir and backupcachedir as parameters for parallel instances of the tool with their own database #4773

Open: wants to merge 1 commit into main

motto-phytec

The cachedir and backup_cachedir are always set to the default value (~/.cache/cve-bin-tool) for the data sources.
The cachedir is configurable for the cvedb, but the data_sources, the cve_scanner and other modules always use the default cachedir path.
If you set the CVEDB cachedir to a different path, the cvedb access at line 441,
purl2cpe_conn = sqlite3.connect(self.cachedir / "purl2cpe/purl2cpe.db"),
fails, because the purl2cpe data source uses DISK_LOCATION_DEFAULT instead of the cvedb's self.cachedir.
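A minimal sketch of the approach this RFC describes, assuming a simplified data source and database class. `Purl2CpeSource`, `CveDb` and the local `DISK_LOCATION_DEFAULT` are hypothetical stand-ins, not cve-bin-tool's actual classes. The idea is that the database object passes its cachedir down to the data source, so the writer and the reader resolve purl2cpe/purl2cpe.db under the same directory.

```python
import sqlite3
from pathlib import Path

# Hypothetical stand-in for cve-bin-tool's default cache location.
DISK_LOCATION_DEFAULT = Path("~/.cache/cve-bin-tool").expanduser()


class Purl2CpeSource:
    """Hypothetical data source that takes a cachedir instead of hard-coding the default."""

    def __init__(self, cachedir: Path = DISK_LOCATION_DEFAULT) -> None:
        self.cachedir = Path(cachedir)

    def db_path(self) -> Path:
        # The purl2cpe database lives under whichever cachedir was passed in.
        return self.cachedir / "purl2cpe" / "purl2cpe.db"

    def populate(self) -> None:
        # Create the directory and a placeholder table; real data loading is out of scope.
        path = self.db_path()
        path.parent.mkdir(parents=True, exist_ok=True)
        conn = sqlite3.connect(path)
        try:
            conn.execute("CREATE TABLE IF NOT EXISTS purl2cpe (purl TEXT, cpe TEXT)")
            conn.commit()
        finally:
            conn.close()


class CveDb:
    """Hypothetical database wrapper that shares its cachedir with its data sources."""

    def __init__(self, cachedir: Path = DISK_LOCATION_DEFAULT) -> None:
        self.cachedir = Path(cachedir)
        # Passing the same cachedir down keeps writer and reader on one path.
        self.purl2cpe = Purl2CpeSource(cachedir=self.cachedir)

    def open_purl2cpe(self) -> sqlite3.Connection:
        # Mirrors the failing access: the path is resolved from self.cachedir.
        path = self.cachedir / "purl2cpe/purl2cpe.db"
        if not path.exists():
            # Fail loudly instead of letting sqlite3 create an empty file.
            raise FileNotFoundError(f"purl2cpe database not found at {path}")
        return sqlite3.connect(path)
```

With this design, two instances created with different cachedir values never read or write each other's purl2cpe database.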

The motivation for this RFC patch is to run several instances of cve-bin-tool, each with its own cachedir.
In parallel operation there is currently a risk of collisions when accessing the database: if a task starts later and wants to update the database, the first task rolls back the database, which corrupts the entire database.

This RFC patch tries to set the cachedir in all affected files, but there are many affected files and dependencies.
I am not sure whether this is the right approach to improve parallel processing.

The cachedir and backupcachedir were always set to the default value for the
data sources, while the cachedir of the cvedb is configurable.
With this patch, cve-bin-tool can run as separate instances,
each with its own cache for the CVE information.

Additionally, the patch is a workaround for the cvedb access at line 441, purl2cpe_conn = sqlite3.connect(self.cachedir / "purl2cpe/purl2cpe.db"), which fails if the cvedb uses a different cachedir from purl2cpe, since purl2cpe always uses DISK_LOCATION_DEFAULT.

Signed-off-by: Maik Otto <[email protected]>

terriko (Contributor) commented Feb 6, 2025

I think this is probably a change we should have.

BUT for parallel work, our standard recommendation is to not allow scan jobs to update the database, which avoids this problem, so you might want to switch to doing that.

That is:

  1. Have one database update job that does updates as needed. I usually have it scan against a blank or small, predictable input, like so:
cve-bin-tool ~/blank.csv

Our docs actually recommend using -u now for this job, but that takes longer because it refreshes all data instead of just fetching updates: https://cve-bin-tool.readthedocs.io/en/latest/how_to_guides/multiple_scans_at_once.html

  2. In every parallel scan job, use -u never as an option so that the scan job will not attempt to update the database.
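The two-step workflow above can be sketched in Python, assuming cve-bin-tool is on PATH and that the scan targets are placeholders you supply. Only the -u now / -u never options and the ~/blank.csv seed file come from the comment; the helper names and the thread-pool driver are illustrative and not part of cve-bin-tool. One update job runs to completion first, then the scans run in parallel without touching the database.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import List, Sequence


def update_command(seed_file: str = "~/blank.csv", full_refresh: bool = False) -> List[str]:
    """Command for the single job that is allowed to update the database."""
    cmd = ["cve-bin-tool"]
    if full_refresh:
        # The docs' recommendation: refresh all data (slower than fetching updates).
        cmd += ["-u", "now"]
    cmd.append(str(Path(seed_file).expanduser()))
    return cmd


def scan_command(target: str) -> List[str]:
    """Command for a parallel scan job that must never update the database."""
    return ["cve-bin-tool", "-u", "never", target]


def run_workflow(targets: Sequence[str], max_workers: int = 4) -> List[int]:
    """Run the update job first, then all scans in parallel; returns scan exit codes."""
    # Step 1: finish the update before any scan starts reading the database.
    subprocess.run(update_command())
    # Step 2: scans do not update the database, so they can run concurrently.
    # Exit codes are returned unchecked for the caller to interpret.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda t: subprocess.run(scan_command(t)).returncode, targets))
```

For example, run_workflow(["fw1.bin", "fw2.bin"]) (hypothetical targets) would update once and then scan both files side by side.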

I'm wondering if we should make some changes so this works better, or make it more obvious to users that this is the recommended solution. I'll open an issue so someone could maybe work on that.
