
fix: Unable to create database - out of memory #4710

Open
anthonyharrison opened this issue Jan 22, 2025 · 5 comments
Labels: bug (Something isn't working)

@anthonyharrison (Contributor)

Description

Attempting to create an initial database results in the cve-bin-tool process being killed with an out-of-memory message.

To reproduce

```
cve-bin-tool -u now -n json-mirror afile
```

Expected behaviour:

Database is created

Actual behaviour:

The process is killed partway through the database load and the database file is not created.

Version/platform info

Version of CVE-bin-tool (e.g. output of cve-bin-tool --version): 3.4
Installed from pypi or github? pypi
Operating system: Linux (WSL2 on Windows 11)
Python version (e.g. python3 --version): 3.10.12
Running in any particular CI environment we should know about? (e.g. GitHub Actions): No; running in WSL2 with 10 GB RAM

anthonyharrison added the bug label on Jan 22, 2025
@anthonyharrison (Contributor, Author)

Disabling the OSV, GAD and RSD data sources allowed the database to be created.

@terriko (Contributor) commented Jan 22, 2025

I think this is a duplicate of #4592. But I'm still not sure what the right fix is.

terriko closed this as completed on Jan 22, 2025
@terriko (Contributor) commented Jan 23, 2025

Actually, I'm going to re-open this because I think it's the most concisely described of the several issues related to this problem.

Some stuff I know so far:

Some conjecture:

  • I strongly suspect that the failures are related to doing a json.load() on the NVD 2024 data, which is much larger than any previous year and is still growing a little right now. A streaming sketch follows below.
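
For illustration only, here is a minimal sketch of what a streaming load could look like instead of json.load(). This is not cve-bin-tool's actual loader: it assumes the ijson package and an NVD 1.1-style feed whose records sit under a top-level `CVE_Items` array, and the file name is a placeholder.

```python
# Streaming-load sketch (not cve-bin-tool's actual loader).
# Assumes the ijson package and an NVD 1.1-style feed whose records sit
# under a top-level "CVE_Items" array; adjust the JSON path to the real schema.
import ijson

def iter_cve_records(path):
    """Yield one CVE record at a time instead of json.load()-ing everything."""
    with open(path, "rb") as f:
        # Peak memory stays around one record, not the whole multi-GB document.
        yield from ijson.items(f, "CVE_Items.item")

# Illustrative usage: just print each record's CVE ID (NVD 1.1 field layout).
for record in iter_cve_records("nvdcve-1.1-2024.json"):
    print(record.get("cve", {}).get("CVE_data_meta", {}).get("ID"))
```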

Next steps:

  • I'm going to switch the cache job over to Python 3.13 and see if that helps make the problem occur less frequently. It's not a long-term solution, but it's a few minutes of work and worth a shot. If that works I'll probably move longtests too.
  • A medium-term solution may involve chopping the 2024 data up into monthly chunks on the mirror, then adjusting cve-bin-tool to load those from the mirror (see the sketch after this list).
  • We may also want to look into pre-processing options in cve-bin-tool itself (likely using Rust for performance reasons).
  • We should also look into whether we can make memory improvements in cve-bin-tool. Not sure what form that would take, but I won't be shocked if there are places we could be more memory efficient during our processing.
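
As a rough sketch of the monthly-chunk idea: the mirror URL pattern and the load_records() helper below are hypothetical placeholders, and the real chunk boundaries would depend on how the mirror ends up being laid out.

```python
# Hypothetical monthly-chunk loader; URL pattern and load_records() are
# placeholders, not a real cve-bin-tool API or mirror layout.
import json
import urllib.request

MIRROR = "https://mirror.example.org/nvd"  # hypothetical mirror location

def load_records(db, records):
    # Placeholder for whatever inserts the records into the database.
    pass

def load_year_in_chunks(db, year=2024):
    for month in range(1, 13):
        url = f"{MIRROR}/nvdcve-{year}-{month:02d}.json"
        with urllib.request.urlopen(url) as resp:
            chunk = json.load(resp)  # roughly 1/12th of the year's data
        load_records(db, chunk)
        del chunk  # release this month's records before fetching the next
```

The point is that only one month's records are resident at a time, so peak memory for that year drops by roughly a factor of twelve.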

I'm open to more suggestions; most of that came from a quick brainstorming session this morning.

terriko reopened this on Jan 23, 2025
@Snehallaldas commented Jan 23, 2025

Update the database in stages, one data source at a time, instead of all at once. This spreads memory usage across separate runs. A rough sketch follows below.
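
One way to approximate this from the outside, as a sketch: drive one run per source using the flag for disabling data sources (disabling sources is mentioned earlier in this thread). The exact flag spelling, the accepted source names, and the need for a scan target are assumptions based on the reproduction command above.

```python
# Hypothetical staged update: one cve-bin-tool invocation per data source.
# Assumes --disable-data-source takes a comma-separated list of sources to
# skip, and reuses the reporter's scan target "afile".
import subprocess

SOURCES = ["NVD", "OSV", "GAD", "RSD"]

for keep in SOURCES:
    skip = ",".join(s for s in SOURCES if s != keep)
    subprocess.run(
        ["cve-bin-tool", "-u", "now", "--disable-data-source", skip, "afile"],
        check=True,  # each run only has to hold one source's data in memory
    )
```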

@anthonyharrison
Copy link
Contributor Author

@terriko I think the issue might be with the OSV database load, as removing this data source solved the problem. Committing every 1000 records or so, rather than one big commit at the end, may also be a useful improvement (sketch below).
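
A minimal sketch of the batched-commit idea, with an illustrative table name and row shape rather than cve-bin-tool's real schema:

```python
# Batched-commit sketch; "cve_data" and the three-column row shape are
# illustrative, not cve-bin-tool's actual schema.
import sqlite3

def insert_in_batches(db_path, rows, batch_size=1000):
    con = sqlite3.connect(db_path)
    try:
        batch = []
        for row in rows:
            batch.append(row)
            if len(batch) >= batch_size:
                con.executemany("INSERT INTO cve_data VALUES (?, ?, ?)", batch)
                con.commit()  # commit per batch instead of once at the end
                batch.clear()
        if batch:  # flush the final partial batch
            con.executemany("INSERT INTO cve_data VALUES (?, ?, ?)", batch)
            con.commit()
    finally:
        con.close()
```

Note this mainly bounds transaction and journal growth; it only reduces Python heap usage if the rows are produced lazily rather than accumulated in one big list first.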

Given the number of records and their continued growth, I think we may be getting to the stage of revisiting the database architecture. It might be a bit too ambitious for GSoC 2025, but maybe some useful work could be done to move things along.
