Reduce memory usage during GISAID ingest #456

Open
atc3 opened this issue Nov 30, 2021 · 0 comments

atc3 commented Nov 30, 2021

The GISAID ingest steps are pushing 100 GB of RAM, probably because we load the entire metadata set into memory before saving it to disk.

Solution:

  • Every 100K sequences, flush the metadata CSV chunk to disk (without a header?)
  • Add a separate job that concatenates the chunks and adds the CSV header
  • Measure how much memory the accession ID <-> sequence hash map takes up. If it's a lot, we could move it to SQLite?
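
A minimal sketch of the three ideas above, assuming Python and only the standard library. The function names, the chunk file naming scheme, and the `seq_hash` table schema are all hypothetical and are not taken from the existing ingest code; the real metadata rows and column names will differ.

```python
import csv
import shutil
import sqlite3
from pathlib import Path

CHUNK_SIZE = 100_000  # rows per chunk, as proposed above


def write_chunks(rows, fieldnames, out_dir, chunk_size=CHUNK_SIZE):
    """Stream rows (dicts) into headerless CSV chunks; return the chunk paths.

    Only one chunk's worth of rows is held in memory at a time.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths, buffer = [], []

    def flush():
        # Hypothetical naming scheme: metadata_00000.csv, metadata_00001.csv, ...
        path = out_dir / f"metadata_{len(paths):05d}.csv"
        with open(path, "w", newline="") as fp:
            csv.DictWriter(fp, fieldnames=fieldnames).writerows(buffer)
        paths.append(path)
        buffer.clear()

    for row in rows:
        buffer.append(row)
        if len(buffer) >= chunk_size:
            flush()
    if buffer:  # write the final partial chunk
        flush()
    return paths


def concat_chunks(paths, fieldnames, out_path):
    """Write the header once, then append each chunk as raw bytes."""
    with open(out_path, "w", newline="") as out:
        csv.writer(out).writerow(fieldnames)
    with open(out_path, "ab") as out:
        for path in paths:
            with open(path, "rb") as fp:
                shutil.copyfileobj(fp, out)


def open_hash_map(db_path):
    """Open (or create) an on-disk accession ID -> sequence hash table."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS seq_hash ("
        "accession_id TEXT PRIMARY KEY, sequence_hash TEXT NOT NULL)"
    )
    return conn


def put_hashes(conn, pairs):
    """Insert (accession_id, sequence_hash) pairs in a single transaction."""
    with conn:
        conn.executemany("INSERT OR REPLACE INTO seq_hash VALUES (?, ?)", pairs)


def get_hash(conn, accession_id):
    """Return the stored hash for an accession ID, or None if absent."""
    row = conn.execute(
        "SELECT sequence_hash FROM seq_hash WHERE accession_id = ?",
        (accession_id,),
    ).fetchone()
    return row[0] if row else None
```

Because `write_chunks` accepts any iterable, the ingest step could pass a generator and never materialize the full metadata set. Writing the chunks without a header keeps the concatenation job a plain byte copy, so it does not need to parse any CSV.
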
atc3 added the data label Nov 30, 2021
atc3 self-assigned this Nov 30, 2021