How long does data processing take? #251

Closed
RumitAP opened this issue Jun 24, 2022 · 1 comment

Comments

RumitAP commented Jun 24, 2022

Good Afternoon,

I am benchmarking our HPC platform on a config with no GPUs (48 virtual cores and 128 GB memory). How long does data processing usually take, and how long does training usually take?

I took a look at the tail of the stdout just now (it has been running since Wednesday) and see that it says the following:

Processing ./input/day_0.npz
Load 10508943/199563535 Split: 1 Label True: 0 Stored: 0

Does this mean that day_0 hasn't even finished processing yet, or is it just on a recently started epoch? I can't look at the whole stdout right now since the file is about 57 GB.

Thank you,
Rumit

Contributor

mnaumovfb commented Sep 6, 2022

I apologize for the delay in response. The pre-processing takes multiple days; please refer to the discussion in #119 and #58 for more details.

Also, I wanted to highlight the discussion in #199 (comment) "If you are interested in dealing with a large dataset, you will need to have uninterrupted access to a machine/node with at least 256GB RAM and 8TB of disk space for a long period of time (weeks). My advice is to try working with smaller dataset before trying the larger one."
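
On the practical side of the question (checking progress without opening a 57 GB stdout file), here is a minimal sketch that reads only the last few kilobytes of a log and extracts the most recent `Load current/total` counter. It is an illustration under assumptions, not part of this repository: the helper names (`read_tail`, `parse_progress`, `latest_progress`) are hypothetical, and it assumes the two numbers in the `Load` line are records read so far and total records for the file being processed. It splits on both `\r` and `\n` in case the progress line is rewritten in place with carriage returns.

```python
import os
import re

# Matches the counter in lines such as
# "Load 10508943/199563535 Split: 1 Label True: 0 Stored: 0"
PROGRESS_RE = re.compile(r"Load\s+(\d+)/(\d+)")


def read_tail(path, max_bytes=64 * 1024):
    """Return the non-empty lines found in the last max_bytes of a file."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        # Jump close to the end instead of reading the whole file.
        f.seek(max(0, size - max_bytes))
        data = f.read()
    text = data.decode("utf-8", errors="replace")
    # Progress output may be separated by "\r" rather than "\n".
    return [ln for ln in re.split(r"[\r\n]+", text) if ln.strip()]


def parse_progress(line):
    """Return (current, total) from a 'Load a/b' line, or None if absent."""
    m = PROGRESS_RE.search(line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


def latest_progress(lines):
    """Return the most recent (current, total) pair in the given lines."""
    for line in reversed(lines):
        progress = parse_progress(line)
        if progress is not None:
            return progress
    return None
```

Under that assumed reading of the counter, the line quoted in the question, `Load 10508943/199563535`, would correspond to roughly 5% of the records in `day_0.npz`. Calling `latest_progress(read_tail("stdout.log"))` (the file name is a placeholder) at two different times gives a rough processing rate.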
