Question about GPU Usage, Training Time, and Dataset Size #6
Nice work! I have a few questions regarding the details of your experiment.
Thanks in advance!
Comments
Thank you for your interest in ConsisID. We used 40 NVIDIA H100 GPUs with a total batch size of 80, and trained for 1800 steps. The training dataset consists of approximately 140,000 video clips. More details can be found in our report.
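For reference, assuming the reported 80 is the global batch size: 80 samples / 40 GPUs = 2 samples per GPU per step, and 1800 steps × 80 samples = 144,000 samples, i.e. on the order of one pass over the ~140,000 clips.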
but only a single 80GB graphics card is needed for training
Since the above mentions that 40 NVIDIA H100 GPUs were used for training, how long would it take to train on a single 80GB GPU such as an A100? Would it affect the model performance?
For Q1, the exact speed difference would require actual testing. For Q2, ideally it will not affect performance, but in practice the smaller global batch size (40 GPUs vs. 1 GPU) may affect convergence.
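To illustrate the global-batch-size point, here is a minimal sketch (not the ConsisID training code; the model, optimizer, and batch sizes are stand-ins) of how gradient accumulation can reproduce an 80-sample global batch on a single GPU:

```python
import torch
from torch import nn

model = nn.Linear(16, 1)                    # stand-in for the real model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

micro_batch_size = 2                        # assumed per-step batch that fits on one 80GB card
global_batch_size = 80                      # reported total batch size
accum_steps = global_batch_size // micro_batch_size   # 40 micro-batches per optimizer step

optimizer.zero_grad()
for step in range(accum_steps * 3):         # 3 effective optimizer steps on dummy data
    x = torch.randn(micro_batch_size, 16)   # dummy input in place of video-clip features
    loss = model(x).pow(2).mean() / accum_steps   # scale so gradients average over the global batch
    loss.backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```

Dividing the loss by `accum_steps` keeps the accumulated gradient equal to the average over the full 80-sample batch, so convergence should be closer to the multi-GPU run, at the cost of proportionally more wall-clock time per optimizer step.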
Roughly how long did the training take on the 40 H100 GPUs?
About 7~8 hours. |
Thanks for the answer. One more question: did you train in single precision or double precision? I would like to run training on H800 GPUs, but the H800 has relatively poor double-precision performance.
We trained with bf16.
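For reference, a minimal sketch of bf16 mixed-precision training with `torch.autocast` (illustrative only, not the actual ConsisID training loop). Because bf16 keeps the fp32 exponent range, no GradScaler is needed, unlike fp16:

```python
import torch
from torch import nn

model = nn.Linear(16, 1).cuda()             # stand-in for the real model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

for _ in range(10):
    x = torch.randn(4, 16, device="cuda")   # dummy input
    # bf16 autocast: activations computed in bfloat16, parameters stay fp32
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        loss = model(x).pow(2).mean()
    loss.backward()                          # gradients are fp32, so no loss scaling is required
    optimizer.step()
    optimizer.zero_grad()
```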