seconds/iteration is fast in first epoch, gets slower every subsequent epoch #8659
Replies: 4 comments 11 replies
-
Do you have any data cached across epochs that you need to reset? Does the performance dramatically drop off at each epoch boundary, or does it gradually slow down during training execution?
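One way to answer the second question is to record the time per iteration and compare the two halves of each epoch. The snippet below is only a minimal sketch using the standard library; `IterationTimer` and `summarize` are hypothetical helper names, not part of any library, and wiring them into your training loop is left to you. Note that CUDA kernels launch asynchronously, so on GPU you would likely want to call `torch.cuda.synchronize()` before each clock read to get meaningful numbers.

```python
import time
from collections import defaultdict
from statistics import mean


class IterationTimer:
    """Hypothetical helper: records wall-clock seconds per iteration, grouped by epoch."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock  # injectable so the timer can be tested with a fake clock
        self._start = None
        self.times = defaultdict(list)  # epoch -> list of seconds per iteration

    def start(self):
        # Call right before the training step.
        # On GPU, synchronize first (assumption: torch.cuda.synchronize()).
        self._start = self._clock()

    def stop(self, epoch):
        # Call right after the training step.
        self.times[epoch].append(self._clock() - self._start)


def summarize(times):
    """Return {epoch: (mean of first half, mean of second half)} in seconds."""
    summary = {}
    for epoch, values in sorted(times.items()):
        half = max(1, len(values) // 2)
        first, second = values[:half], values[half:]
        # With a single sample there is no second half, so reuse the first.
        summary[epoch] = (mean(first), mean(second or first))
    return summary
```

Call `timer.start()` before each step and `timer.stop(epoch)` after it, then print `summarize(timer.times)` at the end. If the second-half mean is higher than the first-half mean inside an epoch, the slowdown is gradual. If the numbers are flat inside an epoch but jump between one epoch's second half and the next epoch's first half, the slowdown happens at the epoch boundary, which would point toward something carried over or rebuilt between epochs.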
-
@tchaton Any updates on this? What is likely causing the issue?
-
@tchaton Are you interested at all in improving your shitty lib? Are you satisfied with ignoring critical, basic issues?
-
@angadkalra Hello. Have you found out what caused the issue?
-
I'm training a 3D ResNet-101 model on grayscale images (224x224x320) with 16-bit precision, on a VM with 4x V100 GPUs, using DDP and num_workers = 4. Batch size is 8 (2 per GPU). My first epoch goes very fast at 2.5 s/it, but every epoch after that gets slower, and by epoch 6 I'm getting 6.5 s/it. Any idea why this is happening, or any tips to speed it up?
Thanks!