The difference appears to come from a regression in the batch size used (2 -> 1). It could be related to the switch from the "none" bucketing mode to "block": that switch increased memory usage for other models, causing OOMs, and here it seems to have forced a smaller batch size that still fits. @kiya00, could you please take a look at this regression and find out what caused it?
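The suspected chain of events — extra per-step memory overhead pushes the previously used batch size over the device budget, so a smaller one is selected — can be illustrated with a minimal sketch. This is not the actual benchmark_litgpt.py logic; the function names, memory figures, and the linear memory model are all illustrative assumptions.

```python
# Hypothetical sketch (not the actual benchmark code): why a higher fixed
# memory overhead can silently shrink the largest batch size that fits.
# All names and numbers below are illustrative assumptions.

def fits_in_memory(batch_size: int, per_sample_mb: int,
                   overhead_mb: int, budget_mb: int) -> bool:
    """Assume step memory grows linearly with batch size plus a fixed overhead."""
    return batch_size * per_sample_mb + overhead_mb <= budget_mb

def largest_fitting_batch(candidates, per_sample_mb, overhead_mb, budget_mb):
    """Pick the largest candidate batch size that stays under the budget."""
    for bs in sorted(candidates, reverse=True):
        if fits_in_memory(bs, per_sample_mb, overhead_mb, budget_mb):
            return bs
    return None  # every candidate would OOM

# With a small fixed overhead (the assumed "none"-mode behavior), 2 fits:
print(largest_fitting_batch([1, 2], per_sample_mb=30_000,
                            overhead_mb=10_000, budget_mb=80_000))  # -> 2

# With a larger overhead (as suspected for "block" bucketing), only 1 fits:
print(largest_fitting_batch([1, 2], per_sample_mb=30_000,
                            overhead_mb=25_000, budget_mb=80_000))  # -> 1
```

Under this toy model, a modest overhead increase halves the usable batch size, which would explain the observed throughput drop without any error being reported.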
🐛 Bug
Here are recently found regressions:
To Reproduce
All parameters to benchmark_litgpt.py are visible in the attached image.
Environment
system.device_product_name DGXH100
system.gpu_driver_version 535.129.03
libraries.cuda 12.6.98.001
libraries.pip.lightning 2.4.0.dev20240728
libraries.pip.lightning-thunder 0.2.0.dev0
libraries.pip.lightning-utilities 0.11.8
libraries.pip.litgpt 0.4.11
libraries.pip.nvfuser 0.2.22+gitba4f7d4
libraries.pip.pytorch-lightning 2.4.0
libraries.pip.torch 2.6.0a0+gita9b4989
libraries.pip.torchao 0.6.1
libraries.pip.torchmetrics 1.5.1
libraries.pip.torchvision 0.19.0a0+d23a6e1