ThunderFX is significantly slower than 2 weeks ago for 3 models #1428

Open
mpatel31415 opened this issue Nov 12, 2024 · 1 comment
Labels
mixology Issues that the mixology team has surfaced

@mpatel31415
Contributor

🐛 Bug

Here are recently found regressions:

[Image: table of the regressed benchmark results, including the benchmark_litgpt.py parameters used]

To Reproduce

All parameters passed to benchmark_litgpt.py are shown in the attached image.

Environment

system.device_product_name DGXH100
system.gpu_driver_version 535.129.03
libraries.cuda 12.6.98.001
libraries.pip.lightning 2.4.0.dev20240728
libraries.pip.lightning-thunder 0.2.0.dev0
libraries.pip.lightning-utilities 0.11.8
libraries.pip.litgpt 0.4.11
libraries.pip.nvfuser 0.2.22+gitba4f7d4
libraries.pip.pytorch-lightning 2.4.0
libraries.pip.torch 2.6.0a0+gita9b4989
libraries.pip.torchao 0.6.1
libraries.pip.torchmetrics 1.5.1
libraries.pip.torchvision 0.19.0a0+d23a6e1

@IvanYashchuk IvanYashchuk added the mixology Issues that the mixology team has surfaced label Nov 12, 2024
@IvanYashchuk
Collaborator

The difference seems to be caused by a regression in the batch size used (2 -> 1). This could be related to the switch from the "none" bucketing mode to "block". That switch increased memory usage for other models, causing OOM errors; here it seems to have forced a smaller batch size that still fits in memory. @kiya00, could you please take a look at this regression and find out what caused it?
