parallelize gpu test #1042


Conversation

josephleekl
Contributor

@josephleekl josephleekl commented Jan 17, 2025

Before submitting

Please complete the following checklist when submitting a PR:

  • All new features must include a unit test.
    If you've fixed a bug or added code that should be tested, add a test to the
    tests directory!

  • All new functions and code must be clearly commented and documented.
    If you do make documentation changes, make sure that the docs build and
    render correctly by running make docs.

  • Ensure that the test suite passes, by running make test.

  • Add a new entry to the .github/CHANGELOG.md file, summarizing the
    change, and including a link back to the PR.

  • Ensure that code is properly formatted by running make format.

When all the above are checked, delete everything above the dashed
line and fill in the pull request template.


Context:

Description of the Change:
Some quick test results (CI job duration, before -> after):

Python Tests (lightning_kokkos, kokkos-4.5.00, model-CUDA)
24m22s -> 17m35s

Python Tests (lightning_gpu, cuda-12)
1h21m29s -> 2h12m34s

Python Tests (lightning_tensor, cuda-12)
36m15s -> 24m2s
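
To make the timings above easier to compare, here is a small sketch that converts each duration to seconds and prints the relative change. The dictionary keys are shortened job names and the helper names are illustrative; only the durations come from the list above.

```python
import re


def to_seconds(duration: str) -> int:
    """Convert a duration such as '1h21m29s' or '24m2s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?", duration)
    if not duration or match is None:
        raise ValueError(f"unrecognized duration: {duration!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def relative_change(before: str, after: str) -> float:
    """Percentage change in duration; negative means the job got faster."""
    b, a = to_seconds(before), to_seconds(after)
    return (a - b) / b * 100


# (before, after) durations copied from the list above.
TIMINGS = {
    "lightning_kokkos": ("24m22s", "17m35s"),
    "lightning_gpu": ("1h21m29s", "2h12m34s"),
    "lightning_tensor": ("36m15s", "24m2s"),
}

for name, (before, after) in TIMINGS.items():
    print(f"{name}: {relative_change(before, after):+.1f}%")
# lightning_kokkos: -27.8%
# lightning_gpu: +62.7%
# lightning_tensor: -33.7%
```

So the Kokkos and tensor jobs are roughly 28% and 34% shorter, while the lightning_gpu job is roughly 63% longer.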

Benefits:

Possible Drawbacks:

Related GitHub Issues:

@josephleekl josephleekl added the ci:use-gpu-runner Enable usage of GPU runner for this Pull Request label Jan 17, 2025
Contributor

Hello. You may have forgotten to update the changelog!
Please edit .github/CHANGELOG.md with:

  • A one-to-two sentence description of the change. You may include a small working example for new features.
  • A link back to this PR.
  • Your name (or GitHub username) in the contributors section.

@josephleekl josephleekl added the draft Indicates that the PR is still in draft mode, but needs CIs. label Jan 17, 2025
@josephleekl josephleekl marked this pull request as ready for review January 17, 2025 22:48
@josephleekl josephleekl added the ci:use-gpu-runner label and removed the ci:use-gpu-runner and draft labels Jan 18, 2025

codecov bot commented Jan 18, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 90.08%. Comparing base (290c986) to head (11a1c3f).

❗ There is a different number of reports uploaded between BASE (290c986) and HEAD (11a1c3f).

HEAD has 49 fewer uploads than BASE.

Flag    BASE (290c986)    HEAD (11a1c3f)
        56                7
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1042      +/-   ##
==========================================
- Coverage   98.10%   90.08%   -8.02%     
==========================================
  Files         233      112     -121     
  Lines       39079    16990   -22089     
==========================================
- Hits        38339    15306   -23033     
- Misses        740     1684     +944     


@@ -206,7 +206,8 @@ jobs:
OMP_PROC_BIND: false
run: |
DEVICENAME=`echo ${{ matrix.pl_backend }} | sed "s/_/./g"`
PL_DEVICE=${DEVICENAME} python -m pytest tests/ $COVERAGE_FLAGS
Member

@multiphaseCFD multiphaseCFD Jan 19, 2025

Thank you @josephleekl. Do we want to install pytest-xdist? How can we configure the mapping between CPU cores and GPU devices or instances (MIG)?

Member


Just a note here: benchmarks are required for this PR.

Contributor Author


Thanks! pytest-xdist is already installed from requirements-dev.txt. We're just using a single GPU device for now, but it might be more performant to partition it into MIG instances and map the test workers to them directly.

I have posted some preliminary timings in the PR description; for now GPU tests seem to take longer... I'll investigate more when I have time!
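
For reference, a rough sketch of how such a worker-to-device mapping could look, not something this PR implements. It assumes the `PYTEST_XDIST_WORKER` environment variable that pytest-xdist sets in each worker (`gw0`, `gw1`, ...) and relies on `CUDA_VISIBLE_DEVICES`, which accepts GPU indices or MIG UUIDs. The function names and the device list are hypothetical, and the variable would have to be set before the process initializes CUDA (for example at the top of `conftest.py`).

```python
import os


def device_for_worker(worker_id: str, devices: list[str]) -> str:
    """Pick a device for an xdist worker id such as 'gw3' (round-robin)."""
    if not devices:
        raise ValueError("at least one device identifier is required")
    # Without xdist there is no worker; treat that case as worker 0.
    index = 0 if worker_id == "master" else int(worker_id.removeprefix("gw"))
    return devices[index % len(devices)]


def pin_worker_to_device(devices: list[str]) -> str:
    """Restrict the current test process to one device via CUDA_VISIBLE_DEVICES."""
    worker_id = os.environ.get("PYTEST_XDIST_WORKER", "master")
    device = device_for_worker(worker_id, devices)
    os.environ["CUDA_VISIBLE_DEVICES"] = device
    return device


# Hypothetical usage: two devices shared by however many workers `-n` starts.
# The entries could equally be MIG instance UUIDs reported by `nvidia-smi -L`.
if __name__ == "__main__":
    print(pin_worker_to_device(["0", "1"]))
```

With round-robin assignment, workers beyond the number of devices share a device, so whether this helps would still need to be benchmarked.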

@josephleekl josephleekl added draft Indicates that the PR is still in draft mode, but needs CIs. do not merge Do not merge PR until this label is removed labels Jan 20, 2025
Labels
ci:use-gpu-runner Enable usage of GPU runner for this Pull Request do not merge Do not merge PR until this label is removed draft Indicates that the PR is still in draft mode, but needs CIs.
4 participants