Deterministic version of CUDA forces and stresses kernels #3693
Codecov Report: All modified and coverable lines are covered by tests ✅
Coverage diff (devel vs. #3693), all values unchanged:
Coverage 82.15%, Files 511, Lines 47364, Branches 2952, Hits 38910, Misses 7561, Partials 893
It seems that the GPU CI does not pass with the current change.
The errors are unclear to me. It is not a compilation problem (which I would have fixed right away) but something deep in Python. One of the changed files is just a text file; the other two are source code. There is no change in the API and no memory leak (I checked for this). I can also compile deepmd in containers. NB: I use this Dockerfile
5d042fd to dd7729b
Something is wrong with this patch. It works with master but fails with the devel branch. I am missing something.
dd7729b to 39ceb1a
CodeRabbit walkthrough: The recent updates focus on enhancing CUDA kernels for molecular dynamics simulations by optimizing memory access, thread configurations, and introducing flags for radial calculations. Additionally, a new guide on achieving deterministic results with DeepMD has been introduced, improving reproducibility across different hardware setups.
Sorry for the delay. I think I found the root of the problem: a missing synchronization barrier.
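To illustrate the kind of bug a missing barrier causes, here is a minimal Python sketch that uses `threading.Barrier` as a stand-in for a block-level barrier such as CUDA's `__syncthreads()`. It is an analogy under stated assumptions, not the kernel code from this PR; `block_reduce` and its parameters are hypothetical names.

```python
import threading


def block_reduce(values, n_threads=4):
    """Each worker writes one partial sum; worker 0 combines them after a barrier."""
    partial = [0.0] * n_threads
    result = [None]
    barrier = threading.Barrier(n_threads)

    def worker(tid):
        # Every worker reduces its own strided slice of the input.
        partial[tid] = sum(values[tid::n_threads])
        # Without this wait, worker 0 could read slots that have not been
        # written yet (the role __syncthreads() plays inside a CUDA block).
        barrier.wait()
        if tid == 0:
            total = 0.0
            for p in partial:  # fixed combination order
                total += p
            result[0] = total

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result[0]
```

Removing the `barrier.wait()` call makes the result depend on scheduling, which tends to show up as intermittent failures rather than a compile-time error.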
Actionable comments posted: 5
Actionable comments posted: 3
Actionable comments posted: 2
Actionable comments posted: 2
e2626d9 to f8a0b31
- Remove atomic operations in the forces and stress kernels.
- Use template programming to minimize code duplication.

Authors:
- A. Sedova (ORNL)
- M. Taillefumier (ETH Zurich / CSCS)
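Removing atomics matters for determinism because floating-point addition is not associative: atomic adds accumulate in whatever order threads happen to arrive, so the rounding can differ from run to run. The Python sketch below illustrates the effect and a fixed-order alternative; it is not the CUDA code from this commit, and `sum_in_order` and `tree_sum` are hypothetical helper names.

```python
def sum_in_order(values, order):
    """Accumulate values in the given order, mimicking the arrival order of atomic adds."""
    total = 0.0
    for i in order:
        total += values[i]
    return total


def tree_sum(values):
    """Pairwise reduction with a fixed pairing, independent of thread scheduling."""
    buf = list(values)
    if not buf:
        return 0.0
    while len(buf) > 1:
        if len(buf) % 2:
            buf.append(0.0)  # pad so every element has a partner
        buf = [buf[i] + buf[i + 1] for i in range(0, len(buf), 2)]
    return buf[0]


contributions = [1e16, 1.0, -1e16]
a = sum_in_order(contributions, [0, 1, 2])  # (1e16 + 1.0) - 1e16
b = sum_in_order(contributions, [0, 2, 1])  # (1e16 - 1e16) + 1.0
print(a == b)  # False: the two orders round differently
print(tree_sum(contributions) == tree_sum(contributions))  # True: same pairing every time
```

A fixed pairing does not make the sum more accurate by itself; it only guarantees that the same input always produces the same bits.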
f8a0b31 to 13c83f8
for more information, see https://pre-commit.ci
Actionable comments posted: 2
Actionable comments posted: 2
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Taillefumier Mathieu <[email protected]>
Actionable comments posted: 2
I found why this PR fails. I missed something important when I modified these kernels and I know how to correct them now. I will close the PR for the moment so that I can push changes without triggering the CI/CD all the time.
The calculations of the forces and stress are now deterministic on GPU. This does not imply that the DeepMD code is deterministic by default, since TensorFlow must also be configured properly, either at runtime or during the initialization phase.
To obtain the same model parameters, add the following variables to the job scripts
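As a rough illustration of how such variables could be applied from a Python launcher, the sketch below builds a job environment with two TensorFlow determinism switches. The variable names chosen here (`TF_DETERMINISTIC_OPS`, `TF_CUDNN_DETERMINISTIC`) are an assumption for illustration and are not taken from this PR's list; `with_determinism` is a hypothetical helper.

```python
import os

# Assumption: illustrative settings only, not the exact list from this PR.
DETERMINISM_VARS = {
    "TF_DETERMINISTIC_OPS": "1",    # asks TensorFlow to select deterministic GPU ops
    "TF_CUDNN_DETERMINISTIC": "1",  # older switch covering cuDNN ops
}


def with_determinism(base_env):
    """Return a copy of base_env with the determinism variables set."""
    env = dict(base_env)
    env.update(DETERMINISM_VARS)
    return env


# Environment to hand to the training process; the parent process is left unchanged.
job_env = with_determinism(os.environ)
```

The resulting `job_env` can be passed as the `env` argument of `subprocess.run` when launching the training command; in a shell job script the equivalent would be `export` lines for the same names.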
Details of the changes:
Authors :
See PR #3684 for comments
Summary by CodeRabbit
New Features
Refactor
Documentation
Bug Fixes