-
Notifications
You must be signed in to change notification settings - Fork 525
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deterministic version of CUDA forces and stresses kernels #3684
Conversation
- Remove the use of atomic operations in the forces and stress kernels. - Use template programming to minimize code duplication. Authors : - Mathieu Taillefumier (ETH Zurich / CSCS) - Ada Sedova (ORNL)
for more information, see https://pre-commit.ci
Please rebase the PR against the |
// search the index of the atom i in the local neighbor list of atom j | ||
for (atom_id_position = 0; atom_id_position < nnei; atom_id_position++) { | ||
if (nei_nei_list_[atom_id_position] == atom_id) { | ||
break; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The complexity of the index searching is of order N_nei, which does not present in the atomic operation implementation. Does it have an observable side effect on the performance of the prod_force
operator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could not observe any side effect except fluctuations of 5 % in performance in my miniapp. We are speaking about us here as well. The gain is that these operators are deterministic by default which is worth the 5% potential penalty (or less) introduced by this code change.
sorry for this. It is not possible to change the source branch without creating a new PR. I can do it now or after the discussion to avoid reviewing the same code twice. |
I don't think we can review the code until all unit tests pass. The current PR blocks the CI from being triggered. |
I opened a new PR #3693 on the develop branch starting the latest devel branch as well. Same title, the same content |
Calculations of the forces and stress are deterministic on GPU. It does not imply that the DeepMD code is deterministic by default as TensorFlow also requires to be set up properly either at runtime or during the initialization phase.
To obtain the same model parameters, add the following variables to the job scripts
Details of the changes:
Authors :