Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deterministic version of CUDA forces and stresses kernels #3684

Closed
wants to merge 2 commits into from

Conversation

mtaillefumier
Copy link

Calculations of the forces and stress are deterministic on GPU. It does not imply that the DeepMD code is deterministic by default as TensorFlow also requires to be set up properly either at runtime or during the initialization phase.

To obtain the same model parameters, add the following variables to the job scripts

export TF_DETERMINISTIC_OPS=1
export TF_INTER_OP_PARALLELISM_THREADS=0
export TF_INTER_OP_PARALLELISM_THREADS=0

Details of the changes:

  • Remove the use of atomic operations in the forces and stress kernels.
  • Use template programming to minimize code duplication and minor refactoring

Authors :

- Remove the use of atomic operations in the forces and stress kernels.
- Use template programming to minimize code duplication.

Authors :
- Mathieu Taillefumier (ETH Zurich / CSCS)
- Ada Sedova (ORNL)
@njzjz njzjz changed the base branch from master to devel April 18, 2024 17:46
@njzjz njzjz added the Test CUDA Trigger test CUDA workflow label Apr 18, 2024
@github-actions github-actions bot removed the Test CUDA Trigger test CUDA workflow label Apr 18, 2024
@njzjz
Copy link
Member

njzjz commented Apr 18, 2024

Please rebase the PR against the devel branch. We don't accept PRs on other branches.

Comment on lines +101 to +106
// search the index of the atom i in the local neighbor list of atom j
for (atom_id_position = 0; atom_id_position < nnei; atom_id_position++) {
if (nei_nei_list_[atom_id_position] == atom_id) {
break;
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The complexity of the index searching is of order N_nei, which does not present in the atomic operation implementation. Does it have an observable side effect on the performance of the prod_force operator?

Copy link
Author

@mtaillefumier mtaillefumier Apr 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could not observe any side effect except fluctuations of 5 % in performance in my miniapp. We are speaking about us here as well. The gain is that these operators are deterministic by default which is worth the 5% potential penalty (or less) introduced by this code change.

@mtaillefumier
Copy link
Author

mtaillefumier commented Apr 19, 2024

Please rebase the PR against the devel branch. We don't accept PRs on other branches.

sorry for this. It is not possible to change the source branch without creating a new PR. I can do it now or after the discussion to avoid reviewing the same code twice.

@njzjz
Copy link
Member

njzjz commented Apr 19, 2024

sorry for this. It is not possible to change the source branch without creating a new PR. I can do it now or after the discussion to avoid reviewing the same code twice.

I don't think we can review the code until all unit tests pass. The current PR blocks the CI from being triggered.

@mtaillefumier
Copy link
Author

I opened a new PR #3693 on the develop branch starting the latest devel branch as well. Same title, the same content

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants