Distributed Data Parallel communication hook #667
-
Hi David, have you also checked out our submission API in https://github.com/mlcommons/algorithmic-efficiency/blob/main/submissions/template/submission.py and the example implementations (https://github.com/mlcommons/algorithmic-efficiency/blob/main/reference_algorithms/paper_baselines/adamw/pytorch/submission.py#L93)? The idea is that submitters are free to implement each of the submission APIs as they wish. Our workload loss functions return 'unreduced' loss values, so I believe you should be able to compute and perform calculations on the gradients per shard. @runame, can you confirm?
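For example, something along these lines should work (just a rough sketch, not the exact submission API; `ddp_model`, `workload`, `input_batch`, `label_batch`, and `my_per_shard_transform` are placeholders): skip DDP's automatic gradient averaging and reduce manually after the per-GPU step.

```python
import torch.distributed as dist

# Rough sketch only -- see the submission template for the real signatures.
with ddp_model.no_sync():  # forward + backward inside: DDP skips its own all-reduce
    logits = ddp_model(input_batch)
    per_example_losses = workload.loss_fn(label_batch, logits)  # 'unreduced' losses
    per_example_losses.mean().backward()

# Each GPU now holds gradients computed from its own shard of the batch.
world_size = dist.get_world_size()
for p in ddp_model.parameters():
    if p.grad is not None:
        p.grad = my_per_shard_transform(p.grad)  # placeholder per-GPU calculation
        dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
        p.grad /= world_size
```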
-
Hello,
I am working on a simple idea for a submission to this contest. My idea requires registering a communication hook on PyTorch's DistributedDataParallel model. Essentially, I want to compute the gradient, perform some calculation on it separately on each GPU, and then all_reduce the results. I do not think this violates the spirit of the rules, but please let me know whether you agree. Thank you for your time.
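Roughly, the hook I have in mind would look something like this (`my_transform` is just a placeholder for the per-GPU calculation):

```python
import torch.distributed as dist

def transform_then_allreduce(state, bucket):
    # bucket.buffer() is this rank's flattened gradient tensor for the bucket.
    grad = my_transform(bucket.buffer())  # placeholder; must keep the same shape
    work = dist.all_reduce(grad, op=dist.ReduceOp.SUM, async_op=True)
    world_size = dist.get_world_size()
    # DDP copies the tensor returned by the future back into the bucket.
    return work.get_future().then(lambda fut: fut.value()[0] / world_size)

# ddp_model is the torch.nn.parallel.DistributedDataParallel-wrapped model.
ddp_model.register_comm_hook(state=None, hook=transform_then_allreduce)
```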
-David Tweedle