I am working with a model that consists of two separate encoders (one for RGB and one for Depth) and an MLP head for action prediction. The outputs of the two encoders are concatenated before being passed into the MLP head. I am using accelerator.prepare() to prepare only the two encoders and the MLP head, while the concatenation itself (torch.cat) is plain tensor code that is not passed to prepare(). I am wondering how this affects distributed training (a minimal sketch of my setup is included after the questions below). Specifically:
1. Will the concatenation operation run on a single GPU if it is not prepared?
2. Could this cause device mismatches between prepared and unprepared modules?
3. How does gradient synchronization behave in this scenario?
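For context, here is a minimal sketch of the setup I described (the architectures, input shapes, and optimizer are simplified placeholders, not my actual code):

```python
import torch
import torch.nn as nn
from accelerate import Accelerator

accelerator = Accelerator()

# Two separate encoders and an MLP head (architectures simplified for illustration).
rgb_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256), nn.ReLU())
depth_encoder = nn.Sequential(nn.Flatten(), nn.Linear(1 * 64 * 64, 256), nn.ReLU())
mlp_head = nn.Sequential(nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, 7))

optimizer = torch.optim.Adam(
    list(rgb_encoder.parameters())
    + list(depth_encoder.parameters())
    + list(mlp_head.parameters()),
    lr=1e-4,
)

# Only the modules and the optimizer are passed to prepare();
# the torch.cat in the forward pass is ordinary tensor code.
rgb_encoder, depth_encoder, mlp_head, optimizer = accelerator.prepare(
    rgb_encoder, depth_encoder, mlp_head, optimizer
)

def forward(rgb, depth):
    rgb_feat = rgb_encoder(rgb)
    depth_feat = depth_encoder(depth)
    fused = torch.cat([rgb_feat, depth_feat], dim=-1)  # not wrapped by prepare()
    return mlp_head(fused)
```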
Thanks in advance for your help!