I am working with a model that consists of two separate encoders (one for RGB and one for Depth) and an MLP head for action prediction. The outputs of the two encoders are concatenated before being passed into the MLP head. I am using accelerator.prepare() to prepare only the two encoders and the MLP head, while the concatenation itself (torch.cat) is plain tensor code that is not passed to prepare(). I am wondering how this affects distributed training (a minimal sketch of my setup is included after the questions below). Specifically:
1. Will the concatenation operation run on a single GPU if it is not prepared?
2. Could this cause device mismatches between prepared and unprepared modules?
3. How does gradient synchronization behave in this scenario?
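For context, here is a minimal sketch of the setup I described (the architectures, input shapes, and optimizer are simplified placeholders, not my actual code):

```python
import torch
import torch.nn as nn
from accelerate import Accelerator

accelerator = Accelerator()

# Two separate encoders and an MLP head (architectures simplified for illustration).
rgb_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256), nn.ReLU())
depth_encoder = nn.Sequential(nn.Flatten(), nn.Linear(1 * 64 * 64, 256), nn.ReLU())
mlp_head = nn.Sequential(nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, 7))

optimizer = torch.optim.Adam(
    list(rgb_encoder.parameters())
    + list(depth_encoder.parameters())
    + list(mlp_head.parameters()),
    lr=1e-4,
)

# Only the modules and the optimizer are passed to prepare();
# the torch.cat in the forward pass is ordinary tensor code.
rgb_encoder, depth_encoder, mlp_head, optimizer = accelerator.prepare(
    rgb_encoder, depth_encoder, mlp_head, optimizer
)

def forward(rgb, depth):
    rgb_feat = rgb_encoder(rgb)
    depth_feat = depth_encoder(depth)
    fused = torch.cat([rgb_feat, depth_feat], dim=-1)  # not wrapped by prepare()
    return mlp_head(fused)
```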
Thanks in advance for your help!