
Partial accelerator.prepare() Usage #3422

Open
zhanglixuan0720 opened this issue Mar 6, 2025 · 0 comments

I am working with a model that consists of two separate encoders (one for RGB and one for Depth) and an MLP head for action prediction. The outputs of the two encoders are concatenated before being passed into the MLP head. I am using accelerator.prepare() to prepare only the two encoders and the MLP head, but the concatenation (torch.cat) is not included in prepare(). I am wondering how this affects distributed training. Specifically:

  1. Will the concatenation operation run on a single GPU if it is not prepared?
  2. Could this cause device mismatches between prepared and unprepared modules?
  3. How does gradient synchronization behave in this scenario?
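For reference, here is a minimal sketch of the setup I described (the class names, layer sizes, and `Encoder`/`ActionMLP` modules below are placeholders, not my actual code):

```python
import torch
import torch.nn as nn
from accelerate import Accelerator

# Placeholder stand-ins for the two encoders and the MLP head.
class Encoder(nn.Module):
    def __init__(self, in_dim, out_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU())

    def forward(self, x):
        return self.net(x)

class ActionMLP(nn.Module):
    def __init__(self, in_dim=256, n_actions=7):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, in_dim), nn.ReLU(), nn.Linear(in_dim, n_actions)
        )

    def forward(self, x):
        return self.net(x)

accelerator = Accelerator()

rgb_encoder = Encoder(in_dim=512)    # placeholder input sizes
depth_encoder = Encoder(in_dim=512)
action_head = ActionMLP(in_dim=256)  # 2 x 128 concatenated features

optimizer = torch.optim.AdamW(
    list(rgb_encoder.parameters())
    + list(depth_encoder.parameters())
    + list(action_head.parameters()),
    lr=1e-4,
)

# The three sub-modules and the optimizer are passed to prepare();
# torch.cat itself is just a tensor op in the forward pass and is not prepared.
rgb_encoder, depth_encoder, action_head, optimizer = accelerator.prepare(
    rgb_encoder, depth_encoder, action_head, optimizer
)

def forward(rgb, depth):
    # Each encoder output is on the device its module runs on, and the
    # concatenation executes on that same device.
    feat = torch.cat([rgb_encoder(rgb), depth_encoder(depth)], dim=-1)
    return action_head(feat)
```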

Thanks in advance for your help!
