
Add PP tracer + DP test #379

Open
wants to merge 2 commits into base: gh/kwen2501/4/base

Conversation

kwen2501
Contributor

@kwen2501 kwen2501 commented Jun 1, 2024

Stack from ghstack (oldest at bottom):

Separate the addition of 2D test from original PR #362 for easier review and landing.

Also changed `.items()` to `named_children()` for more general submodule access. See similar changes in #371.
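The `named_children()` change can be sketched as follows. This is an illustrative stand-alone example, not the PR's actual code: `.items()` only exists on dict-like containers such as `nn.ModuleDict`, whereas every `nn.Module` supports `named_children()`. A minimal stand-in class is used here so the sketch runs without torch; the class and attribute names (`Block`, `Model`, `layer0`) are hypothetical.

```python
class Module:
    """Tiny stand-in for torch.nn.Module (illustrative only)."""

    def named_children(self):
        # Yield (name, child) pairs for directly attached sub-modules,
        # mirroring the behavior of torch.nn.Module.named_children().
        for name, attr in vars(self).items():
            if isinstance(attr, Module):
                yield name, attr


class Block(Module):
    pass


class Model(Module):
    def __init__(self):
        self.layer0 = Block()
        self.layer1 = Block()


model = Model()
# Works for any module, not just dict-like containers with .items():
names = [name for name, _ in model.named_children()]
print(names)  # -> ['layer0', 'layer1']
```

Because `named_children()` is defined on the base `Module` class, the same iteration code covers both container modules and ordinary modules with attribute-attached children.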

[ghstack-poisoned]
@kwen2501 kwen2501 mentioned this pull request Jun 1, 2024
kwen2501 added a commit that referenced this pull request Jun 1, 2024
ghstack-source-id: e26b222d093e1405277e45f42d93f05e58286980
Pull Request resolved: #379
@facebook-github-bot added the CLA Signed label (managed by the Meta Open Source bot) Jun 1, 2024
@kwen2501 (Contributor, Author) commented Jun 1, 2024

CI should pass after pytorch/pytorch#127607 is landed.

Comment on lines +427 to +429
# Q1: what is the contract of `fully_shard`? Would it transform
# module in place? @awgu indicates that it would. Then, we shouldn't
# need to register the module with its parent again.
Contributor

agreed

Comment on lines +430 to +433
# Q2: does this requirement come from Activation Checkpointing, and
# maybe this line too?
# `transformer_block = torch.compile(transformer_block)`
# If that's the case, should this line be moved above to after AC?
Contributor

I think both of these lines introduce module wrappers, so AC and compile are the reasons why we have to reassign the module into the parent.

transformer_block = checkpoint_wrapper(transformer_block, ac_config)
transformer_block = torch.compile(transformer_block, dynamic=False)
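The reassignment requirement discussed above can be sketched with a minimal, torch-free example (the `Wrapper`, `Parent`, and `block` names here are hypothetical stand-ins): wrappers such as `checkpoint_wrapper` and `torch.compile` return a new module object rather than mutating the child in place, so the parent keeps referencing the old child until it is explicitly reassigned.

```python
class Child:
    """Stand-in for a transformer block (illustrative only)."""


class Wrapper:
    """Stand-in for a wrapper like checkpoint_wrapper or torch.compile,
    which returns a new object wrapping the original module."""

    def __init__(self, inner):
        self.inner = inner


class Parent:
    """Stand-in for the parent module holding the block."""

    def __init__(self):
        self.block = Child()


parent = Parent()
wrapped = Wrapper(parent.block)   # wrapping does NOT mutate the parent
assert parent.block is not wrapped  # parent still points at the old child

# Reassign so the parent actually uses the wrapped module:
setattr(parent, "block", wrapped)
assert parent.block is wrapped
```

This is also why an in-place transform (as `fully_shard` reportedly is, per the discussion above) would not need the reassignment step: the parent's existing reference already points at the transformed module.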

Contributor

I think your point is good though. I think we have some unnecessary assignments back to the parent module here.

kwen2501 added a commit that referenced this pull request Jun 3, 2024
ghstack-source-id: e26b222d093e1405277e45f42d93f05e58286980
Pull Request resolved: #379
@tianyu-l force-pushed the gh/kwen2501/4/base branch from 8de2449 to 8f6cf62 on August 16, 2024 21:00
@tianyu-l force-pushed the gh/kwen2501/4/head branch from 4f9bad3 to 490652c on August 16, 2024 21:00
tianyu-l pushed a commit that referenced this pull request Aug 16, 2024
ghstack-source-id: e26b222d093e1405277e45f42d93f05e58286980
Pull Request resolved: #379