Possible issue in Accelerate FSDP Documentation #3195

Open
Quicksilver466 opened this issue Oct 24, 2024 · 1 comment

Comments

@Quicksilver466

Hi,

I was reading the Accelerate documentation on FSDP (Documentation Page Link) and noticed that in the section that goes over all the keys and their values in the generated configs, the explanations of the following two keys appear to be jumbled together: fsdp_transformer_layer_cls_to_wrap and fsdp_auto_wrap_policy.

Details

fsdp_transformer_layer_cls_to_wrap: Only applicable for Transformers. When using fsdp_auto_wrap_policy=TRANSFORMER_BASED_WRAP, a user may provide a comma-separated string of transformer layer class names (case-sensitive) to wrap, e.g., BertLayer, GPTJBlock, T5Block, BertLayer,BertEmbeddings,BertSelfOutput. This is important because submodules that share weights (e.g., embedding layers) should not end up in different FSDP wrapped units. Using this policy, wrapping happens for each block containing Multi-Head Attention followed by a couple of MLP layers. Remaining layers including the shared embeddings are conveniently wrapped in same outermost FSDP unit. Therefore, use this for transformer-based models. You can use the model._no_split_modules for Transformer models by answering yes to Do you want to use the model's _no_split_modules to wrap. It will try to use model._no_split_modules when possible.

I may be wrong, in which case I have a doubt: if we select model._no_split_modules, is mentioning fsdp_transformer_layer_cls_to_wrap in the config still necessary? Since Accelerate is in any case only going to wrap the layers that have a multi-headed attention block followed by an MLP, is specifically giving the class name even required?
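
For reference, here is my rough understanding of what TRANSFORMER_BASED_WRAP plus an explicit fsdp_transformer_layer_cls_to_wrap maps to under the hood. This is only a minimal sketch built directly on PyTorch's transformer_auto_wrap_policy, not the actual Accelerate internals, and BertLayer is just an example class name:

```python
import functools

from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy
from transformers.models.bert.modeling_bert import BertLayer

# Each BertLayer (multi-head attention followed by the MLP) becomes its own
# FSDP unit; remaining modules, including the shared embeddings, stay in the
# outermost FSDP unit.
auto_wrap_policy = functools.partial(
    transformer_auto_wrap_policy,
    transformer_layer_cls={BertLayer},
)

# model = BertModel.from_pretrained("bert-base-uncased")
# fsdp_model = FSDP(model, auto_wrap_policy=auto_wrap_policy)
```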

@muellerzr
Collaborator

Correct, fsdp_transformer_layer_cls_to_wrap is optional if model._no_split_modules exists; we'll figure that out automatically (so, if the model comes from transformers).
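
A rough sketch of that fallback, assuming the model comes from transformers (the model id here is just an example, and the exact contents of _no_split_modules vary by architecture and transformers version):

```python
from transformers import AutoModel

# Load an example transformers model.
model = AutoModel.from_pretrained("gpt2")

# When this attribute exists (e.g. ["GPT2Block"] for GPT-2), the transformer
# layer classes to wrap can be derived from it, so
# fsdp_transformer_layer_cls_to_wrap does not need to be set in the config.
layer_cls_names = getattr(model, "_no_split_modules", None)
print(layer_cls_names)
```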
