I was reading the Accelerate documentation on FSDP (Documentation Page Link) and noticed that in the section that goes over all the keys and their values in the generated config, the explanations of the following two keys may have been mixed up: fsdp_transformer_layer_cls_to_wrap and fsdp_auto_wrap_policy.
Details
fsdp_transformer_layer_cls_to_wrap: Only applicable for Transformers. When using fsdp_auto_wrap_policy=TRANSFORMER_BASED_WRAP, a user may provide a comma-separated string of transformer layer class names (case-sensitive) to wrap, e.g., BertLayer, GPTJBlock, T5Block, BertLayer,BertEmbeddings,BertSelfOutput. This is important because submodules that share weights (e.g., embedding layers) should not end up in different FSDP wrapped units. Using this policy, wrapping happens for each block containing Multi-Head Attention followed by a couple of MLP layers. Remaining layers, including the shared embeddings, are conveniently wrapped in the same outermost FSDP unit. Therefore, use this for transformer-based models. You can use the model._no_split_modules for Transformer models by answering yes to "Do you want to use the model's _no_split_modules to wrap". It will try to use model._no_split_modules when possible.
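For reference, here is a minimal sketch of roughly what TRANSFORMER_BASED_WRAP corresponds to in plain PyTorch FSDP terms (an illustration only, not Accelerate's actual internals), using BertLayer as the example layer class:

```python
# Illustrative sketch only: roughly what TRANSFORMER_BASED_WRAP amounts to in
# plain PyTorch FSDP terms, using BertLayer as the example layer class.
# Assumes torch.distributed is already initialized (e.g. via torchrun).
import functools

from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy
from transformers import AutoModel
from transformers.models.bert.modeling_bert import BertLayer

model = AutoModel.from_pretrained("bert-base-uncased")

# Each BertLayer (multi-head attention + MLP block) becomes its own FSDP unit;
# remaining modules, including shared embeddings, stay in the outermost unit.
auto_wrap_policy = functools.partial(
    transformer_auto_wrap_policy,
    transformer_layer_cls={BertLayer},
)

sharded_model = FSDP(model, auto_wrap_policy=auto_wrap_policy)
```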
I may be wrong, in which case I have a doubt: if we select model._no_split_modules, is mentioning fsdp_transformer_layer_cls_to_wrap in the config still necessary? Since Accelerate is anyway only going to wrap the layers that contain a multi-head attention block followed by an MLP, is specifically giving the class name even required?
Correct, fsdp_transformer_layer_cls_to_wrap is optional if model._no_split_modules exists; we'll figure that out automatically (so, if the model comes from transformers).
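A rough way to check this yourself (a sketch, assuming a standard transformers model such as BERT) is to inspect the attribute directly:

```python
# Rough sketch: if the model already declares _no_split_modules, the layer class
# names need not be repeated via fsdp_transformer_layer_cls_to_wrap.
from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-uncased")

no_split = getattr(model, "_no_split_modules", None)
if no_split:
    # A list of class names that must not be split across FSDP units;
    # Accelerate can derive the wrap classes from this automatically.
    print("Auto-detected classes:", no_split)
else:
    # Model not from transformers (or attribute missing): set
    # fsdp_transformer_layer_cls_to_wrap explicitly in the config.
    print("Specify fsdp_transformer_layer_cls_to_wrap manually.")
```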