Enable torch.autocast with ZeRO #6993
Draft
This PR supports `torch.autocast` with ZeRO. You need to enable `torch.autocast` in the DeepSpeed config; you don't need to explicitly call `torch.autocast` in your code. The grad scaler is also applied in the DeepSpeed optimizer.

All parameters are maintained in FP32, but certain operators and layers are computed in the specified dtype. With ZeRO enabled, the communication (allreduce/reduce_scatter) for the specified layers is also done in the specified dtype (see the list of modules).
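To make the setup concrete, here is a minimal sketch of what enabling this through the config could look like. The `torch_autocast` section name and its `enabled`/`dtype` fields, as well as the model and optimizer settings, are illustrative assumptions and not confirmed by this PR text.

```python
# Minimal sketch: enabling torch.autocast via the DeepSpeed config
# (the "torch_autocast" key and its fields are assumed names for illustration).
import torch
import deepspeed

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024),
    torch.nn.ReLU(),
    torch.nn.Linear(1024, 10),
)

ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "zero_optimization": {"stage": 2},
    # Hypothetical section enabling torch.autocast through DeepSpeed;
    # exact key names may differ in the final version of this PR.
    "torch_autocast": {
        "enabled": True,
        "dtype": "bfloat16",
    },
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)

# No explicit torch.autocast context is needed in the training loop:
# per the PR description, autocast (and grad scaling) is applied inside the engine.
x = torch.randn(4, 1024, device=engine.device)
y = torch.randint(0, 10, (4,), device=engine.device)
loss = torch.nn.functional.cross_entropy(engine(x), y)
engine.backward(loss)
engine.step()
```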
You cannot enable `fp16` or `bf16` together with `torch.autocast`. (Currently working on ZeRO3 support.)