We chose to use Hugging Face Accelerate to implement support for both DeepSpeed and FSDP. On Nvidia cards, this was a drop-in replacement for a fully custom training loop. However, minor training loop changes may be needed to run on Intel Gaudi and AMD Instinct cards.
One example of this difference is that on Gaudi cards, one switches:
- model.to(device_id)
+ model.to(device_hpu)
The Intel Gaudi PyTorch bridge then handles moving the model to the correct device according to the assigned process rank.
This is different enough from the vanilla Nvidia + torch training loop that Accelerate might not support it directly.
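For reference, a minimal sketch of the device-placement difference, assuming the Habana PyTorch bridge (`habana_frameworks.torch`) is installed on the Gaudi host; the model and the single-card fallback are illustrative, not our training code:

```python
import torch

try:
    # Importing the Gaudi bridge registers the "hpu" device type with PyTorch.
    import habana_frameworks.torch.core  # noqa: F401
    device = torch.device("hpu")  # the bridge maps the process rank to a physical card
except ImportError:
    # On Nvidia, the process rank selects the card explicitly.
    local_rank = 0  # placeholder; normally taken from the launcher environment
    device = torch.device(f"cuda:{local_rank}" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(16, 16)
model = model.to(device)  # same call, but the device resolution differs per backend
```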
We need to scope:
- What is the current state of support for running distributed training on AMD / Intel?
- Is it possible to non-invasively patch Accelerate to make it work on Intel / AMD? (See the probe sketch below.)
- How can we propagate those changes upstream?
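As a starting point for the second item, a small probe using only public Accelerate APIs can show which device Accelerate selects on a given host before any patching is attempted; the model and optimizer here are placeholders:

```python
import torch
from accelerate import Accelerator

# Accelerator() picks a device based on what it detects on the host
# (CUDA, CPU, or another backend if the installed version supports it).
accelerator = Accelerator()
print(f"rank {accelerator.process_index}: selected device = {accelerator.device}")

# Placeholder model/optimizer to confirm that prepare() places everything
# on the selected device without manual .to(...) calls in the training loop.
model = torch.nn.Linear(16, 16)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
model, optimizer = accelerator.prepare(model, optimizer)
print(f"rank {accelerator.process_index}: model is on {next(model.parameters()).device}")
```

If the selected device is wrong on Gaudi or Instinct hardware, that tells us whether the fix belongs in device detection, in `prepare()`, or in our own launch configuration.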
Contributing tasks: