Data and model are not sent to the correct device when multiple devices are being used #575
1. System Info
Hi @WenjieDu,
As you asked in PR #563, I am creating this issue because I noticed a bug in how data is moved together with models to different GPUs. If the device passed to the model is not a `torch.device()` object (i.e., a string such as `"cuda:2"` or an integer such as `2`), the function `_send_data_to_given_device()` does not behave correctly: its first `if` branch checks whether the `self.device` object is a `torch.device`, and otherwise everything is moved with a bare `cuda()` call, which without further specification means `cuda:0` (or the first available device). The data therefore ends up on a different device than the model, e.g. the model on `cuda:2` and the data on `cuda:0`, and training crashes.
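For context, here is a minimal sketch of the dispatch pattern described above. It is only an illustration of why a string or integer device falls into the `cuda()` branch, not the actual PyPOTS source:

```python
import torch

def _send_data_to_given_device_sketch(data, device):
    # Simplified illustration of the branching described above,
    # not the actual PyPOTS implementation.
    if isinstance(device, torch.device):
        # Only this branch honours the requested device.
        return data.to(device)
    # A string like "cuda:2" or an int like 2 falls through to here:
    # a bare .cuda() always targets the current CUDA device,
    # which is cuda:0 unless it has been changed explicitly.
    return data.cuda()

# On a multi-GPU machine this prints "cuda:0", not "cuda:2":
x = torch.randn(8, 4)
print(_send_data_to_given_device_sketch(x, "cuda:2").device)
```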
Let me know if there is any additional information you might need about this.
2. Information
3. Reproduction
Steps to reproduce the behavior:

Instantiate a model with `device='cuda:1'` or any other device that is not `cuda:0`. Anything that is not a `torch.device` instance makes the model crash, while with `0` or `'cuda:0'` it works anyway, since `0` is the default device.
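A minimal, self-contained sketch of the resulting failure mode in plain PyTorch (the layer and shapes below are purely illustrative); on a multi-GPU machine this crashes with a device-mismatch error:

```python
import torch

# The model is explicitly placed on the second GPU, mirroring device="cuda:1".
model = torch.nn.Linear(4, 1).to("cuda:1")

# The batch is moved with a bare .cuda(), i.e. to the default device cuda:0,
# which is effectively what the else branch above does.
x = torch.randn(8, 4).cuda()

# Raises a RuntimeError complaining that tensors are on two different
# devices (cuda:1 and cuda:0).
y = model(x)
```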
4. Expected behavior
I would have expected the model's internal code to handle a numerical or string device value, converting it to `torch.device` if necessary, so that both the model and the data subsequently passed for training are moved to the same device.
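One possible way to handle this, sketched here as an assumption rather than a concrete patch for PyPOTS (the helper names `normalize_device` and `send_data_to_given_device` are hypothetical): normalize whatever the user passes into a `torch.device` once, and then use `.to(self.device)` everywhere instead of a bare `.cuda()` call.

```python
import torch

def normalize_device(device):
    # Accept a torch.device, a string such as "cuda:2", or an int such as 2,
    # and always return a torch.device so downstream code only has to deal
    # with a single type.
    if isinstance(device, torch.device):
        return device
    if isinstance(device, int):
        return torch.device(f"cuda:{device}")
    return torch.device(device)

def send_data_to_given_device(data, device):
    # .to() respects the exact device index, unlike a bare .cuda() call.
    return data.to(normalize_device(device))
```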