[BUG] - TPU training not working in Google Colab #2670

Open
jbelhamc1 opened this issue Feb 14, 2025 · 0 comments
Labels
bug (Something isn't working), gpu (Question or bug occurring with gpu), triage (Issue waiting for triaging)

Comments

@jbelhamc1

Describe the bug
I am trying to train on TPUs in Google Colab.
Since the updates to Google Colab and the PyTorch/XLA wheels, the following code no longer runs. I believe this is because TPUs do not run as nodes; they are intrinsically linked to the VMs they run on:

To Reproduce

!pip install cloud-tpu-client==0.10 https://storage.googleapis.com/tpu-pytorch/wheels/torch_xla-1.9-cp37-cp37m-linux_x86_64.whl
!pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchtext==0.10.0 -f https://download.pytorch.org/whl/cu111/torch_stable.html
!pip install pyyaml==5.4.1

Expected behavior
Could alternative code be provided and the documentation updated? I am currently trying to use:

!pip install torch~=2.6.0 'torch_xla[tpu]~=2.6.0' \
  -f https://storage.googleapis.com/libtpu-releases/index.html \
  -f https://storage.googleapis.com/libtpu-wheels/index.html

!pip install cloud-tpu-client==0.10
    
# Install the latest PyTorch packages (using CUDA 12.6 builds)
!pip install torch==2.6.0 torchvision==0.21.0 torchtext==0.18 \
    -f https://download.pytorch.org/whl/cu118/torch_stable.html

# Install the latest version of PyYAML
!pip install pyyaml==6.0
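
As a sanity check on the commands above, a sketch like the following could show which builds are actually installed once every pip command has run. The second torch install pulls from a CUDA wheel index (the comment mentions CUDA 12.6 while the URL points at cu118), so it may replace the build installed alongside torch_xla; that is an assumption, not something confirmed here. The helper names are hypothetical, and the distribution names passed to `importlib.metadata` are assumed to match what pip installed.

```python
from importlib import metadata


def major_minor(version: str) -> tuple:
    """Return (major, minor) from a version string such as '2.6.0+cu118'."""
    base = version.split("+", 1)[0]  # drop a local build tag like '+cu118'
    parts = base.split(".")
    return int(parts[0]), int(parts[1])


def installed_version(package: str):
    """Return the installed version of a distribution, or None if missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def report_torch_builds() -> dict:
    """Collect the torch and torch_xla versions and compare major.minor."""
    torch_v = installed_version("torch")
    xla_v = installed_version("torch_xla")  # assumed distribution name
    match = (
        torch_v is not None
        and xla_v is not None
        and major_minor(torch_v) == major_minor(xla_v)
    )
    return {"torch": torch_v, "torch_xla": xla_v, "major_minor_match": match}


print(report_torch_builds())
```

If `torch` shows a `+cu...` suffix after the install, the CUDA build has likely replaced the one pulled in with torch_xla, which would be worth including in the report.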

While this shows the TPU as available, training is extremely slow, at around 0.18 it/s (when it works). When running the code here, it never gets past cell [11]: it hangs there for unknown reasons and no progress bar ever appears.
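
To make the "available but slow" observation easier to reproduce, a minimal sketch along these lines could be run in a separate cell. It assumes `torch_xla.core.xla_model.xla_device()` is the way to obtain the device in the installed version; the function names `xla_device_or_none` and `iterations_per_second` are hypothetical helpers, not part of darts or torch_xla.

```python
import importlib.util
import time


def xla_device_or_none():
    """Return the XLA device as a string, or None if torch_xla is unusable."""
    if importlib.util.find_spec("torch_xla") is None:
        return None
    try:
        import torch_xla.core.xla_model as xm
        return str(xm.xla_device())
    except Exception as exc:  # e.g. the runtime fails to initialise
        print(f"torch_xla is installed but no device was obtained: {exc}")
        return None


def iterations_per_second(step, n_iters: int = 5) -> float:
    """Time n_iters calls of a zero-argument callable and return calls/second."""
    start = time.perf_counter()
    for _ in range(n_iters):
        step()
    elapsed = time.perf_counter() - start
    return n_iters / elapsed if elapsed > 0 else float("inf")


print("XLA device:", xla_device_or_none())
# Stand-in workload; replace the lambda with one real training step.
print("it/s:", iterations_per_second(lambda: time.sleep(0.01)))
```

Because XLA executes lazily, a real training step passed to `iterations_per_second` would need to force execution (for example with `xm.mark_step()`) for the timing to be meaningful.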

System (please complete the following information):

  • Python version: [e.g. 3.11.11]
  • darts version [e.g. 0.32.0]
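
The version fields above could be filled in from the Colab runtime with a short snippet such as the one below. This is only a sketch: the distribution names in `packages` are assumptions about how the libraries were installed, and `environment_report` is a hypothetical helper.

```python
import platform
from importlib import metadata


def environment_report(
    packages=("darts", "torch", "torch_xla", "pytorch-lightning"),
) -> dict:
    """Return the Python version plus the installed version of each package."""
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```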

Additional context
The code in "To Reproduce" is taken from here.

@jbelhamc1 jbelhamc1 added bug Something isn't working triage Issue waiting for triaging labels Feb 14, 2025
@madtoinou madtoinou added the gpu Question or bug occuring with gpu label Feb 18, 2025