Test reverting of CUDA installation on Windows staging #840

Merged: 2 commits, Sep 27, 2024
6 changes: 0 additions & 6 deletions config/imagesets.yml
@@ -76,7 +76,6 @@ generic-worker-freebsd:
genericWorker:
config:
ed25519SigningKeyLocation: /etc/generic-worker/ed25519_key
enableInteractive: true
idleTimeoutSecs: 15
shutdownMachineOnIdle: true
shutdownMachineOnInternalError: true
@@ -96,7 +95,6 @@ generic-worker-ubuntu-24-04:
genericWorker:
config:
ed25519SigningKeyLocation: /etc/generic-worker/ed25519_key
enableInteractive: true
idleTimeoutSecs: 15
shutdownMachineOnIdle: true
shutdownMachineOnInternalError: true
@@ -118,7 +116,6 @@ generic-worker-ubuntu-24-04-arm64:
genericWorker:
config:
ed25519SigningKeyLocation: /etc/generic-worker/ed25519_key
enableInteractive: true
idleTimeoutSecs: 15
shutdownMachineOnIdle: true
shutdownMachineOnInternalError: true
@@ -134,10 +131,8 @@ generic-worker-ubuntu-24-04-staging:
genericWorker:
config:
ed25519SigningKeyLocation: /etc/generic-worker/ed25519_key
enableInteractive: true
idleTimeoutSecs: 15
shutdownMachineOnIdle: true
shutdownMachineOnInternalError: false
workerTypeMetadata:
machine-setup:
maintainer: [email protected]
@@ -198,7 +193,6 @@ generic-worker-win2022-staging:
idleTimeoutSecs: 15
livelogExecutable: C:\generic-worker\livelog.exe
shutdownMachineOnIdle: true
shutdownMachineOnInternalError: true
taskclusterProxyExecutable: C:\generic-worker\taskcluster-proxy.exe
workerTypeMetadata:
machine-setup:
6 changes: 0 additions & 6 deletions config/projects/bugbug.yml
@@ -15,7 +15,6 @@ bugbug:
workerConfig:
genericWorker:
config:
enableInteractive: true
maxTaskRunTime: 87500
batch:
owner: [email protected]
@@ -30,7 +29,6 @@ bugbug:
workerConfig:
genericWorker:
config:
enableInteractive: true
maxTaskRunTime: 87500
compute-smaller:
owner: [email protected]
@@ -45,7 +43,6 @@ bugbug:
workerConfig:
genericWorker:
config:
enableInteractive: true
maxTaskRunTime: 87500
compute-small:
owner: [email protected]
@@ -60,7 +57,6 @@ bugbug:
workerConfig:
genericWorker:
config:
enableInteractive: true
maxTaskRunTime: 87500
compute-large:
owner: [email protected]
@@ -75,7 +71,6 @@ bugbug:
workerConfig:
genericWorker:
config:
enableInteractive: true
maxTaskRunTime: 87500
compute-super-large:
owner: [email protected]
@@ -90,7 +85,6 @@ bugbug:
workerConfig:
genericWorker:
config:
enableInteractive: true
maxTaskRunTime: 87500
secrets:
bugbug/deploy: true
4 changes: 0 additions & 4 deletions config/projects/mozci.yml
@@ -38,10 +38,6 @@ mozci:
cloud: gcp
minCapacity: 0
maxCapacity: 5
workerConfig:
genericWorker:
config:
enableInteractive: true
secrets:
testing: true
production: true
4 changes: 0 additions & 4 deletions config/projects/relman.yml
@@ -62,10 +62,6 @@ relman:
cloud: gcp
minCapacity: 0
maxCapacity: 50
workerConfig:
genericWorker:
config:
enableInteractive: true

secrets:
bugzilla-dashboard-backend/deploy-production: true
23 changes: 10 additions & 13 deletions config/projects/taskcluster.yml
@@ -90,9 +90,11 @@ taskcluster:
workerConfig:
genericWorker:
config:
enableInteractive: true
shutdownMachineOnInternalError: false
# this pool isn't in regular use, and when we use it, it is
# typically for testing bare metal stuff, so nice to have
# longer timeout
idleTimeoutSecs: 3600
shutdownMachineOnInternalError: false
# Use c5.metal to test kvm
instanceTypes:
m5d.metal: 1
@@ -104,11 +106,6 @@ taskcluster:
cloud: gcp
minCapacity: 0
maxCapacity: 50
workerConfig:
genericWorker:
config:
enableInteractive: true
shutdownMachineOnInternalError: false

gw-ubuntu-24-04-arm64:
owner: [email protected]
@@ -117,10 +114,6 @@ taskcluster:
cloud: gcp
minCapacity: 0
maxCapacity: 5
workerConfig:
genericWorker:
config:
enableInteractive: true
machineType: "zones/{zone}/machineTypes/t2a-standard-4"

gw-ubuntu-staging-aws:
@@ -136,7 +129,7 @@ taskcluster:
# While iterating on the image building process for this worker
# pool, useful for workers not to die immediately...
idleTimeoutSecs: 3600
enableInteractive: true
shutdownMachineOnInternalError: false

gw-ubuntu-staging-google:
owner: [email protected]
@@ -159,7 +152,7 @@ taskcluster:
# While iterating on the image building process for this worker
# pool, useful for workers not to die immediately...
idleTimeoutSecs: 3600
enableInteractive: true
shutdownMachineOnInternalError: false

gw-windows-2022:
owner: [email protected]
@@ -176,12 +169,16 @@ taskcluster:
cloud: azure
minCapacity: 0
maxCapacity: 10
# Currently staging uses a GPU pool so we can also test GPU related changes
vmSizes:
Standard_NV12s_v3: 1
workerConfig:
genericWorker:
config:
# While iterating on the image building process for this worker
# pool, useful for workers not to die immediately...
idleTimeoutSecs: 3600
shutdownMachineOnInternalError: false

gw-windows-2022-gpu:
owner: [email protected]
1 change: 1 addition & 0 deletions generate/workers.py
@@ -588,6 +588,7 @@ def generic_worker(wp, **cfg):
"genericWorker": {
"config": {
"enableD2G": True,
"enableInteractive": True,
"idleTimeoutSecs": 600,
"wstAudience": "communitytc",
"wstServerURL": "https://community-websocktunnel.services.mozilla.com",
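Note on the hunk above: the single added line in generate/workers.py makes "enableInteractive": True part of the generic-worker defaults, which appears to be why the per-pool enableInteractive: true entries are dropped from the YAML files in this change. The diff does not show how generic_worker() combines these defaults with a pool's own workerConfig, so the following is only a minimal sketch under the assumption of a recursive merge in which pool-level values win. DEFAULT_WORKER_CONFIG, merge and effective_worker_config are hypothetical names for illustration, not functions from this repository.

```python
import copy

# Hypothetical defaults, mirroring a few of the keys shown in the hunk above.
DEFAULT_WORKER_CONFIG = {
    "genericWorker": {
        "config": {
            "enableD2G": True,
            "enableInteractive": True,
            "idleTimeoutSecs": 600,
        }
    }
}


def merge(base, override):
    """Return a copy of base with override merged in recursively.

    Values from override win on conflict. This is an assumed strategy,
    not necessarily what generate/workers.py does.
    """
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


def effective_worker_config(pool_worker_config=None):
    """Combine the defaults with an optional per-pool workerConfig."""
    return merge(DEFAULT_WORKER_CONFIG, pool_worker_config or {})


# A staging-style pool that only overrides what differs from the defaults.
staging_pool = {
    "genericWorker": {
        "config": {
            "idleTimeoutSecs": 3600,
            "shutdownMachineOnInternalError": False,
        }
    }
}

config = effective_worker_config(staging_pool)["genericWorker"]["config"]
print(config["enableInteractive"], config["idleTimeoutSecs"])
```

Under this assumption, a pool with no workerConfig at all (as mozci and relman become in this change) would still end up with enableInteractive enabled, while staging pools keep their longer idleTimeoutSecs.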
7 changes: 5 additions & 2 deletions imagesets/generic-worker-win2022-staging/bootstrap.ps1
@@ -225,8 +225,11 @@ if ($hasNvidiaGpu) {
Start-Process "C:\nvidia_driver.exe" -ArgumentList "-s", "-noreboot" -Wait -NoNewWindow -RedirectStandardOutput "C:\nvidia-install-stdout.txt" -RedirectStandardError "C:\nvidia-install-stderr.txt"
# install CUDA
# https://github.com/taskcluster/community-tc-config/issues/713
$client.DownloadFile("https://developer.download.nvidia.com/compute/cuda/12.6.1/local_installers/cuda_12.6.1_560.94_windows.exe", "C:\cuda_installer.exe")
Start-Process "C:\cuda_installer.exe" -ArgumentList "-s", "-noreboot" -Wait -NoNewWindow -RedirectStandardOutput "C:\cuda-install-stdout.txt" -RedirectStandardError "C:\cuda-install-stderr.txt"

# Test removing this in staging to see if it fixes things...
# $client.DownloadFile("https://developer.download.nvidia.com/compute/cuda/12.6.1/local_installers/cuda_12.6.1_560.94_windows.exe", "C:\cuda_installer.exe")
# Start-Process "C:\cuda_installer.exe" -ArgumentList "-s", "-noreboot" -Wait -NoNewWindow -RedirectStandardOutput "C:\cuda-install-stdout.txt" -RedirectStandardError "C:\cuda-install-stderr.txt"

}

# now shutdown, in preparation for creating an image