Universal checkpoint for zero stage 3 #10192

Triggered via pull request June 10, 2024 16:13
Status: Failure
Total duration: 25m 4s

nv-torch-latest-v100.yml

on: pull_request

Annotations

1 error
unit-tests
The self-hosted runner: ds-nv-v100-cu117-runner-c14a1249 lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.