Universal checkpoint for zero stage 3 #10192
nv-torch-latest-v100.yml
on: pull_request
unit-tests
19m 53s
Annotations
1 error
unit-tests
The self-hosted runner: ds-nv-v100-cu117-runner-c14a1249 lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.