
Testing farm job canceled before configured timeout #209

Open
mcattamoredhat opened this issue Jul 19, 2024 · 8 comments · Fixed by #212 or #226
Assignees
Labels
type: bug Something isn't working

Comments

@mcattamoredhat

Type of issue

Bug Report

Description

We have seen several testing-farm jobs in our downstream CI canceled after 6h 0m, although the configured timeout default value is 480m in the action inputs.

The error log message doesn't provide any details, just the message Request was canceled on user request.

This is an example of the issue: https://github.com/virt-s1/rhel-edge/actions/runs/9963311207/job/27529080681. The edge-rhel-94-x86 job is using the default timeout value of 480m.
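For context, the timeout is passed to the action via its `timeout` input in the calling workflow. The step below is a hedged sketch, not the actual workflow from the failing run: the job name, secret name, and repository URL are assumptions; only the `timeout` input (with its documented 480-minute default) comes from the action's README.

```yaml
jobs:
  edge-test:
    runs-on: ubuntu-latest
    steps:
      - name: Run tests on Testing Farm
        uses: sclorg/testing-farm-as-github-action@v3
        with:
          api_key: ${{ secrets.TF_API_KEY }}        # assumed secret name
          git_url: https://github.com/virt-s1/rhel-edge
          # Timeout for the Testing Farm request, in minutes.
          # 480 is also the documented default.
          timeout: 480
```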

The API request output is available at https://api.testing-farm.io/v0.1/requests/ee761663-f05f-43c2-84d9-673545b0f037

pipeline.log shows some tests failing:

| RHEL-9.4.0-Nightly:x86_64:/tmt/plans/edge-test/edge-x86-simplified-installer | ERROR | guest-setup.pre-artifact-installation | guest setup | https://artifacts.osci.redhat.com/testing-farm/ee761663-f05f-43c2-84d9-673545b0f037/guest-setup-e58d3804-fbd3-4214-aff4-7e12debd843d/guest-setup-output-pre-artifact-installation.txt |
| RHEL-9.4.0-Nightly:x86_64:/tmt/plans/edge-test/edge-x86-simplified-installer | ERROR | guest-setup.post-artifact-installation | guest setup | https://artifacts.osci.redhat.com/testing-farm/ee761663-f05f-43c2-84d9-673545b0f037/guest-setup-e58d3804-fbd3-4214-aff4-7e12debd843d/guest-setup-output-post-artifact-installation.txt |

Nevertheless, the guest pre/post installation logs don't show any failing playbook tasks.

Could you please provide some help?

Reproducer

No response

@github-actions github-actions bot added the type: bug Something isn't working label Jul 19, 2024
@jamacku jamacku self-assigned this Jul 19, 2024
@jamacku
Member

jamacku commented Jul 19, 2024

This is very weird. @mcattamoredhat, could you please reproduce the issue with debug logging enabled?

And I agree the current log message could be better. I'll try to extend it with more information.

@jamacku
Member

jamacku commented Jul 23, 2024

So, this is a limitation of GitHub-hosted runners. From GitHub doc:

Job execution time - Each job in a workflow can run for up to 6 hours of execution time. If a job reaches this limit, the job is terminated and fails to complete.

Also, see this Discussion: https://github.com/orgs/community/discussions/25700#discussioncomment-3248791
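To illustrate the GitHub-side cap: a job on a hosted runner is terminated after 6 hours regardless of what the action's own `timeout` input says. The fragment below is a sketch (the job name is assumed); `timeout-minutes` is the real workflow key, but on GitHub-hosted runners it can only lower the limit, never raise it past 360 minutes.

```yaml
jobs:
  edge-test:
    runs-on: ubuntu-latest
    # GitHub-hosted runners enforce a hard 6-hour (360-minute) cap per job.
    # Setting timeout-minutes higher than 360 has no effect on hosted runners.
    timeout-minutes: 360
```

So a Testing Farm `timeout` of 480 minutes can never be reached on a hosted runner; the runner kills the job first.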

@jamacku jamacku added the type: bug Something isn't working label Jul 23, 2024
@jamacku
Member

jamacku commented Jul 23, 2024

We can check if the execution time is greater than the timeout input and only then cancel the TF request.
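A minimal sketch of that idea, with hypothetical names (this is not the action's actual code): only treat the stop as a user-requested cancellation when the configured timeout input has genuinely elapsed; otherwise the job was killed by the runner's own limit and the Testing Farm request should be reported differently.

```typescript
// Hypothetical sketch of the proposed check; not the action's real implementation.
// Cancel the Testing Farm request only if the configured timeout has elapsed;
// otherwise assume the GitHub runner killed the job at its own limit.
function shouldCancelRequest(elapsedMinutes: number, timeoutMinutes: number): boolean {
  return elapsedMinutes >= timeoutMinutes;
}

// With the default 480-minute timeout, a job stopped at the 6-hour
// (360-minute) runner cap should NOT look like a user cancellation:
console.log(shouldCancelRequest(360, 480)); // false
console.log(shouldCancelRequest(480, 480)); // true
```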

@mcattamoredhat
Author

Hi @jamacku, although I've switched to sclorg/testing-farm-as-github-action v3.1.0, I still see this issue in a few tests, such as https://github.com/virt-s1/rhel-edge/actions/runs/10553424096 (iot-f39-x86).
Is there something I am missing? Could you please provide some guidance? Thanks!

@jamacku
Member

jamacku commented Aug 27, 2024

@mcattamoredhat, I may have missed something. I'll have a look. It should work without any additional configuration from your side.

@jamacku jamacku reopened this Aug 27, 2024
@jamacku
Member

jamacku commented Aug 30, 2024

The problem might be that the job ran for 5h 59min 56s and was then killed by the runner, while we are expecting the full 6h.

I'll adjust the value.
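The adjustment could be sketched like this (all names and the margin value are assumptions, not the action's real constants): since hosted runners stop jobs a few seconds short of the nominal 6-hour cap, the runner-limit check needs a small safety margin so a run of 5h 59min 56s is still recognized as runner-killed.

```typescript
// Hypothetical sketch: recognize jobs killed slightly before the nominal
// 6-hour runner cap (5h 59min 56s was observed in this issue).
const RUNNER_LIMIT_MINUTES = 360;
const SAFETY_MARGIN_MINUTES = 1; // assumed margin, not the action's real value

function killedByRunner(elapsedMinutes: number): boolean {
  return elapsedMinutes >= RUNNER_LIMIT_MINUTES - SAFETY_MARGIN_MINUTES;
}

console.log(killedByRunner(359.93)); // true  (5h 59min 56s ≈ 359.93 min)
console.log(killedByRunner(300));    // false
```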

@mcattamoredhat
Author

Hi @jamacku, our CI has detected some PRs failing due to this issue, even though we have already updated our workflows to use sclorg/[email protected].
Examples of this can be found at edge-rhel-95-x86 and iot-rawhide-x86.
Could you check this and possibly reopen the issue? Am I missing something?

@jamacku
Member

jamacku commented Oct 7, 2024

Hmm, there might still be some bug on our side.

@jamacku jamacku reopened this Oct 7, 2024
This was referenced Oct 9, 2024