Check whether build passes. Do not reuse builds across tests. #445
Conversation
Please include the PR description in the commit as well; GitHub might not be here forever (if somebody from GitHub reads this, be aware that I wish it will be here forever, but you know... Microsoft... well, anyway...)
For the change itself, I do understand the kill/stop part, but I don't understand why you need to re-define the evaluate_build_result function. I understand what it does, but why do we need to remove a container image when the build failed? Then nothing should be built, no? Or when should we expect the build to fail?
[test]
That is related to this: sclorg/s2i-ruby-container#556. The thing I am trying to achieve with this is failing early and not running any tests on that image, as it doesn't make sense to even try to run the tests on an image that failed to build. Otherwise, we would run the tests and fail on podman trying to start an image that doesn't exist, for every test that depends on that image. It would still work, but I don't think it's a clean solution. Also, it helps with debugging, as you will immediately see that the build failed, not the tests themselves. Both changes, checking that the build passed and cleaning the images between the tests, are redundant with each other. Either one would fix the underlying issue, but in my opinion each has a slightly different reason: checking that the build passed lets us be more verbose for debugging and stop early, resulting in faster tests; cleaning the images completely removes the possibility of starting an image that we didn't want to test. The problem here is that I missed the part that should actually skip the test if the build failed; I will add that.
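A rough sketch of the fail-early idea (illustrative only; the signature of evaluate_build_result and the variable names here are my assumption, not the actual test-lib code):

```bash
# Illustrative sketch: fail early when the image build failed, instead
# of letting every dependent test trip over a missing image. The
# signature and variable names are assumptions, not the real test-lib.
evaluate_build_result() {
  local build_status="$1" image_name="$2"
  if [ "$build_status" -ne 0 ]; then
    echo "ERROR: build of ${image_name} failed; skipping its tests."
    # Drop any stale image so later tests cannot accidentally use it.
    docker rmi -f "$image_name" 2>/dev/null || :
    return 1
  fi
  return 0
}

docker build -t "$IMAGE_NAME" .
evaluate_build_result "$?" "$IMAGE_NAME" || exit 1
```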
@SlouchyButton thanks for explaining, I think at least part of this explanation would also be worth keeping in the commit messages. More thoughts:
Love the idea, it would definitely help with debugging, as we would be able to pinpoint exactly where the build failed. It would also allow us to continue with the 'fail early, fail gracefully' approach. The thing is that since this is a test-lib function, we will have to go through every single container and check all uses of that function before changing it, as this change seems like it could be breaking. Another thing I am a bit scared about is that we have already pushed Bash far beyond what it can handle in some places (sclorg/container-common-scripts#380) and we keep adding more and more onto it. This specific change shouldn't really be that complicated, but since we are changing test-lib, there is always a higher chance of breaking stuff, especially since this change would substantially modify how that function works. I would still continue with this idea, honestly; I like it, and it will be a helpful feature to have. We should create a new issue for https://github.com/sclorg/container-common-scripts
Currently, this PR removes the images after the test, so that will happen after merging.
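Roughly, the between-tests cleanup could look like this (a sketch with hypothetical names; the real PR code may differ):

```bash
# Hypothetical sketch of per-test image cleanup: stop any container
# still running from the test image, then remove the image itself, so
# nothing from a previous test can leak into the next one.
cleanup_test_image() {
  local cids
  cids=$(docker ps -q --filter "ancestor=${IMAGE_NAME}")
  [ -n "$cids" ] && docker stop $cids
  docker rmi -f "${IMAGE_NAME}" 2>/dev/null || :
}
trap cleanup_test_image EXIT
```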
From how I understand that improvement, we would basically make
I don't think it would need to be breaking. The function
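For example, one common non-breaking way to extend a Bash function is an optional argument with a default, so existing callers keep working unchanged (purely illustrative; the function name is hypothetical):

```bash
# Purely illustrative: an optional argument keeps old call sites
# working while new call sites can pass extra context for reporting.
report_test_result() {
  local result="$1"
  local testcase="${2:-unknown}"   # new, optional; old callers omit it
  if [ "$result" -ne 0 ]; then
    echo "Test case '${testcase}' failed with status ${result}."
  fi
  return "$result"
}

report_test_result 0            # old-style call still works
report_test_result 1 "run_app"  # new-style call adds the test name
```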
The thing I had in mind is whether we could be modifying the final result with such subtest results everywhere. After looking at this again I realized the
Cool. One more thing that bothers me is that the RHEL8 v18 tests fail repeatedly, while we don't see any nodejs failures in the nightly -- so I tried a different PR to see whether the failure is connected with this change or not: #446
Or is it possible that it failed silently before this fix? That could explain it...
I don't think so. We added tests for builds failing, but the build passed. The only thing that failed was
And it seems it failed due to what the last command in the test function is. The test function is interesting, though, and I am not sure it even does what it is supposed to do, since this code:
AFAIK it starts the test app, then waits for the CID, but then... Am I missing something here? @hhorak Furthermore, the test on rhel8 for version 18 passes locally for me.
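For context on why the last command matters: a Bash function's exit status is the exit status of the last command it ran, so a trailing step can mask an earlier failure. A contrived demo:

```bash
#!/bin/bash
# Contrived demo: a function's exit status is that of its LAST command,
# so a trailing step that succeeds hides an earlier failure.
test_something() {
  false   # the actual check fails here...
  true    # ...but this final "cleanup" step succeeds
}
test_something && echo "reported PASS, even though the check failed"
```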
[test]
This pull request LGTM, but I have some nitpicks.
CentOS Stream 22 issue: #452
Pull Request validation
Failed: 🔴 Failed or pending statuses
Success: 🟢 Review - Reviewed by a member
[test]
Testing Farm results
Please rebase it against master.
Force-pushed from 84aac2c to b37e426
@phracek done
[test]
EDIT: My bad, I didn't notice it was already mentioned and linked to the issue, so disregard this.
It seems that the sample app uses the OpenSSL engine library, but that one got deprecated/removed in F41. I'd guess that C10S already embraced this change. Tests pass for Fedora as we are using an F40 image that is not affected. Ref: https://fedoraproject.org/wiki/Changes/OpensslDeprecateEngine
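If someone wants to verify this on a given image, one quick probe (assuming the `openssl` CLI is present in the image) is the `engine` subcommand, which is unavailable on builds compiled without engine support:

```bash
# Probe for OpenSSL engine support; on builds compiled without engine
# support (as proposed for F41) the 'engine' subcommand is unavailable.
if openssl engine >/dev/null 2>&1; then
  echo "engine support present"
else
  echo "no engine support"
fi
```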
C10S issue is tracked here: #452
Fixes: #444
This contains the same change as in ruby container PR sclorg/s2i-ruby-container#556.
The other change is related to ruby container issue sclorg/s2i-ruby-container#557.
The ruby container issue has a more thorough description of what specifically the problem is.
The reason why `docker kill` has to be changed to `docker stop` in `test/test-lib-nodejs.sh` L: 176 is that `docker kill` doesn't wait for the container to stop: it forcefully kills it and continues. This then creates a race condition with the clearing function. When `docker rmi -f` is called, the previous kill call hasn't fully stopped the container yet, so `docker rmi -f` still sees the existing container. It tries to delete it, because we are forcing image deletion, but before it gets through the internal process, the container is stopped and deleted by the previous `docker kill` call.

When `docker rmi -f` actually reaches the point where it wants to delete the container, the container no longer exists and the command fails. `docker stop`, on the other hand, does not work asynchronously: it tries to stop the container gracefully, and if the container doesn't respond, it kills it forcefully and continues only after the kill has succeeded.
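In other words, the two orderings look roughly like this (`$cid` and `$IMAGE_NAME` stand in for the values the test code actually uses):

```bash
# Racy: docker kill returns before the container is fully gone, so
# docker rmi -f can race with the container's teardown and fail.
docker kill "$cid"
docker rmi -f "$IMAGE_NAME"

# Safe: docker stop blocks until the container has actually stopped
# (escalating to SIGKILL after its timeout), so docker rmi -f sees a
# consistent state.
docker stop "$cid"
docker rmi -f "$IMAGE_NAME"
```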