Live-ISO test failing #2110

Open
lentzi90 opened this issue Dec 9, 2024 · 24 comments
Labels
kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. triage/accepted Indicates an issue is ready to be actively worked on.

@lentzi90
Member

lentzi90 commented Dec 9, 2024

Which jobs are failing?

Periodic E2E: https://github.com/metal3-io/baremetal-operator/actions/workflows/e2e-test-periodic-main.yml

Which tests are failing?

live-iso: https://github.com/metal3-io/baremetal-operator/actions/runs/12227994617/job/34105542268

Since when has it been failing?

The first failure was on 4th December.

Jenkins link

No response

Reason for failure (if possible)

The BMH cannot be provisioned because the "image is not valid for use".

From Ironic logs:

2024-12-05 14:23:05.425 1 DEBUG ironic.common.images [None req-f1216348-13a7-4a46-bfcb-8de7d88b3fb1 - - - - - -] Image http://192.168.222.1/sysrescue-out.iso downloaded in 2.58 seconds. fetch_into /usr/lib/python3.9/site-packages/ironic/common/images.py:386
2024-12-05 14:23:05.430 1 DEBUG oslo_utils.imageutils.format_inspector [None req-f1216348-13a7-4a46-bfcb-8de7d88b3fb1 - - - - - -] Format inspector for vmdk does not match, excluding from consideration (Signature KDMV not found: b'3\xed\x90\x90') _process_chunk /usr/lib/python3.9/site-packages/oslo_utils/imageutils/format_inspector.py:1365
2024-12-05 14:23:05.435 1 DEBUG oslo_utils.imageutils.format_inspector [None req-f1216348-13a7-4a46-bfcb-8de7d88b3fb1 - - - - - -] Format inspector for vhdx does not match, excluding from consideration (Region signature not found at 30000) _process_chunk /usr/lib/python3.9/site-packages/oslo_utils/imageutils/format_inspector.py:1365
2024-12-05 14:23:05.437 1 ERROR ironic.common.images [None req-f1216348-13a7-4a46-bfcb-8de7d88b3fb1 - - - - - -] Security: The requested user image for the deployment node image cache failed to be able to be parsed by the image format checker: Multiple formats detected: iso,gpt: oslo_utils.imageutils.format_inspector.ImageFormatError: Multiple formats detected: iso,gpt
2024-12-05 14:23:05.512 1 ERROR ironic.conductor.utils [None req-f1216348-13a7-4a46-bfcb-8de7d88b3fb1 - - - - - -] Node f9ba65c4-987a-42b7-8965-1787c849a3f5 failed deploy step {'step': 'deploy', 'priority': 100, 'argsinfo': None, 'interface': 'deploy'}: The requested image is not valid for use.: ironic.common.exception.InvalidImage: The requested image is not valid for use.
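
To illustrate why a live ISO can trip a "Multiple formats detected: iso,gpt" check: hybrid ISOs typically carry both an ISO 9660 volume descriptor and a GPT header, so that the same file boots from optical media and from a USB stick. The following is a minimal Python sketch of signature-based detection. It is not the oslo.utils implementation; the function names, the synthetic image, and the 512-byte sector size are assumptions made for illustration only.

```python
ISO9660_MAGIC = b"CD001"   # ISO 9660 volume descriptor identifier
ISO9660_OFFSET = 0x8001    # sector 16 of 2048 bytes, plus the 1-byte type field
GPT_MAGIC = b"EFI PART"    # GPT header signature
GPT_OFFSET = 512           # LBA 1, assuming 512-byte logical sectors


def detect_formats(data: bytes) -> list[str]:
    """Return every format whose signature is present in the image bytes."""
    found = []
    if data[ISO9660_OFFSET:ISO9660_OFFSET + len(ISO9660_MAGIC)] == ISO9660_MAGIC:
        found.append("iso")
    if data[GPT_OFFSET:GPT_OFFSET + len(GPT_MAGIC)] == GPT_MAGIC:
        found.append("gpt")
    return found


def build_hybrid_image() -> bytes:
    """Build a synthetic (hypothetical) image carrying both signatures."""
    buf = bytearray(ISO9660_OFFSET + len(ISO9660_MAGIC))
    buf[GPT_OFFSET:GPT_OFFSET + len(GPT_MAGIC)] = GPT_MAGIC
    buf[ISO9660_OFFSET:ISO9660_OFFSET + len(ISO9660_MAGIC)] = ISO9660_MAGIC
    return bytes(buf)


if __name__ == "__main__":
    print(detect_formats(build_hybrid_image()))  # prints ['iso', 'gpt']
```

A checker that insists on exactly one match would reject such an image even though it is a perfectly usable live ISO, which is consistent with the error in the log above.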

Anything else we need to know?

We first thought the issue was that we did not specify the image hash, but this has been ruled out in #2103.

Label(s) to be applied

/kind failing-test
One or more /area label. See https://github.com/metal3-io/baremetal-operator/labels for the list of labels.

@metal3-io-bot metal3-io-bot added kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. needs-triage Indicates an issue lacks a `triage/foo` label and requires one. labels Dec 9, 2024
@lentzi90
Member Author

lentzi90 commented Dec 9, 2024

/triage accepted

@metal3-io-bot metal3-io-bot added triage/accepted Indicates an issue is ready to be actively worked on. and removed needs-triage Indicates an issue lacks a `triage/foo` label and requires one. labels Dec 9, 2024
@tuminoid
Member

tuminoid commented Dec 9, 2024

Added to CI tracker.

@tuminoid
Member

tuminoid commented Dec 9, 2024

IIRC @Rozzii said this coincides with a new Ironic build on the same date.

@tuminoid tuminoid added this to the BMO - v0.9 milestone Dec 9, 2024
@lentzi90
Member Author

lentzi90 commented Dec 9, 2024

I wonder if it could be related to this: https://bugs.launchpad.net/nova/+bug/2091114
We did a temporary workaround for it in CAPO.

@Rozzii
Member

Rozzii commented Dec 9, 2024

I was expecting this originally: https://opendev.org/openstack/ironic/commit/669304bc0c6b2762c872b71297480c4b4ffdb554

@Rozzii
Member

Rozzii commented Dec 9, 2024

Based on the quay artifacts, I would expect the root cause landed in ironic between Nov 6 and Dec 3.

@lentzi90
Member Author

lentzi90 commented Dec 9, 2024

I think you are correct that this came with oslo.utils.
The specific commit that introduced it in oslo.utils is here: openstack/oslo.utils@91af49b
Then there were some related changes that look relevant to the live-iso use case: openstack/oslo.utils@3d4ae16
I haven't figured out yet how we are supposed to "allow" multiple formats.

@Rozzii
Member

Rozzii commented Dec 9, 2024

I think in the test system we are supposed to have the safety check disabled, as we don't really care about it there, but I had thought that enabling all the formats was the default.

@tuminoid
Member

tuminoid commented Dec 9, 2024

BMO 0.8 virtualmedia works, where ironic is pinned to 26.0: #2111

@Rozzii
Member

Rozzii commented Dec 12, 2024

I have created a bug report on the Ironic side and continued working on this: https://bugs.launchpad.net/ironic/+bug/2091611

We could discuss reverting the pinning (#2112): it is possible to turn off the feature that triggers the problematic image format inspection, which would let us continue testing other Ironic features.
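
As a rough illustration of what turning that feature off could look like, the sketch below renders an `ironic.conf` fragment with Python's `configparser`. It assumes the relevant switch is the `disable_deep_image_inspection` option in the `[conductor]` group; the thread does not name the option, so treat both the option name and the helper `render_ironic_override` as assumptions to verify against the Ironic configuration reference.

```python
import configparser
import io


def render_ironic_override(disable_inspection: bool) -> str:
    """Render an ironic.conf fragment toggling deep image inspection.

    Assumption: the option is [conductor]disable_deep_image_inspection.
    """
    cfg = configparser.ConfigParser()
    cfg["conductor"] = {
        "disable_deep_image_inspection": str(disable_inspection),
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    print(render_ironic_override(True))
```

Disabling the inspection would only be appropriate in a test environment such as this one, where the images are trusted.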

@tuminoid
Member

This is marked for 0.9. We have 2 rounds of workarounds implemented now, but the actual fix is pending. Should we close this as implemented in 0.9 and create another for 0.10 for the proper fix?

@lentzi90
Member Author

Sure, we can do that or just change the milestone. Doesn't matter to me.

@tuminoid
Member

It seems the revert is still failing, meaning main/0.9 is still broken. I guess whatever the fix turns out to be, it needs to land in 0.9 later on, so let's actually keep 0.9 here.

@Rozzii
Member

Rozzii commented Jan 8, 2025

/assign @Rozzii

@Rozzii
Member

Rozzii commented Jan 9, 2025

Eventually in the Ironic community we have landed on this fix:
https://review.opendev.org/c/openstack/ironic/+/938363

@iurygregory
Member

Since the workaround in Ironic (https://review.opendev.org/c/openstack/ironic/+/938363) is merged, should we un-pin the ironic image? We need it for #2229.

@lentzi90
Member Author

Yes, we should, but the tests are failing on the revert. I am debugging it at the moment.

@dtantsur
Member

@lentzi90 it's not the same test that is failing, correct? I see the live ISO one passing; what fails is "[It] provisions a BMH, applies detached and status annotations, then deprovisions".

@dtantsur
Member

2025-02-24 08:38:45.115 1 DEBUG ironic.drivers.modules.agent_base [-] deploy command status for node 155e7ecd-923d-4355-9456-2a398fddc33f on step {'step': 'write_image', 'priority': 80, 'argsinfo': None, 'interface': 'deploy'}: {'id': 'c6b68d87-b5cf-431e-bfe2-661eeb74b45f', 'command_name': 'execute_deploy_step', 'command_status': 'FAILED', 'command_error': {'type': 'CommandExecutionError', 'code': 500, 'message': 'Command execution failed', 'details': 'Unexpected error while running command.\nCommand: /opt/ironic-python-agent/bin/python3 -m oslo_concurrency.prlimit --as=2147483648 -- [\'env\', \'LC_ALL=C\', \'LANG=C\', \'qemu-img\', \'info\', \'/tmp/cirros-0.6.2-x86_64-disk.img\', \'--output=json\']\nExit code: 1\nStdout: \'\'\nStderr: "python3 -m oslo_concurrency.prlimit: failed to execute [\'env\', \'LC_ALL=C\', \'LANG=C\', \'qemu-img\', \'info\', \'/tmp/cirros-0.6.2-x86_64-disk.img\', \'--output=json\']: [Errno 2] No such file or directory\\n"'}, 'command_result': None} process_next_step /usr/lib/python3.9/site-packages/ironic/drivers/modules/agent_base.py:1094

These are Ironic logs, but the error comes from IPA. Interesting.
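
The `[Errno 2] No such file or directory` in the stderr above is ambiguous on its face: it can refer to an executable that could not be found as well as to a missing data file. The sketch below is a hypothetical diagnostic helper (not part of IPA, and not the eventual fix) that checks both possibilities separately; the name `diagnose_enoent` is invented for illustration.

```python
import os
import shutil


def diagnose_enoent(executable: str, image_path: str) -> list[str]:
    """List which of the two usual ENOENT suspects is actually missing."""
    problems = []
    # shutil.which returns None when the executable cannot be resolved on PATH.
    if shutil.which(executable) is None:
        problems.append(f"executable not found on PATH: {executable}")
    if not os.path.exists(image_path):
        problems.append(f"image file missing: {image_path}")
    return problems


if __name__ == "__main__":
    for problem in diagnose_enoent("qemu-img", "/tmp/cirros-0.6.2-x86_64-disk.img"):
        print(problem)
```

Run inside the ramdisk, a check like this would narrow down whether the binary or the downloaded image is the thing that is absent.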

@dtantsur
Member

dtantsur commented Feb 25, 2025

Ramdisk log collection seems broken in that run :(

sed: -e expression #1, char 266: unknown option to `s'

EDIT: metal3-io/ironic-image#630

@dtantsur
Member

@lentzi90 I think I tracked this down to a bug in IPA: https://review.opendev.org/c/openstack/ironic-python-agent/+/942690

@lentzi90
Member Author

Thank you @dtantsur ! This has been driving me crazy!

@Rozzii
Member

Rozzii commented Feb 26, 2025

Thank you @dtantsur !

@lentzi90
Member Author

lentzi90 commented Feb 26, 2025

I think I found another issue. 😦

While Ironic was pinned, we switched to kind because minikube was causing flakes. That obviously means we have not tested this with the latest ironic-image and it seems it does not work.

The difference is how Ironic is exposed to the BMHs. With minikube, we were running everything in the same network, but with kind, we have Ironic in a separate network. This means that we rely on the IRONIC_EXTERNAL_CALLBACK_URL for the callback and something about this has changed.
I think the issue is that we pass the base URL here. That would make IPA use the internal address instead of the external. I will try to test and fix this today.

Edit: Please disregard for now. I think this was my broken environment. It was not properly cleaned up before I tried the newer image.
