spacecmd doesn't work after migration to Uyuni container #9339

Open
ppanon2022 opened this issue Oct 9, 2024 · 9 comments
Labels: bug (Something isn't working), P5


ppanon2022 commented Oct 9, 2024

Problem description

I don't seem to be able to use spacecmd from within the container context with a config file. After running mgrctl term,

uyuni-server:~ # spacecmd
Welcome to spacecmd, a command-line interface to Spacewalk.

Type: 'help' for a list of commands
      'help <cmd>' for command-specific help
      'quit' to quit


ERROR: Failed to connect to http://<server.fqdn>/rpc/api

I also tried to use localhost and uyuni-server.mgr.internal as the server value, but neither worked.
I do see some SELinux errors, but they are for SELinux context re-labeling that seems to be triggered by running spacecmd, so it's possible the error is being misreported. With no tcpdump packet capture available, it's not possible to be sure whether spacecmd is actually sending a packet or whether the exception handling is just very coarse-grained. It does take a long time to return the error.

/var/log/audit # audit2allow -a


#============= container_init_t ==============

#!!!! This avc is a constraint violation.  You would need to modify the attributes of either the source or target types to allow this access.
#Constraint rule:
#       constrain dir { create relabelfrom relabelto } ((u1 == u2 -Fail-)  or (t1 == can_change_object_identity -Fail-) ); Constraint DENIED

#       Possible cause is the source user (system_u) and target user (unconfined_u) are different.
#       Possible cause is the source level (s0:c303,c621) and target level (s0) are different.
allow container_init_t container_file_t:dir relabelfrom;

#!!!! This avc is a constraint violation.  You would need to modify the attributes of either the source or target types to allow this access.
#Constraint rule:
#       constrain file { create relabelfrom relabelto } ((u1 == u2 -Fail-)  or (t1 == can_change_object_identity -Fail-) ); Constraint DENIED

#       Possible cause is the source user (system_u) and target user (unconfined_u) are different.
#       Possible cause is the source level (s0:c303,c621) and target level (s0) are different.

While I could add a policy to allow the relabeling, that seems like it would significantly reduce the security of the container with no assurance it's actually the root cause.
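
For reference, a minimal sketch of how such a local policy module would typically be generated and installed with audit2allow and semodule (the module name is arbitrary; not applied here, since the relabel denial may not be the root cause):

/var/log/audit # audit2allow -a -M local_container_relabel
/var/log/audit # semodule -i local_container_relabel.pp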

Steps to reproduce

  1. Migrate Uyuni installation to container
  2. Recreate spacecmd config file in /root/.spacecmd/config, including server, username, password, and nossl key-value pairs
  3. Run spacecmd

This fails to connect with nossl set to either true or false.
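
For reference, a minimal sketch of the config file from step 2 (the values shown are placeholders, not the real ones):

[spacecmd]
server=<server.fqdn>
username=admin
password=xxxxxxxx
nossl=false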
...

Uyuni version

Running 2024.08 container

Uyuni proxy version (if used)

No response

Useful logs

See description

Additional information

Also see my more detailed testing in these replies to the migration GitHub discussion.

ppanon2022 added the bug (Something isn't working) and P5 labels on Oct 9, 2024

ppanon2022 commented Oct 10, 2024

Attempting to telnet to port 80 on the host FQDN from within the container just hangs. Perhaps it's some sort of firewall rule, though there doesn't seem to be one on the host.

Attempting to use localhost to bypass any firewalling shows a redirect to uyuni-server.mgr.internal, but then connecting to uyuni-server.mgr.internal and doing a GET /pub shows the exact same 301 reply message.

uyuni-server:~ # telnet localhost 80
Trying ::1...
Connected to localhost.
Escape character is '^]'.
GET /pub
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved <a href="http://uyuni-server.mgr.internal/pub/">here</a>.</p>
</body></html>
Connection closed by foreign host.
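
For a bit more detail than raw telnet, curl can follow the redirect and also probe the XML-RPC endpoint that spacecmd uses (a diagnostic sketch using the same URLs discussed above):

uyuni-server:~ # curl -siL http://localhost/pub/ | head -n 20
uyuni-server:~ # curl -si http://localhost/rpc/api | head -n 20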


agraul commented Oct 10, 2024

It's not a firewall, it's the so-called "hairpin problem". By using the FQDN of the server from within the container, the network packets go container -> host -> container. This back-and-forth is the problem.

Inside the container, localhost should be used instead of the FQDN.
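
A quick way to test that without editing the config file is to override the server on the command line; spacecmd supports -s/--server and -u/--username, and its api command can call a lightweight method such as api.getVersion (the user name here is just an example):

uyuni-server:~ # spacecmd -s localhost -u admin -- api api.getVersion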


ppanon2022 commented Oct 10, 2024

localhost doesn't work for me either. You can see the results with that server name (as well as with uyuni-server.mgr.internal, which probably shouldn't do a container/host hairpin), plus telnet tests of the HTTP response, in these replies to the migration GitHub discussion. I get a 404 error if I try to use curl to access the URL, and yet the Apache config files look the same as on the other Uyuni server that does not have this problem.

@ppanon2022

Here's a thought that occurred to me; perhaps you can confirm whether it could be an issue. We have two Uyuni servers. On one of them I set up sssd AD integration on the host Linux, and that's the one where connecting to localhost/loopback URLs from the container context fails. On the other system, the container localhost connections work, but trying to run the host's sssd AD connection fails. Could there be a conflict, where sssd on the first host is holding on to some network resources and prevented the container from deploying properly, while on the second host the correctly deployed container is somehow preventing sssd from running?

If so, would disabling/purging sssd on the first host prior to upgrading the container to 2024.10 allow podman to fix the localhost network issue when deploying the update? What sort of interaction would be happening, and is a container upgrade close enough to a fresh deployment that it could fix whatever is broken for the localhost connections in the Uyuni 2024.08 server container deployment?


ppanon2022 commented Oct 24, 2024

Tried to upgrade to 2024.10 after removing the sssd packages from the host and got this error:

# mgradm upgrade podman
12:04PM INF Welcome to mgradm
12:04PM INF Executing command: podman
12:04PM INF Computed image name is registry.opensuse.org/uyuni/server:latest
12:04PM INF Pull Policy is always. Presence of RPM image will be checked and if it's not present it will be pulled from registry
12:04PM INF Cannot find RPM image for registry.opensuse.org/uyuni/server:latest
12:04PM INF Running podman pull registry.opensuse.org/uyuni/server:latest
Trying to pull registry.opensuse.org/uyuni/server:latest...
Getting image source signatures
Copying blob a344d7b096ee done   |
Copying blob 642e3fc0d4ca done   |
Copying blob ba4705fe3f3b done   |
Copying config d78bb09cff done   |
Writing manifest to image destination
d78bb09cff782bfdae5724c727fac4393f5560b21cc437a26c30357cc5fb3e03
Error: cannot inspect podman values: cannot inspect data: cannot read config: While parsing config: line `carmd-nv-uyuni1.sierrawireless.local` doesn't match format

What config file would that come from and how could it not be correct? It seems like a pretty standard/legal hostname format.

@digdilem-work

This may be a fix: #9348 (comment)
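
Based on that pointer (and the follow-up below), the parse error appears to come from a duplicated java.hostname entry; assuming the config being parsed is /etc/rhn/rhn.conf inside the container, the duplicate can be located with something like:

# mgrctl term
uyuni-server:~ # grep -n 'java.hostname' /etc/rhn/rhn.conf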

aric89 commented Oct 25, 2024

I think this is related to my issue as well. It seems to me I'm getting a Cobbler config issue which isn't allowing taskomatic to get a CobblerToken.

This is from the rhn_taskomatic_daemon.log

2024-10-25 17:00:00,166 [DefaultQuartzScheduler_Worker-9] ERROR com.redhat.rhn.taskomatic.task.CobblerSyncTask - RuntimeExceptionError trying to sync to cobbler: We had an error trying to login.
com.redhat.rhn.manager.kickstart.cobbler.NoCobblerTokenException: We had an error trying to login.
at com.redhat.rhn.manager.kickstart.cobbler.CobblerLoginCommand.login(CobblerLoginCommand.java:57) ~[rhn.jar:?]
at com.redhat.rhn.frontend.integration.IntegrationService.authorize(IntegrationService.java:115) ~[rhn.jar:?]
at com.redhat.rhn.frontend.integration.IntegrationService.getAuthToken(IntegrationService.java:69) ~[rhn.jar:?]
at com.redhat.rhn.manager.kickstart.cobbler.CobblerCommand.<init>(CobblerCommand.java:61) ~[rhn.jar:?]
at com.redhat.rhn.manager.kickstart.cobbler.CobblerCommand.<init>(CobblerCommand.java:82) ~[rhn.jar:?]
at com.redhat.rhn.manager.kickstart.cobbler.CobblerDistroSyncCommand.<init>(CobblerDistroSyncCommand.java:48) ~[rhn.jar:?]
at com.redhat.rhn.taskomatic.task.CobblerSyncTask.execute(CobblerSyncTask.java:85) ~[rhn.jar:?]
at com.redhat.rhn.taskomatic.task.RhnJavaJob.execute(RhnJavaJob.java:56) ~[rhn.jar:?]
at com.redhat.rhn.taskomatic.TaskoJob.doExecute(TaskoJob.java:240) ~[rhn.jar:?]
at com.redhat.rhn.taskomatic.TaskoJob.runTemplate(TaskoJob.java:193) ~[rhn.jar:?]
at com.redhat.rhn.taskomatic.TaskoJob.execute(TaskoJob.java:145) ~[rhn.jar:?]
at org.quartz.core.JobRunShell.run(JobRunShell.java:202) ~[quartz-2.3.0.jar:?]
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573) ~[quartz-2.3.0.jar:?]
Caused by: redstone.xmlrpc.XmlRpcFault: <class 'cobbler.cexceptions.CX'>:'login failed (taskomatic_user)'
at redstone.xmlrpc.XmlRpcClient.handleResponse(XmlRpcClient.java:444) ~[redstone-xmlrpc-client-1.1_20071120.jar:?]
at redstone.xmlrpc.XmlRpcClient.endCall(XmlRpcClient.java:376) ~[redstone-xmlrpc-client-1.1_20071120.jar:?]
at redstone.xmlrpc.XmlRpcClient.invoke(XmlRpcClient.java:165) ~[redstone-xmlrpc-client-1.1_20071120.jar:?]
at com.redhat.rhn.manager.kickstart.cobbler.CobblerXMLRPCHelper.invokeMethod(CobblerXMLRPCHelper.java:70) ~[rhn.jar:?]
at com.redhat.rhn.manager.kickstart.cobbler.CobblerLoginCommand.login(CobblerLoginCommand.java:52) ~[rhn.jar:?]
... 12 more
2024-10-25 17:00:00,166 [DefaultQuartzScheduler_Worker-9] ERROR com.redhat.rhn.taskomatic.task.CobblerSyncTask - re-throwing exception since we havent yet.
2024-10-25 17:00:00,167 [DefaultQuartzScheduler_Worker-9] ERROR com.redhat.rhn.taskomatic.task.CobblerSyncTask - Executing a task threw an exception: com.redhat.rhn.manager.kickstart.cobbler.NoCobblerTokenException
com.redhat.rhn.manager.kickstart.cobbler.NoCobblerTokenException: We had an error trying to login.
at com.redhat.rhn.manager.kickstart.cobbler.CobblerLoginCommand.login(CobblerLoginCommand.java:57) ~[rhn.jar:?]
at com.redhat.rhn.frontend.integration.IntegrationService.authorize(IntegrationService.java:115) ~[rhn.jar:?]
at com.redhat.rhn.frontend.integration.IntegrationService.getAuthToken(IntegrationService.java:69) ~[rhn.jar:?]
at com.redhat.rhn.manager.kickstart.cobbler.CobblerCommand.<init>(CobblerCommand.java:61) ~[rhn.jar:?]
at com.redhat.rhn.manager.kickstart.cobbler.CobblerCommand.<init>(CobblerCommand.java:82) ~[rhn.jar:?]
at com.redhat.rhn.manager.kickstart.cobbler.CobblerDistroSyncCommand.<init>(CobblerDistroSyncCommand.java:48) ~[rhn.jar:?]
at com.redhat.rhn.taskomatic.task.CobblerSyncTask.execute(CobblerSyncTask.java:85) ~[rhn.jar:?]
at com.redhat.rhn.taskomatic.task.RhnJavaJob.execute(RhnJavaJob.java:56) ~[rhn.jar:?]
at com.redhat.rhn.taskomatic.TaskoJob.doExecute(TaskoJob.java:240) ~[rhn.jar:?]
at com.redhat.rhn.taskomatic.TaskoJob.runTemplate(TaskoJob.java:193) ~[rhn.jar:?]
at com.redhat.rhn.taskomatic.TaskoJob.execute(TaskoJob.java:145) ~[rhn.jar:?]
at org.quartz.core.JobRunShell.run(JobRunShell.java:202) ~[quartz-2.3.0.jar:?]
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573) ~[quartz-2.3.0.jar:?]
Caused by: redstone.xmlrpc.XmlRpcFault: <class 'cobbler.cexceptions.CX'>:'login failed (taskomatic_user)'
at redstone.xmlrpc.XmlRpcClient.handleResponse(XmlRpcClient.java:444) ~[redstone-xmlrpc-client-1.1_20071120.jar:?]
at redstone.xmlrpc.XmlRpcClient.endCall(XmlRpcClient.java:376) ~[redstone-xmlrpc-client-1.1_20071120.jar:?]
at redstone.xmlrpc.XmlRpcClient.invoke(XmlRpcClient.java:165) ~[redstone-xmlrpc-client-1.1_20071120.jar:?]
at com.redhat.rhn.manager.kickstart.cobbler.CobblerXMLRPCHelper.invokeMethod(CobblerXMLRPCHelper.java:70) ~[rhn.jar:?]
at com.redhat.rhn.manager.kickstart.cobbler.CobblerLoginCommand.login(CobblerLoginCommand.java:52) ~[rhn.jar:?]
... 12 more
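
A quick sanity check is whether the Cobbler XML-RPC endpoint is reachable at all from inside the container; this sketch only calls the unauthenticated version method, so it checks connectivity rather than the taskomatic_user credentials:

uyuni-server:~ # curl -s http://localhost/cobbler_api -H 'Content-Type: text/xml' \
    -d '<?xml version="1.0"?><methodCall><methodName>version</methodName></methodCall>'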


ppanon2022 commented Oct 26, 2024

This may be a fix: #9348 (comment)

That was certainly a problem. Thank you very much for the pointer. After deleting the duplicate java.hostname and re-running mgradm upgrade podman, I got much further. I also got a lot of collation mismatch warnings on the database schema portion of the upgrade.

2024-10-25 17:02:39.795 PDT   [50]LOG:  redirecting log output to logging collector process
2024-10-25 17:02:39.795 PDT   [50]HINT:  Future log output will appear in directory "log".
Schema update...
WARNING:  database "uyuni" has a collation version mismatch
DETAIL:  The database was created using collation version 2.31, but the operating system provides version 2.38.
HINT:  Rebuild all objects in this database that use the default collation and run ALTER DATABASE uyuni REFRESH COLLATION VERSION, or build PostgreSQL with the right library version.
report_db_host = localhost
WARNING:  database "reportdb" has a collation version mismatch
DETAIL:  The database was created using collation version 2.31, but the operating system provides version 2.38.
HINT:  Rebuild all objects in this database that use the default collation and run ALTER DATABASE reportdb REFRESH COLLATION VERSION, or build PostgreSQL with the right library version.
WARNING:  database "uyuni" has a collation version mismatch
...
INSERT 0 0
Stopping Postgresql...

I ran
su - postgres
psql uyuni
REINDEX DATABASE uyuni;
ALTER DATABASE uyuni REFRESH COLLATION VERSION;
\c reportdb
REINDEX DATABASE reportdb;
ALTER DATABASE reportdb REFRESH COLLATION VERSION;

which should address that; the \c reportdb step reconnects to the reportdb database so the same fix is applied there.

Unfortunately, even after all that, I still have the issue that localhost loopback calls don't all work properly.


ppanon2022 commented Oct 26, 2024

I think this is related to my issue as well. It seems to me I'm getting a Cobbler config issue which isn't allowing taskomatic to get a CobblerToken.

I agree. We're also seeing that, with the side effect that deleting and adding/registering systems doesn't work, because the Cobbler calls made by those processes fail and abort those functions. This is a pretty serious issue because it breaks important functionality, and just running spacecmd on a different server doesn't fix the Cobbler issue.
