Red Hat ODF External Storage incompatible with Multus/nmstate when accessing Red Hat Ceph Storage #518
@SanjalKatiyar Can you please take a look?
Hi @yellowpattern, before ODF 4.15 we only supported a single Ceph-based system in a cluster, so you cannot create multiple. That's why you only see "IBM FlashSystem Storage" in the external dropdown.
There is a restriction when creating one internal and one external system together. The internal system must be in the
On a 4.16 cluster, yes, the external Ceph option appears. When I try to run the downloaded script on a RHEL 9.4 Ceph cluster, I get this:
Are there any requirements/restrictions on the Ceph pool's profile?
Yes, that's correct. On both the 4.14 and 4.16 clusters there's a Ceph instance that uses storage local to ODF as the backing store; now I am trying to activate a Ceph pool on an external Ceph cluster.
Using a different script (from https://blog.oddbit.com/post/2021-08-23-external-ocs/) to generate JSON for external Ceph, I get: StorageCluster ocs-external-storagecluster violates policy 299 - "unknown field "apiGroup""
Ok, the missing steps in pool creation were: And creating a profile for rgw access:
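The exact commands aren't captured in this transcript. A rough, hedged sketch of the kind of pool preparation and RGW-access profile involved might look like this; the pool name `my-rbd` and the user names below are placeholders, not values taken from this issue:

```shell
# Create and initialize an RBD data pool (pool name is a placeholder).
ceph osd pool create my-rbd
ceph osd pool application enable my-rbd rbd
rbd pool init my-rbd

# Create a CephX user with an RBD profile for the pool (illustrative caps only).
ceph auth get-or-create client.odf-external \
  mon 'profile rbd' \
  osd 'profile rbd pool=my-rbd'

# Create an RGW user so object/RGW access can be validated (names are placeholders).
radosgw-admin user create --uid=odf-rgw --display-name="ODF external RGW user"
```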
This then lets the downloaded Python script from ODF 4.16 run, BUT the above warning comes up: StorageCluster ocs-external-storagecluster violates policy 299 - "unknown field "apiGroup""
I moved on and created a PVC and a pod to consume it; now it is clear something is really missing: failed to provision volume with StorageClass "ocs-external-storagecluster-ceph-rbd": rpc error: code = InvalidArgument desc = failed to fetch monitor list using clusterID (openshift-storage-extended): missing configuration for cluster ID "openshift-storage-extended". I added .data.clusterID to rook-ceph-mon-endpoints in openshift-storage-extended (seemed like the only place it could belong).
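For reference, the test PVC was along these lines; this is a minimal sketch where the PVC name, namespace, and size are assumptions (only the StorageClass name comes from the error above):

```shell
cat <<'EOF' | oc apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rbd-test-pvc            # placeholder name
  namespace: default            # placeholder namespace
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi              # placeholder size
  storageClassName: ocs-external-storagecluster-ceph-rbd
EOF
```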
Does the ODF external storage console work when the VLAN for Ceph is attached via multus/nmstate and isn't reachable from the user's web browser?
This warning seems harmless to me; a field not supported by the CRD must have been added to the StorageCluster CR, and it should get filtered out after the CR's creation.
I would highly recommend always following the official ODF docs for deployment and verification purposes: https://docs.redhat.com/en/documentation/red_hat_openshift_data_foundation/4.16/html/deploying_openshift_data_foundation_in_external_mode/overview-of-deploying-in-external-mode_rhodf. Also, can you raise a support ticket on the RH portal so that the team can help you out with your setup?
cc @parth-gr
@yellowpattern can you share the JSON output that you get after running the release-version Python script?
sudo python3 /tmp/ceph-external-cluster-details-exporter.py --rbd-data-pool-name=my-rbd |
All clusters show up like this.
$ oc get StorageSystem -n openshift-storage
Thank you for that link; it is incredibly helpful. I didn't come across it in my googling (which always has an IBM link as #1).
@yellowpattern I hope that by using the official doc link you successfully made a connection. Is anything else missing?
It may be the coming weekend again before I get back to this. This was an exploration to see what direction we should go with ODF/Ceph. The restrictions on external Ceph weren't clear from the UI, so the feedback here has been really good, thanks! I do have a Red Hat ticket for this; they're much slower than the community in responding. On Red Hat's web site I came across this:
Ok, I've come back to this. |
If I create an external Ceph storage system first, I then cannot create local storage. But have no fear, ODF still lets me connect to an external storage platform - IBM FlashSystem Storage. Looks like IBM has taken full ownership of ODF storage.
Over the weekend I ran into a few problems using ODF & Ceph and created some Red Hat bugs to match, which I expect will all be closed within a week, because the last of them required reinstalling OpenShift from scratch to fix, so the cluster where it all happened no longer exists. The above issue (cannot create local storage after creating external Ceph) was one; no observability of available external Red Hat Ceph storage was another (can you imagine using NFS and being told you could only run "df" on the NFS server?); but the crème de la crème was that removing ODF doesn't remove its knowledge of local disks, so a combination of removing nodes and re-installing ODF resulted in ODF thinking I had twice the storage I did. And there's no way to edit or view the nodes/disks it believes exist. Brilliant. I had to re-install OpenShift to properly reset ODF. I tried hard to find that cached knowledge with a scan of all CRDs (which takes a while even on a fresh cluster), but even editing some local-discovery CRs didn't fix it.
Trying anew today: and "ceph df" includes: |
Can you try manually,
Thanks!
Ok, I know how to fix that...
After that I was able to generate the JSON and import it into ODF, but that resulted in an error:
It might be best to follow up with Red Hat from here, because I need to work out whether a NetworkAttachmentDefinition is required for a pod to be able to access that VLAN, since it isn't reachable from the machine network. Is the ODF component that wants to connect to Ceph above running as a DaemonSet, a Deployment, or...?
Adding a NetworkAttachmentDefinition to ocs-operator.v4.17.3-rhodf was required to get the correct template updated to provide access to the Ceph cluster.
Adding an NNCP (to define the VLAN) and then adding a NAD to ocs-operator.v4.17.3-rhodf and rook-ceph-operator.v4.17.3-rhodf was required to get "ocs-external-storagecluster" to show up as "PHASE: Ready". That wasn't a trivial edit (finding the right location to update). Trying to provision a volume for NooBaa hangs: Is that an ODF problem?
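For readers following along, an NNCP defining such a VLAN might look roughly like the following sketch; the policy name, base interface, and VLAN ID are placeholders and not taken from this thread:

```shell
cat <<'EOF' | oc apply -f -
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: ceph-vlan                 # placeholder name
spec:
  desiredState:
    interfaces:
      - name: ens3.100            # placeholder: VLAN 100 on interface ens3
        type: vlan
        state: up
        vlan:
          base-iface: ens3
          id: 100
EOF
```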
Patching csi-cephfsplugin-provisioner and csi-rbdplugin-provisioner was also required. Although they're deployed by ODF as being owned by rook-ceph-operator (which is in turn owned by rook-ceph-operator.v4.17.3-rhodf), the YAML for them doesn't appear at the top level.
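The patch in question is presumably the standard Multus network annotation on the provisioner pod templates. A sketch, assuming a NAD named ceph-public-net in openshift-storage (both names are placeholders):

```shell
# Attach the NAD to the CSI provisioner pods via the Multus annotation.
# NAD reference "openshift-storage/ceph-public-net" is a placeholder.
for d in csi-cephfsplugin-provisioner csi-rbdplugin-provisioner; do
  oc -n openshift-storage patch deployment "$d" --type merge -p \
    '{"spec":{"template":{"metadata":{"annotations":{"k8s.v1.cni.cncf.io/networks":"openshift-storage/ceph-public-net"}}}}}'
done
```

As noted later in the thread, patches like these get reverted whenever the owning operator reconciles or the deployments are recreated, which is the crux of the persistence problem.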
But....
Almost, but not quite...
The final piece in this puzzle was adding an IP address to the node's bridge for the VLAN. I ran up a sample pod and filled some files with garbage ...
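The actual method used for that address isn't shown here; a hedged sketch of assigning a node IP on the Ceph VLAN's bridge via nmstate, with the bridge name, port, and address all as placeholders:

```shell
cat <<'EOF' | oc apply -f -
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: ceph-bridge-ip            # placeholder name
spec:
  desiredState:
    interfaces:
      - name: br-ceph             # placeholder bridge name
        type: linux-bridge
        state: up
        bridge:
          port:
            - name: ens3.100      # placeholder: the Ceph VLAN interface as a bridge port
        ipv4:
          enabled: true
          dhcp: false
          address:
            - ip: 192.168.100.10  # placeholder node address on the Ceph VLAN
              prefix-length: 24
EOF
```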
Would you like to explain in detail the problem that got fixed?
Do you think this would be a good addition to our documentation?
@parth-gr The headline for the documentation needs to be that RHODF is incompatible with multus/nmstate, because the patches required to deployment/csi-cephfsplugin-provisioner and deployment/csi-rbdplugin-provisioner cannot be persisted (or at least I don't know how to make them persistent) across a reboot/restart/redeploy.

The problem here is: how do you get RHODF to connect to a Ceph cluster that sits on a network/VLAN and isn't reachable from the machine network? Maybe there was a checkbox I missed on the ODF external "install" page.

Changes required - these two are one-off:

These need to be made each time the cluster boots or the provisioners are updated/restarted:

My NADs all looked like this (a representative sketch is shown below):

On the network side, I did:
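The NAD bodies referenced above weren't captured in this transcript. A typical macvlan NAD for this kind of Ceph public network would look something like the following sketch; the interface, subnet, names, and the whereabouts IPAM choice are illustrative assumptions only:

```shell
cat <<'EOF' | oc apply -f -
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: ceph-public-net           # placeholder name
  namespace: openshift-storage
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "type": "macvlan",
      "master": "ens3.100",
      "mode": "bridge",
      "ipam": {
        "type": "whereabouts",
        "range": "192.168.100.0/24"
      }
    }
EOF
```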
Looking back on it, you could just use IPAM for the whole thing and not do static IP address assignment for the nodes. I haven't included IPAM cleanup.
On the Red Hat Ceph Storage server: Then I could do:
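The exact commands run on the RHCS server aren't shown above; a plausible check from the Ceph side would be something along these lines (the pool name is a placeholder, not a value from this thread):

```shell
# Confirm the RBD images backing the PVCs exist and are consuming space
# (pool name "my-rbd" is a placeholder).
rbd ls --pool my-rbd
rbd du --pool my-rbd
ceph df
```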
Using ODF 4.14.13, the only "external" storage system that gets presented is "IBM FlashSystem Storage". I know IBM bought Red Hat, but this even prevents me from using Red Hat's Ceph solution with OpenShift.
How do I fix this?