
Red Hat ODF External Storage incompatible with Multus/nmstate when accessing Red Hat Ceph Storage #518

Open
yellowpattern opened this issue Jan 25, 2025 · 32 comments

Comments

@yellowpattern

yellowpattern commented Jan 25, 2025

Using ODF 4.14.13, the only "external" storage system that gets presented is "IBM FlashSystem Storage". I know IBM bought Red Hat, but this even prevents me from using Red Hat's Ceph solution with OpenShift.

Image

How do I fix this?

@yellowpattern yellowpattern changed the title from "Cannot create any external storage except for IBM FlashStore System" to "Cannot create any external storage except for IBM FlashSystem Storage" on Jan 25, 2025
@iamniting
Member

@SanjalKatiyar Can you please take a look?

@SanjalKatiyar
Contributor

Hi @yellowpattern,
Looks like you already have one Ceph-based system deployed (or some misconfiguration). You can cross-verify this by running:
oc get StorageSystem -n openshift-storage && oc get StorageCluster -n openshift-storage

That's why you only see "IBM FlashSystem Storage" in the external dropdown.

Before ODF 4.15 we only support a single Ceph-based system in a cluster, so you cannot create multiple.
From ODF 4.15 onwards we support multiple (two, to be precise) Ceph-based systems in the same cluster, where you can create one internal and one external system together.

@iamniting
Member

Before ODF 4.15 we only support a single Ceph-based system in a cluster, so you cannot create multiple. From ODF 4.15 onwards we support multiple (two, to be precise) Ceph-based systems in the same cluster, where you can create one internal and one external system together.

There is a restriction when creating one internal and one external system together. The internal system must be in the openshift-storage namespace, and the external system must be in the openshift-storage-extended namespace.
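A quick way to cross-check what already exists in each namespace (a minimal sketch; the openshift-storage-extended namespace may not exist yet on your cluster):

oc get storagesystem,storagecluster -n openshift-storage
oc get storagesystem,storagecluster -n openshift-storage-extended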

@yellowpattern
Author

yellowpattern commented Jan 27, 2025

On a 4.16 cluster, yes, the external Ceph option appears. When I try to run the downloaded script on a RHEL 9.4 Ceph cluster, I get this:

sudo python3 ~/ceph-external-cluster-details-exporter.py --rbd-data-pool-name my-rbd
Traceback (most recent call last):
  File "/tmp/ceph-external-cluster-details-exporter.py", line 2089, in <module>
    rjObj.main()
  File "/tmp/ceph-external-cluster-details-exporter.py", line 2069, in main
    generated_output = self.gen_json_out()
  File "/tmp/ceph-external-cluster-details-exporter.py", line 1738, in gen_json_out
    self._gen_output_map()
  File "/tmp/ceph-external-cluster-details-exporter.py", line 1578, in _gen_output_map
    self.init_rbd_pool(self._arg_parser.rbd_data_pool_name)
  File "/tmp/ceph-external-cluster-details-exporter.py", line 1352, in init_rbd_pool
    rbd_inst.pool_init(ioctx, True)
  File "rbd.pyx", line 2016, in rbd.RBD.pool_init
rbd.InvalidArgument: [errno 22] RBD invalid argument (error initializing pool)

Are there any requirements/restrictions on the ceph pool's profile?

@yellowpattern
Author

yellowpattern commented Jan 27, 2025

Hi @yellowpattern, Looks like you already have one Ceph-based system deployed (or some misconfiguration). You can cross-verify this by running: oc get StorageSystem -n openshift-storage && oc get StorageCluster -n openshift-storage

Yes, that's correct - on both the 4.14 and 4.16 clusters there's a Ceph instance that uses storage local to ODF as the backing store; now I am trying to activate a Ceph pool on an external Ceph cluster.

@yellowpattern
Author

Using a different script (from https://blog.oddbit.com/post/2021-08-23-external-ocs/#:~:text=Red%20Hat%E2%80%99s%20OpenShift%20Data%20Foundation%20%28formerly%20%E2%80%9COpenShift%20Container,OpenShift%20cluster%20to%20an%20externally%20managed%20Ceph%20cluster.) to generate JSON for external ceph, I get:

StorageCluster ocs-external-storagecluster violates policy 299 - "unknown field "apiGroup""

@yellowpattern
Author

OK, the missing steps in pool creation were:
ceph osd pool set my-rbd allow_ec_overwrites true
rbd pool init -p my-rbd

And creating a profile for rgw access:

ceph auth add client.csi-rgw-provisioner-my-rbd mgr "allow rw" mon "profile rgw" osd "profile rgw pool=my-rbd"

This then lets the downloaded Python script from ODF 4.16 run, but the above warning comes up:

StorageCluster ocs-external-storagecluster violates policy 299 - "unknown field "apiGroup""
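For completeness, the pool settings above can be cross-checked before re-running the exporter (a small verification sketch using my pool name):

ceph osd pool get my-rbd allow_ec_overwrites
ceph osd pool application get my-rbd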

@yellowpattern
Author

yellowpattern commented Jan 27, 2025

I moved on and created a PVC and a pod to consume it; now it is clear something is really missing:

failed to provision volume with StorageClass "ocs-external-storagecluster-ceph-rbd": rpc error: code = InvalidArgument desc = failed to fetch monitor list using clusterID (openshift-storage-extended): missing configuration for cluster ID "openshift-storage-extended"

Added .data.clusterID to rook-ceph-mon-endpoints in openshift-storage-extended (it seemed like the only place it could belong).
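For reference, the edit was roughly this (a sketch of my workaround, not necessarily the supported fix):

oc -n openshift-storage-extended patch configmap rook-ceph-mon-endpoints --type merge -p '{"data":{"clusterID":"openshift-storage-extended"}}'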

@yellowpattern
Author

Does the ODF external storage console work when the VLAN for Ceph is attached via Multus/nmstate and isn't reachable from the user's web browser?

@SanjalKatiyar
Contributor

SanjalKatiyar commented Jan 27, 2025

StorageCluster ocs-external-storagecluster violates policy 299 - "unknown field "apiGroup""

This warning seems harmless to me: a field not supported by the CRD must have been added to the StorageCluster CR, and it should get filtered out after the CR's creation.

Using a different script (from https://blog.oddbit.com/post/2021-08-23-external-ocs/#:~:text=Red%20Hat%E2%80%99s%20OpenShift%20Data%20Foundation%20%28formerly%20%E2%80%9COpenShift%20Container,OpenShift%20cluster%20to%20an%20externally%20managed%20Ceph%20cluster.) to generate JSON for external ceph, I get:

I would highly recommend always following the official ODF docs for deployment and verification: https://docs.redhat.com/en/documentation/red_hat_openshift_data_foundation/4.16/html/deploying_openshift_data_foundation_in_external_mode/overview-of-deploying-in-external-mode_rhodf

Also, can you raise a support ticket on the RH portal so that the team can help you out with your setup?

@SanjalKatiyar
Contributor

failed to provision volume with StorageClass "ocs-external-storagecluster-ceph-rbd": rpc error: code = InvalidArgument desc = failed to fetch monitor list using clusterID (openshift-storage-extended): missing configuration for cluster ID "openshift-storage-extended"

cc @parth-gr

@parth-gr
Member

@yellowpattern can you share the JSON output that you get after running the released version of the Python script?

@yellowpattern
Author

sudo python3 /tmp/ceph-external-cluster-details-exporter.py --rbd-data-pool-name=my-rbd
[{"name": "rook-ceph-mon-endpoints", "kind": "ConfigMap", "data": {"data": "mm1-box-1=10.0.36.26:6789", "maxMonId": "0", "mapping": "{}"}}, {"name": "rook-ceph-mon", "kind": "Secret", "data": {"admin-secret": "admin-secret", "fsid": "fd9c1e26-da6e-11ef-8593-3cecef103636", "mon-secret": "mon-secret"}}, {"name": "rook-ceph-operator-creds", "kind": "Secret", "data": {"userID": "client.healthchecker", "userKey": "AQBxc5dntj7pBxAAgGr1eRGdK+853qSEYWpTFA=="}}, {"name": "monitoring-endpoint", "kind": "CephCluster", "data": {"MonitoringEndpoint": "10.0.36.26", "MonitoringPort": "9283"}}, {"name": "rook-csi-rbd-node", "kind": "Secret", "data": {"userID": "csi-rbd-node", "userKey": "AQBxc5dn71w1CBAA+maj2W4uRErZ5O2XRvZjyQ=="}}, {"name": "rook-csi-rbd-provisioner", "kind": "Secret", "data": {"userID": "csi-rbd-provisioner", "userKey": "AQBxc5dnV/d/CBAAo6V822I8Vp5cWhy5v4gljA=="}}, {"name": "rook-ceph-dashboard-link", "kind": "Secret", "data": {"userID": "ceph-dashboard-link", "userKey": "https://10.0.36.26:8443/"}}, {"name": "ceph-rbd", "kind": "StorageClass", "data": {"pool": "my-rbd", "csi.storage.k8s.io/provisioner-secret-name": "rook-csi-rbd-provisioner", "csi.storage.k8s.io/controller-expand-secret-name": "rook-csi-rbd-provisioner", "csi.storage.k8s.io/node-stage-secret-name": "rook-csi-rbd-node"}}]

@yellowpattern
Author

All clusters show up like this.

oc get StorageSystem -n openshift-storage

$ oc get StorageSystem -n openshift-storage
NAME                               STORAGE-SYSTEM-KIND                  STORAGE-SYSTEM-NAME
ocs-storagecluster-storagesystem   storagecluster.ocs.openshift.io/v1   ocs-storagecluster

I would highly recommend always following the official ODF docs for deployment and verification: https://docs.redhat.com/en/documentation/red_hat_openshift_data_foundation/4.16/html/deploying_openshift_data_foundation_in_external_mode/overview-of-deploying-in-external-mode_rhodf

Thank you for that link, it is incredibly helpful. I didn't come across it in my googling (which always has an IBM link as #1).

@parth-gr
Member

@yellowpattern I hope the official doc link helped you make the connection successfully. Is anything else missing?

@yellowpattern
Author

It may be the coming weekend before I get back to this. This was a look at which direction we should go with ODF/Ceph. The restrictions on external Ceph weren't clear from the UI, so the feedback here has been really good, thanks! I do have a Red Hat ticket for this; they're much slower than the community in responding. On Red Hat's web site I came across this:
https://access.redhat.com/solutions/7001808
It suggests there was an unfixed bug and a "fix" by adjusting YAML somewhere, but in an earlier version, which leaves me unsure. The main question now is whether ODF GR will work with external Ceph or only with local-disk storage.

@yellowpattern
Author

OK, I've come back to this.
I'm unsure whether "rbd-metadata-ec-pool-name" is the right option or whether my understanding is correct. From what I've read, the metadata pool must be replicated (not erasure-coded). Or maybe the option just has me confused: when rbd-data-pool-name is erasure-coded, a separate non-EC metadata pool is required, right?
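For what it's worth, this is roughly how I paired the pools on the Ceph side (a sketch based on my reading, with my pool names; the assumption is that the EC pool holds RBD data while a replicated pool, the default pool type, holds the image metadata/omap that EC pools cannot store):

ceph osd pool create my-rbd erasure
ceph osd pool set my-rbd allow_ec_overwrites true
ceph osd pool create my-rbd-repl
rbd pool init my-rbd-repl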

@yellowpattern
Author

It is "connected" (or at least I believe it is) but there's no visibility of the ceph cluster storage:

Image

@yellowpattern
Author

Image

@yellowpattern
Author

Image

@yellowpattern
Author

If I create an external Ceph storage system first, I then cannot create local storage. But have no fear, ODF still lets me connect to an external storage platform: IBM FlashSystem Storage. Looks like IBM has taken full ownership of ODF storage.

@yellowpattern
Author

Over the weekend I ran into a few problems using ODF and Ceph and created some Red Hat bugs to match. I expect they will all be closed within a week, because the last of them required reinstalling OpenShift from scratch to fix, so the cluster where it all happened no longer exists.

The issue above (cannot create local storage after creating external Ceph) was one. No observability of available external Red Hat Ceph storage was another (can you imagine using NFS and being told you could only run "df" on the NFS server?). But the creme de la creme was that removing ODF doesn't remove its knowledge of local disks, so a combination of removing nodes and re-installing ODF resulted in ODF thinking I had twice the storage I actually did. And there's no way to edit or view the nodes/disks it believes exist. Brilliant. I had to re-install OpenShift to properly reset ODF. I tried hard to find that cached knowledge with a scan of all the CRDs (which takes a while even on a fresh cluster), but even editing some local-discovery CRs didn't fix it.

Summary: I assume that I can connect an external Ceph cluster to OpenShift 4.17 because I get a "green tick" in a particular window, but that's very small comfort compared to using disk resources via other means.
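If anyone hits the same cleanup problem: I suspect the leftover state lives in the local-storage operator's discovery CRs. A cleanup sketch along these lines (assuming the operator is in openshift-local-storage; <node> and <device> are placeholders) is what I'd try before reinstalling OpenShift:

oc -n openshift-local-storage delete localvolumediscovery,localvolumediscoveryresult,localvolumeset,localvolume --all
oc debug node/<node> -- chroot /host sgdisk --zap-all /dev/<device>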

@yellowpattern
Author

yellowpattern commented Feb 3, 2025

Trying anew today:
python3 cephext.py --rbd-data-pool-name my-rbd --monitoring-endpoint 10.0.1.2,10.0.1.3 --rbd-metadata-ec-pool-name my-rbd-repl --cephfs-data-pool-name my-cephfs-data --cephfs-metadata-pool-name my-cephfs-metadata
Traceback (most recent call last):
  File "/tmp/cephext.py", line 2112, in <module>
    rjObj.main()
  File "/tmp/cephext.py", line 2092, in main
    generated_output = self.gen_json_out()
  File "/tmp/cephext.py", line 1754, in gen_json_out
    self._gen_output_map()
  File "/tmp/cephext.py", line 1592, in _gen_output_map
    self.init_rbd_pool(self._arg_parser.rbd_data_pool_name)
  File "/tmp/cephext.py", line 1345, in init_rbd_pool
    rbd_inst.pool_init(ioctx, True)
  File "rbd.pyx", line 2016, in rbd.RBD.pool_init

and "ceph df" includes:
my-rbd 17 32 8 KiB 474 12 KiB 0 6.5 TiB
my-rbd-repl 19 32 19 B 5 8 KiB 0 5 TiB
my-cephfs-metadata 26 32 2.4 KiB 22 96 KiB 0 3.3 TiB
my-cephfs-data 27 512 0 B 0 0 B 0 5 TiB

@parth-gr
Member

parth-gr commented Feb 3, 2025

Can you try running manually:
rbd pool init my-rbd
and then see the result?

@yellowpattern
Author

Thanks!

# rbd pool init my-rbd
2025-02-04T10:12:54.622+1100 7fe9a53fd640 -1 librbd::image::ValidatePoolRequest: handle_overwrite_rbd_info: pool missing required overwrite support
rbd: error registered application: (22) Invalid argument

OK, I know how to fix that...

# ceph osd pool set my-rbd allow_ec_overwrites true
set pool 17 allow_ec_overwrites to true
# rbd pool init my-rbd
#

After that I was able to generate the JSON and import it into ODF, but that resulted in an error:

> oc get storagecluster -n openshift-storage
NAME                          AGE   PHASE   EXTERNAL   CREATED AT             VERSION
ocs-external-storagecluster   23s   Error   true       2025-02-03T23:20:37Z   4.17.3

> oc describe storagecluster -n openshift-storage
...
Last Heartbeat Time: 2025-02-03T23:21:47Z
Last Transition Time: 2025-02-03T23:20:57Z
Message: Error while reconciling: dial tcp 10.0.1.3:9283: i/o timeout
Reason: ReconcileFailed
Status: False

It might be best to follow up with Red Hat from here, because I need to work out whether a NetworkAttachmentDefinition is required for a pod to be able to access that VLAN (it isn't reachable from the machine network). Is the ODF component that wants to connect to Ceph above running as a DaemonSet, a Deployment, or...?
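For my own notes, a quick way to see which ODF/Rook workloads would need to reach the Ceph VLAN (as far as I can tell, the csi-*-provisioner pods run as Deployments and the csi-*plugin pods as DaemonSets):

oc -n openshift-storage get deployments,daemonsets -o wide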

@yellowpattern
Author

yellowpattern commented Feb 4, 2025

Adding a NetworkAttachmentDefinition to ocs-operator.v4.17.3-rhodf was required to get the correct template updated and provide access to the Ceph cluster.

$ oc describe storagecluster -n openshift-storage
...
Message: CephCluster resource is not reporting status
Reason: CephClusterStatus
Status: False
Type: Available
...
Message: External CephCluster is trying to connect: Attempting to connect to an external Ceph cluster
Reason: ExternalClusterStateConnecting
Status: True
Type: ExternalClusterConnecting

openshift-storage 113s Warning ReconcileFailed cephcluster/ocs-external-storagecluster-cephcluster failed to reconcile CephCluster "openshift-storage/ocs-external-storagecluster-cephcluster". failed to reconcile cluster "ocs-external-storagecluster-cephcluster": failed to configure external ceph cluster: failed to get external ceph mon version: failed to run 'ceph version'. . timed out: exit status 1
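One way to confirm the VLAN itself is reachable from inside the cluster would be a throwaway pod attached to the same NAD (a sketch only: the pod name and image are placeholders I picked, rhodf-ceph is the NAD described further down, and 10.0.1.3:9283 is the mgr endpoint from the error above):

cat <<'EOF' | oc apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: ceph-net-test
  namespace: openshift-storage
  annotations:
    k8s.v1.cni.cncf.io/networks: '[{"name":"rhodf-ceph","namespace":"openshift-storage"}]'
spec:
  restartPolicy: Never
  containers:
    - name: test
      image: registry.access.redhat.com/ubi9/ubi-minimal
      command: ["sleep", "3600"]
EOF

oc -n openshift-storage exec ceph-net-test -- bash -c 'timeout 5 bash -c "</dev/tcp/10.0.1.3/9283" && echo reachable || echo NOT reachable'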

@yellowpattern
Author

Adding an NNCP (to define the VLAN) and then adding a NAD to ocs-operator.v4.17.3-rhodf and rook-ceph-operator.v4.17.3-rhodf was required to get "ocs-external-storagecluster" to show up as "PHASE: Ready". That wasn't a trivial edit (finding the right location to update).

Trying to provision a volume for noobaa hangs:
openshift-storage 4s Normal ExternalProvisioning persistentvolumeclaim/db-noobaa-db-pg-0 Waiting for a volume to be created either by the external provisioner 'openshift-storage.rbd.csi.ceph.com' or manually by the system administrator. If volume creation is delayed, please verify that the provisioner is running and correctly registered.

Is that an ODF problem?

@yellowpattern
Author

yellowpattern commented Feb 4, 2025

Patching csi-cephfsplugin-provisioner and csi-rbdplugin-provisioner was also required. Although they're deployed by ODF as being owned by rook-ceph-operator (which is owned by rook-ceph-operator.v4.17.3-rhodf), the YAML for them doesn't appear at the top level.

$ oc get pvc -A
NAMESPACE NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE
openshift-storage db-noobaa-db-pg-0 Bound pvc-97c1f9ca-8311-4509-acfc-20ff19f6dd8b 50Gi RWO ocs-external-storagecluster-ceph-rbd <unset> 8m35s

But....

5m10s Normal ExternalProvisioning persistentvolumeclaim/db-noobaa-db-pg-0 Waiting for a volume to be created either by the external provisioner 'openshift-storage.rbd.csi.ceph.com' or manually by the system administrator. If volume creation is delayed, please verify that the provisioner is running and correctly registered.
5m2s Normal ProvisioningSucceeded persistentvolumeclaim/db-noobaa-db-pg-0 Successfully provisioned volume pvc-97c1f9ca-8311-4509-acfc-20ff19f6dd8b
5m2s Normal Provisioning persistentvolumeclaim/db-noobaa-db-pg-0 External provisioner is provisioning volume for claim "openshift-storage/db-noobaa-db-pg-0"
4m47s Normal SuccessfulAttachVolume pod/noobaa-db-pg-0 AttachVolume.Attach succeeded for volume "pvc-97c1f9ca-8311-4509-acfc-20ff19f6dd8b"
2m46s Warning FailedMount pod/noobaa-db-pg-0 MountVolume.MountDevice failed for volume "pvc-97c1f9ca-8311-4509-acfc-20ff19f6dd8b" : rpc error: code = DeadlineExceeded desc = context deadline exceeded
38s Warning FailedMount pod/noobaa-db-pg-0 MountVolume.MountDevice failed for volume "pvc-97c1f9ca-8311-4509-acfc-20ff19f6dd8b" : rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0011-openshift-storage-0000000000000013-77a2913e-fa62-4e94-b8d0-ca428e1bf114 already exists

Almost but not quite....

@yellowpattern
Author

The final piece in this puzzle was adding an IP# to the node's bridge for the VLAN.
On a reboot I need to edit deployment/csi-cephfsplugin-provisioner and deployment/csi-rbdplugin-provisioner because I can't see how to patch their spec.template.metadata with the NAD annotation.
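In case it helps anyone, that per-boot edit can also be applied as a one-liner per provisioner (a sketch of how the same annotation edit could be done with oc patch instead of editing by hand; it still gets overwritten whenever the operator reconciles the Deployment):

oc -n openshift-storage patch deployment csi-rbdplugin-provisioner -p '{"spec":{"template":{"metadata":{"annotations":{"k8s.v1.cni.cncf.io/networks":"[{\"name\":\"rhodf-ceph\",\"namespace\":\"openshift-storage\"}]"}}}}}'
oc -n openshift-storage patch deployment csi-cephfsplugin-provisioner -p '{"spec":{"template":{"metadata":{"annotations":{"k8s.v1.cni.cncf.io/networks":"[{\"name\":\"rhodf-ceph\",\"namespace\":\"openshift-storage\"}]"}}}}}'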

I ran up a sample pod and filled some files with garbage ...

--- POOLS ---
POOL ID PGS STORED OBJECTS USED %USED MAX AVAIL
my-rbd 17 32 5.1 GiB 1.84k 7.6 GiB 0.04 6.5 TiB

@parth-gr
Member

parth-gr commented Feb 4, 2025

@yellowpattern

Would you like to explain in detail the problem that got fixed?

Adding an NNCP (to define the VLAN) and then adding a NAD to ocs-operator.v4.17.3-rhodf and rook-ceph-operator.v4.17.3-rhodf was required to get "ocs-external-storagecluster" to show up as "PHASE: Ready".

Do you think this would be a good addition to our documentation?

@yellowpattern
Author

yellowpattern commented Feb 4, 2025

@parth-gr The headline for the documentation needs to be that RHODF is incompatible with Multus/nmstate, because the patches required to deployment/csi-cephfsplugin-provisioner and deployment/csi-rbdplugin-provisioner cannot be persisted (or at least I don't know how to make them persistent) across a reboot/restart/redeploy.

The problem here is: how do you get RHODF to connect to a Ceph cluster that's attached to a network/VLAN and isn't reachable from the machine network? Maybe there was a checkbox I missed on the ODF external-storage "install" page.

Changes required - these two are once-off (see the sketch after this list for one way to apply them with oc patch):
rook-ceph-operator.v4.17.3-rhodf.spec.install.spec.deployments[rook-ceph-operator].spec.template.metadata.annotations - add k8s.v1.cni.cncf.io/networks for your NAD
ocs-operator.v4.17.3-rhodf.spec.install.spec.deployments[ocs-operator].spec.template.metadata.annotations - add k8s.v1.cni.cncf.io/networks for your NAD

These need to be made each time the cluster boots or the provisioners are updated/restarted:
csi-cephfsplugin-provisioner.spec.template.metadata.annotations - add k8s.v1.cni.cncf.io/networks for your NAD
csi-rbdplugin-provisioner.spec.template.metadata.annotations - add k8s.v1.cni.cncf.io/networks for your NAD
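As an illustration of the once-off CSV change (a sketch only: the deployment index 0 is an assumption, and the JSON-patch "add" fails if the pod template has no annotations map yet, so check the CSV with "oc get csv -o json" first; the rook-ceph-operator CSV is shown, the ocs-operator one is analogous):

oc -n openshift-storage patch csv rook-ceph-operator.v4.17.3-rhodf --type json -p '[{"op":"add","path":"/spec/install/spec/deployments/0/spec/template/metadata/annotations/k8s.v1.cni.cncf.io~1networks","value":"[{\"name\":\"rhodf-ceph\",\"namespace\":\"openshift-storage\"}]"}]'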

My NAD annotations all looked like this:

k8s.v1.cni.cncf.io/networks: |-
  [
    {
      "name": "rhodf-ceph",
      "namespace": "openshift-storage"
    }
  ]

On the network side, I did:

  • NNCP - VLAN definition (no IP address) on the node:

    apiVersion: nmstate.io/v1
    kind: NodeNetworkConfigurationPolicy
    metadata:
      name: rhodf-node-1
    spec:
      desiredState:
        interfaces:
          - description: RHODF VLAN using eno1
            name: eno1.55
            state: up
            type: vlan
            ipv4:
              enabled: false
            ipv6:
              enabled: false
            vlan:
              base-iface: eno1
              id: 55
      nodeSelector:
        node-role.kubernetes.io/worker: ""

  • NNCP - bridge attached to the above VLAN, with a static IP address for each worker node where Ceph traffic can/will run:

    apiVersion: nmstate.io/v1
    kind: NodeNetworkConfigurationPolicy
    metadata:
      name: rhodf-bridge
    spec:
      desiredState:
        interfaces:
          - name: br-rhodf
            type: linux-bridge
            state: up
            bridge:
              options:
                stp:
                  enabled: false
              port:
                - name: eno1.55
                  vlan: {}
            ipv4:
              enabled: true
              dhcp: false
              auto-dns: false
              auto-gateway: false
              auto-routes: false
              address:
                - ip: 10.0.1.51
                  prefix-length: 24
            ipv6:
              enabled: false
      nodeSelector:
        kubernetes.io/hostname: <nodename>

  • NAD (in the openshift-storage namespace) - CNI config:

    apiVersion: k8s.cni.cncf.io/v1
    kind: NetworkAttachmentDefinition
    metadata:
      name: rhodf-ceph
      namespace: openshift-storage
    spec:
      config: |-
        {
          "cniVersion": "0.4.0",
          "name": "rhodf-ceph",
          "plugins": [
            {
              "type": "ipvlan",
              "master": "br-rhodf",
              "mtu": 1500,
              "linkInContainer": false,
              "ipam": {
                "type": "whereabouts",
                "range": "10.0.1.0/24",
                "range_start": "10.0.1.200",
                "range_end": "10.0.1.229"
              }
            }
          ]
        }

Looking back on it, you could just use IPAM for the whole thing and not do static IP assignment for the nodes. I haven't included IPAM cleanup here.

@yellowpattern
Author

On the Red Hat Ceph Storage server:
ceph osd pool create my-cephfs-data erasure
ceph osd pool set my-cephfs-data allow_ec_overwrites true
ceph osd pool set my-cephfs-data bulk true
ceph osd pool create my-cephfs-metadata
ceph fs new my-cephfs my-cephfs-metadata my-cephfs-data --force
ceph osd pool create my-rbd erasure
ceph osd pool application enable my-rbd rbd
rbd pool init my-rbd
ceph orch apply mds my-cephfs

Then I could do:

sudo python ceph-external-cluster-details-exporter.py --rbd-data-pool-name my-rbd --monitoring-endpoint 10.0.1.2,10.0.1.3 --rbd-metadata-ec-pool-name my-rbd-repl --cephfs-data-pool-name my-cephfs-data --cephfs-metadata-pool-name my-cephfs-metadata
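Before running the exporter it's worth sanity-checking that every pool it references exists and is tagged for the right application (a read-only check, using my pool names):

ceph osd pool ls detail | grep -E 'my-rbd|my-cephfs'
ceph osd pool application get my-rbd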

@yellowpattern yellowpattern changed the title from "Cannot create any external storage except for IBM FlashSystem Storage" to "Red Hat ODF external storage incompatible with Multus/nmstate" on Feb 4, 2025
@yellowpattern yellowpattern changed the title from "Red Hat ODF external storage incompatible with Multus/nmstate" to "Red Hat ODF External Storage incompatible with Multus/nmstate when accessing Red Hat Ceph Storage" on Feb 4, 2025