provider: persistent storage reporting should accurately reflect available Ceph space #146
Comments
Hurricane provider

This is also the reason for what the Hurricane provider reports. Akash-Provider currently reports:
Ceph config - 2 replicas:
PVC
The provider should have calculated its available persistent storage using:

Useful ceph commands:
Europlots provider

Akash Provider calculates the available persistent storage as:
However, in fact the provider should have:
Europlots is using:
Ceph reports:
Taking the no. of replicas into account:
UPDATE: Here is the actual used space by the PVC on Europlots:
PROVISIONED:
That explains the discrepancy between the actually available disk space and the reported figure.
Yet another observation from H100 Oblivus

This is mainly just for the record so we have more raw data to work with. H100 Oblivus reports a negative value of:
Total Provisioned:
Yet the provider reports a negative value of:
Noticed this a while back and have been tracking it... then the capacity shown went off the charts.
Another day: this was when one of the Ceph OSD nodes was down.
Afterwards, when the node was back online, it went back to the former value.
Not sure what it should be reporting exactly... but there are currently 4200 GiB allocated for each Ceph node, minus mons and whatever else Ceph stuff. Would be interesting to know if the reported negative capacity from H100 Oblivus affects leases from Console. Let me know if there is anything I can help with for troubleshooting this issue.
The current reporting of persistent storage available space by the provider, based on Ceph's `MAX AVAIL`, is not accurate. This is because Ceph's `MAX AVAIL` is a dynamic value representing `MAX - USED`, so it decreases as storage is used. Consequently, the provider sometimes reports less available space than actually exists.

A key point of confusion arises with Kubernetes' PV (Persistent Volume) system. In Kubernetes, creating a PV or PVC (Persistent Volume Claim) does not immediately reserve physical space in Ceph. Ceph's `MAX AVAIL` therefore does not change when these volumes are created, leading to a discrepancy; it only decreases once data is actually written to the volumes.

To provide a more accurate view of the available space, the provider should modify its display metrics.
Instead of relying on Ceph's `MAX AVAIL`, it should calculate the actual available space as `[Total MAX space of Ceph] - [Reserved space in K8s (PV/PVC)]`.

Here, `Total MAX space of Ceph` should be the entire storage capacity of the Ceph cluster, without deducting Ceph's `USED` amount (as Ceph's `MAX AVAIL` does now) or the space reserved by Kubernetes PV/PVC. This approach gives a more realistic representation of the available storage, accounting for the Kubernetes-reserved space.
NOTE: Ceph's `USED` is the `STORED x No_Replicas` in Ceph, which means the available persistent storage can easily go negative as soon as more than half of the space gets written to the persistent storage (with two replicas), or more than a third of it (with three replicas). See the example case from the Hurricane provider (two replicas).

Tested Provider / Inventory Operator Versions
0.4.6
0.4.7
0.4.8-rc0
Scenario Illustration
1. Request `10Gi` of Persistent Storage. The `MAX AVAIL` from Ceph dropped by `10Gi` but quickly reverted (during bid/accepting bid/sending-manifest; so I presume some inner akash-provider mechanics). `ceph df` also reports `MAX AVAIL` => `30Gi`.
2. Write `9Gi` of data to the PV. After `9Gi` of data is written, Ceph reports `MAX AVAIL` as `MAX - USED`. The provider's reported `MAX AVAIL` is aligned with Ceph's calculation `MAX - USED`, i.e. `(21 Gi - 9 Gi = 12 Gi)`.
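The arithmetic behind the NOTE and this scenario can be checked in a few lines, assuming the `30Gi` initial `MAX AVAIL` and the two-replica pool described in this report:

```python
# All units are GiB. These numbers mirror the scenario above.
max_total = 30   # initial MAX AVAIL reported by `ceph df`, before any writes
stored = 9       # data actually written to the PV
replicas = 2     # Ceph pool replication factor

# Ceph's USED is STORED x No_Replicas, so MAX - USED ends up at:
reported = max_total - stored * replicas
print(reported)  # 12

# With two replicas the figure goes negative once more than half the
# capacity is written (e.g. 16Gi stored):
print(max_total - 16 * replicas)  # -2
```

This matches the `12 Gi` the provider ends up showing, and illustrates how the reported availability can dip below zero, as seen on H100 Oblivus.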