csi-wrapper: azuredisk-csi-driver support #2122

Open · wants to merge 4 commits into main
Conversation

daniel-weisse (Contributor)

As a follow-up to #2108 and #2106, this PR adds the required changes to enable the csi-wrapper for the azuredisk-csi-driver.
Changes are required in two places:

  • When normalizing the volume ID to use it as the name of the peerpod volume, we must account for the volume ID being an Azure resource ID, which contains / characters that are illegal in K8s resource names. This is fixed by using the base name of the resource ID instead (see the sketch below).
  • When reproducing ControllerPublishVolume for the Pod VM, we cannot use the peerpod volume VMID field to replace NodeID of the ControllerPublishVolumeRequest as, once again, it's the full Azure resource ID, and the azuredisk-csi-driver only expects the name of the VM to publish the volume on. This is also fixed by using the base name instead.

It also includes examples for using the azuredisk-csi-driver to create a Pod consuming a dynamically provisioned PVC or a statically provisioned PVC.
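
For illustration, here is a minimal Go sketch of the volume-ID normalization described in the first point above. This is not the PR's actual code; normalizeVolumeID and the example resource ID are made up.

package main

import (
    "fmt"
    "path"
)

// normalizeVolumeID (hypothetical helper) reduces a full Azure disk resource ID,
// which contains '/' characters that are illegal in Kubernetes object names,
// to its base name so it can be used as the peerpod volume name.
func normalizeVolumeID(volumeID string) string {
    return path.Base(volumeID)
}

func main() {
    id := "/subscriptions/0000/resourceGroups/my-rg/providers/Microsoft.Compute/disks/pvc-1234"
    fmt.Println(normalizeVolumeID(id)) // prints: pvc-1234
}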

daniel-weisse requested a review from a team as a code owner on October 18, 2024 09:23

mkulke (Contributor) left a comment

Thanks, I'll take a look at the PR today. Do you have any specific “how-to-test” instructions?


// The azure csi driver requires the nodeID to be just the name of the VM,
// not the full Azure Resource ID, as it is saved in the PeerpodVolume object
if azureVMRegexp.MatchString(vsiID) {
Contributor

Do we expect situations where VMID wouldn't be a full resource ID?

Contributor Author

Not for Azure, but other providers might have VM IDs containing slashes, and also a CSI driver that expects the full VM ID with slashes.

Contributor

I see. Any chance we don't have to rely on heuristics to be aware of the cloud provider here?

Contributor Author

I don't think so.
As far as I can tell, neither the peerpod volume spec nor the ControllerService saves the active cloud provider anywhere.
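
For illustration, the heuristic discussed in this thread could look roughly like the sketch below. The azureVMRegexp name appears in the diff excerpt above, but the pattern shown here and the nodeIDForPublish helper are assumptions, not the PR's actual code.

package main

import (
    "fmt"
    "path"
    "regexp"
)

// Rough approximation of a full Azure VM resource ID; the real pattern in the PR may differ.
var azureVMRegexp = regexp.MustCompile(`^/subscriptions/[^/]+/resourceGroups/[^/]+/providers/Microsoft\.Compute/virtualMachines/[^/]+$`)

// nodeIDForPublish returns just the VM name (which the azuredisk-csi-driver
// expects as NodeID) when the stored VM ID is a full Azure resource ID, and the
// unmodified ID otherwise, for providers whose CSI driver expects it as-is.
func nodeIDForPublish(vsiID string) string {
    if azureVMRegexp.MatchString(vsiID) {
        return path.Base(vsiID)
    }
    return vsiID
}

func main() {
    vmID := "/subscriptions/0000/resourceGroups/my-rg/providers/Microsoft.Compute/virtualMachines/podvm-nginx-pv-disk"
    fmt.Println(nodeIDForPublish(vmID)) // prints: podvm-nginx-pv-disk
}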

daniel-weisse (Contributor Author) commented Oct 23, 2024

> do you have any specific “how-to-test” instructions?

Following the instructions from the README should be enough

edit: I just double-checked against the official CAA setup for Azure, which deploys Peer Pod VMs in its own separate resource group.
This will cause issues because the CSI driver cannot attach disks it created in the AKS resource group to the VMs in the other resource group.
To fix this, change all references to $AZURE_RESOURCE_GROUP in the Azure deployment guide to $AKS_RG.

I'll check if it's easily possible to configure the CSI driver to create the disks outside the AKS resource group.

mkulke (Contributor) commented Oct 23, 2024

> do you have any specific “how-to-test” instructions?
>
> Following the instructions from the README should be enough
>
> edit: I just double-checked against the official CAA setup for Azure, which deploys Peer Pod VMs in its own separate resource group. This will cause issues because the CSI driver cannot attach disks it created in the AKS resource group to the VMs in the other resource group. To fix this, change all references to $AZURE_RESOURCE_GROUP in the Azure deployment guide to $AKS_RG.
>
> I'll check if it's easily possible to configure the CSI driver to create the disks outside the AKS resource group.

Thanks for the heads-up. The $AKS_RG is sort of managed, i.e. if a cluster is removed, the RG is discarded, too. So we avoid putting resources in that RG if that's feasible; especially for disks we might not want this.

daniel-weisse (Contributor Author)

Managed to come up with a somewhat clean solution that allows following the documented way of setting up CAA on AKS.
The PR should now be fully functional when following the instructions from the docs for Azure and the README from this PR.

mkulke (Contributor) commented Oct 23, 2024

I'm currently following the guide using a fresh AKS + CAA installation from main, and I can't seem to get the azuredisk-csi-driver to work (Deploy azuredisk-csi-driver on the cluster + Option A). Somehow the mount is not set up properly:

for the azurefile pv:

$ kubectl exec nginx-pv -c nginx -- mount | grep mount-path
//....file.core.windows.net/pvc-40dd3dcf-603c-4877-a3e4-b489482bfc44 on /mount-path type cifs (rw,relatime,vers=3.1.1,cache=strict,username=...,uid=0,noforceuid,gid=0,noforcegid,addr=...,file_mode=0777,dir_mode=0777,soft,persistenthandles,nounix,serverino,mapposix,mfsymlinks,reparse=nfs,rsize=1048576,wsize=1048576,bsize=1048576,retrans=1,echo_interval=60,nosharesock,actimeo=30,closetimeo=1)

for azuredisk pv:

$ kubectl exec nginx-pv-disk -c nginx -- mount | grep mount-path
tmpfs on /mount-path type tmpfs (rw,relatime,size=1912876k,nr_inodes=1048576,mode=755,inode64)

the disk is attached to the podvm:

$ az disk show -n pvc-e1355511-b329-47e2-b539-1a8600eb5930 -g mgns | jq -c '[.diskState, .managedBy]'
["Attached","/subscriptions/..../resourceGroups/mgns/providers/Microsoft.Compute/virtualMachines/podvm-nginx-pv-disk-f6e92b43"]

daniel-weisse (Contributor Author)

The mount path being a tmpfs seems very strange to me, and it's a problem I haven't seen before while trying to get the Azure driver running.
I used a script to set up my cluster, so it's possible I missed something.
I'll try to investigate this more tomorrow.

mkulke (Contributor) commented Oct 23, 2024

> The mount path being a tmpfs seems very strange to me, and it's a problem I haven't seen before while trying to get the Azure driver running. I used a script to set up my cluster, so it's possible I missed something. I'll try to investigate this more tomorrow.

I'll leave my cluster in this state; feel free to reach out on Slack for debugging.

mkulke (Contributor) left a comment

LGTM. Ran tests for azurefile and azuredisk.

The csi-wrapper replaces the `NodeID` value when reproducing
`ControllerPublishVolume` with the VM ID saved in the peerpod volume
object.
For Azure, the VM ID is the full Azure resource ID of the VM.
For `ControllerPublishVolume`, the azuredisk-csi-driver requires *just*
the name of the VM as `NodeID`.
Using the full Azure resource ID causes errors.
The azurefile-csi-driver doesn't support `ControllerPublishVolume` since
release v1.30.0, while older releases only checked if `NodeID` is
present in `ControllerPublishVolumeRequest`.
Therefore, we add a check if the VM ID is a full Azure resource ID,
and if it is, pass just the name of the peerpod VM as `NodeID`.

Signed-off-by: Daniel Weiße <[email protected]>
The volume ID for an Azure PVC is the full Azure resource ID of the
disk, which contains  `/` characters.
Since Kubernetes does not allow these characters as object names,
normalize the ID to just the base name of the resource ID.

Signed-off-by: Daniel Weiße <[email protected]>
Keep the directory structure clean when adding multiple examples for
Azure.

Signed-off-by: Daniel Weiße <[email protected]>
This example follows the same instructions as the azurefile-csi-driver
example, but uses the azuredisk-csi-driver instead to provision storage
through Azure disks.

Signed-off-by: Daniel Weiße <[email protected]>
daniel-weisse (Contributor Author)

Squashed the last 3 commits, should be good to merge now once the tests have run again.
