WIP: Qubes on KVM #12
Hi! Very exciting to hear that someone else will be working on Qubes on KVM. In case you're not aware, there is this effort to do the same, motivated by the desire to run Qubes on POWER9 CPUs, which don't have Xen support.

As for the current state of libkvmchan, I would consider it pre-alpha at the moment. The majority of the vchan API is implemented (all that's missing is vchan lifecycle management). That said, in its current state it is enough to support qrexec mostly unmodified (a usleep() needs to be added before the client connects, since vchan creation takes longer than it does on Xen, but everything else works fine). I haven't been working on it recently, but I plan to resume development shortly to resolve the above concerns.

After that's done, I will shift my focus to getting qubes-gui-{daemon,agent} running through libkvmchan's ivshmem backend. This will likely require upstream changes to the relevant Qubes components too, but after brief discussions with @marmarek I don't think this will be an issue. For x86_64, we'd also want to implement architecture-specific vIOMMU support in the project's VFIO driver, like I have done for ppc64. It will still work without that, but guests will need to operate in the potentially unsafe VFIO-NOIOMMU mode. See here for more information.

I would like to collaborate on your porting efforts as much as possible, so don't hesitate to reach out with any questions or concerns! I'd also be curious to hear what your experiences have been with libkvmchan thus far. From a skim of the linked Google Groups thread, it seems you have compiled it but not used it yet? Since I have done all development thus far on ppc64le, there may be some kinks that need to be worked out for x86_64 in addition to the vIOMMU thing, but it should be close to working. If you have any questions on how to set it up, let me know. |
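To make the usleep() workaround above concrete, here is a minimal hedged sketch; the 200 ms delay and the helper name are illustrative assumptions, not the actual qrexec change, which would live in qrexec's own connection path:

```c
/* Hedged sketch of the workaround described above (not the real qrexec patch):
 * sleep briefly before the client connects, because vchan setup on the
 * KVM/ivshmem backend takes longer than it does on Xen. */
#include <unistd.h>
#include <libvchan.h>

static libvchan_t *connect_data_vchan(int remote_domain, int port)
{
    usleep(200 * 1000); /* give kvmchand time to finish setting up the vchan */
    return libvchan_client_init(remote_domain, port);
}
```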
Thanks for the detailed overview. I'm also happy to hear you're interested in implementing the GUI modules :)

You were right, I have not used libkvmchan as of yet. I have only glanced at the code, packaged it, and created some systemd startup unit files for the kvmchand binary. However, I am going to start working on it right now, since I need (want) to get it working with Qubes: I am at the point where Qubes attempts to start the qrexec process when starting up the virtual machine (qvm-start). I just want to confirm that the daemon should be launched as 'kvmchand -d' on the KVM host. (I created a systemd unit file that runs it on start-up.)

Currently the daemon exits when the Qubes qvm-start command is run, as it attempts to execute the qubesdb-daemon. I am not really concerned about it crashing, as I have not had a chance to debug it further. I only mention it because sometimes when the daemon exits, I am unable to manually restart it, which then requires a reboot. Just wondering if you have experienced this issue; if so, do you have any advice, and if not, don't worry about it. FYI, the systemd messages are:

Something else to consider (sometime in the future) is splitting the host and guest daemons, since the daemon currently depends on libvirt, which means libvirt needs to be installed into the VM template to satisfy the dependency.

Your quest to run Qubes on the POWER9 CPU sounds very interesting. I can help, as I am quite familiar with most of the Qubes components: I worked for them back in 2015 on a one-year contract. Once I finish ensuring all existing Qubes modules are working on x86_64 KVM, I can attempt to cross-compile for the POWER9 CPU. I won't be much help debugging any issues though, since I only have an AMD processor. |
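For reference, a minimal sketch of the kind of host-side unit file described above; the binary path, ordering, and restart policy are assumptions rather than the actual packaged unit:

```ini
# Hypothetical kvmchand.service sketch -- paths and options are assumptions.
[Unit]
Description=libkvmchan host daemon
After=libvirtd.service
Wants=libvirtd.service

[Service]
# 'kvmchand -d' as described above; a later reply suggests the flag can
# likely be omitted when running under systemd.
ExecStart=/usr/sbin/kvmchand -d
Restart=on-failure

[Install]
WantedBy=multi-user.target
```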
I think I figured out what was preventing manually restarting kvmchand. It seems like one of the Qubes processes must have held a reference to it. I noticed the VM was started in a paused state. When I manually forced the VM to shut down using virt-manager, I was able to restart the daemon manually. |
Glad to hear you were able to resolve it. One thing to note is that under systemd, you can omit the

As for splitting the host/guest daemons to reduce guest dependencies, this should be possible with a few makefile tweaks (and maybe some preprocessor statements in the entrypoint). I've created issue #13 to track this. |
FYI, it seems like kvmchand exits if it loses the libvirt connection. It seems strange, but the qubesd Python component also appears to lose its connection after each libvirt command it issues, and has a wrapper to automatically re-connect. I will need to check with @marmarek whether that is normal behaviour. Every time qubesd re-established a connection to libvirt, kvmchand exited. For now I just added |
Thanks, much better logging :) |
Yeah, the libvirt part of the code currently doesn't gracefully handle loss of connection to libvirtd, since I originally didn't envision that as a likely scenario. Is it expected for libvirtd to restart frequently in a typical Qubes environment? If so, I'll create an issue to track this. I'm also not sure what exactly the behavior should be when this happens. Should the daemon simply maintain all existing vchans and continuously try reconnecting, or should it disconnect all existing vchans and essentially restart itself? More information on when exactly the libvirtd connection is expected to drop would be helpful in determining this. |
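Not kvmchand's actual code, but a hedged sketch of the "keep existing vchans and retry the connection" option, using libvirt's connection close callback; everything except the libvirt calls themselves (the helper names, the retry interval) is hypothetical:

```c
#include <stdio.h>
#include <unistd.h>
#include <libvirt/libvirt.h>

/* Called by libvirt when the connection to libvirtd is lost. */
static void conn_closed(virConnectPtr conn, int reason, void *opaque)
{
    (void)conn; (void)opaque;
    fprintf(stderr, "libvirtd connection closed (reason %d); will reconnect\n", reason);
    /* A real daemon would set a flag here and let its main loop re-open the
     * connection while keeping existing vchans alive. */
}

/* Keep trying to connect to libvirtd until it is reachable. */
static virConnectPtr connect_with_retry(const char *uri)
{
    virConnectPtr conn;
    while (!(conn = virConnectOpen(uri))) {
        fprintf(stderr, "libvirtd not reachable, retrying in 1s...\n");
        sleep(1);
    }
    virConnectRegisterCloseCallback(conn, conn_closed, NULL, NULL);
    return conn;
}

int main(void)
{
    virConnectPtr conn = connect_with_retry("qemu:///system");
    /* ... the daemon's normal event loop would run here ... */
    virConnectClose(conn);
    return 0;
}
```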
Sounds like libvirtd is crashing. Do you see any core dump? (
It shouldn't be frequent, but it (normally) happens on installing updates.
Restarting libvirt should not interrupt existing connections. On a general note - if I understand correctly, libkvmchan requires the host side to orchestrate every VM-VM connection. We could use this occasion to adjust the libvchan API to ensure dom0 really approves all the connections. In the Xen case, two cooperating VMs can currently establish a vchan without dom0 approval, which is not an optimal design. This does mean a libvchan API change, but I think the gains are worth it. If this change could also simplify (or even eliminate) |
Add usleep() in qrexec?
Working on getting kvmchand running in the template VM. VFIO-NOIOMMU mode is not enabled for the kernel that is installed in the template, so I will need to build a custom kernel to get it running for testing purposes. I understand that running in this mode will taint the kernel, and I think it would prevent device assignment, since there would be no IOMMU to provide DMA translation. Is this correct? Is this the proper way to set up the guest libvirt config? Do MSI settings also need to be applied?
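(For reference, and as an assumption based on mainline VFIO rather than anything confirmed in this thread, enabling NOIOMMU for testing typically looks something like this:)

```sh
# The guest kernel must be built with NOIOMMU support compiled in:
#   CONFIG_VFIO_NOIOMMU=y
# Even then, the unsafe mode still has to be switched on explicitly at runtime:
modprobe vfio enable_unsafe_noiommu_mode=1
# or, if the vfio module is already loaded:
echo Y > /sys/module/vfio/parameters/enable_unsafe_noiommu_mode
```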
|
Yeah, something like this:
NOIOMMU mode will not prevent device assignment or hotplugging - the kernel will still assign memory regions to the PCIe device as normal. The only difference is that the ivshmem device's view of memory will not be restricted by an IOMMU. This means a malicious ivshmem device (which means a malicious host QEMU) would be able to write to privileged guest memory. For this use case, that's obviously not an issue, since a compromised QEMU would be able to do those things anyway.
For NOIOMMU, no guest libvirt config changes are necessary. kvmchand will automatically attach the required ivshmem devices to all libvirt-managed guests at run-time. If you're curious, the relevant code is here. You'll be able to see the attachment in kvmchand's log, or by running
|
Gotcha. Created #14.
Correct.
This sounds perfectly reasonable to me. We discussed this previously in #1, including what the API for this might look like. Curious to hear your and @pwmarcz's thoughts.
I don't think it would have any significant impact on kvmchand, since this is already essentially how it's implemented today. The changes would just be adding some additional authentication logic and associated new APIs, as sketched below. |
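Purely as a discussion aid, here is a hypothetical shape such an API extension could take; nothing below exists in libkvmchan or the libvchan API today, and every name is made up:

```c
/* Hypothetical sketch only -- these calls do not exist anywhere today.
 * Idea: the host-side daemon refuses to pair a vchan server and client
 * unless dom0 has first approved that exact (server, client, port) tuple. */

/* Called from dom0 (e.g. by policy code) before either endpoint connects. */
int libvchan_authorize(int server_domain, int client_domain, int port);

/* Optional revocation, e.g. when one of the qubes shuts down. */
int libvchan_revoke(int server_domain, int client_domain, int port);
```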
I'm having an issue with getting qubes-qrexec-agent.service working in the guest. I am including the related logs below for the host and guest. The logs contain a few extra entries from master, but all line numbers match, and I removed noise to keep the size to a minimum.

PASS: Host start of
PASS: Guest start with systemd unit file auto-starting kvmchand on boot
FAIL: Guest start of qubes-qrexec-agent.service
|
Interesting, thank you for the detailed logs! I believe the issue is the following:
The current VFIO code assumes that all ivshmem devices will be in the same IOMMU group. On POWER this isn't an issue, since all devices end up in the same group (potentially due to statically assigning the devices to the same bridge). I think the next step would be to see if we can get that behavior on x86 by reusing the same static assignment code, though I vaguely recall a limitation with NOIOMMU mode that results in each device getting its own fake IOMMU group, so this may not work.

The NOIOMMU code was added really early on in this project's life, before hotplugging of ivshmem devices was implemented, and hasn't been tested by me since the addition of POWER vIOMMU support, so that's why I haven't caught this.

At this point, instead of modifying the VFIO code to tolerate multiple VFIO groups for NOIOMMU mode, I think the effort would be best spent on implementing proper vIOMMU support for x86_64 (#16). I'll spin up an x86_64 box and start work on this. I'll give you an update when it's implemented. |
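A quick hedged way to check that grouping assumption on a given x86 guest: after the ivshmem devices are bound to vfio-pci, each VFIO group appears as a node under /dev/vfio/ (named noiommu-N in NOIOMMU mode), so multiple nodes there means the devices did not land in a single group.

```sh
# One node per VFIO group; several nodes => several groups.
ls -l /dev/vfio/
```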
Already hit a roadblock with vIOMMU support on x86_64. Hilariously, the x86_64 VM PCIe hotplug driver requires the entire PCIe bridge to be shut down for the duration of the hotplug, thereby invalidating all previously held ivshmem device handles. I may look into patching this in the kernel (this requirement likely comes from constraints of real hardware that don't necessarily apply to VMs with virtual PCIe devices), but in the meantime, adding support for multiple IOMMU groups to VFIO and sticking with NOIOMMU mode may be the way to go.

EDIT: Upon further investigation, the issues go even deeper. On Q35, all hotplugged devices need their own pre-defined PCIe root port (see here), so additional root port allocation code will need to be added to libvirt.c. As you mentioned, i440fx seems to allow hotplugging of devices by default without any manual root port assignment. The downside is that the vIOMMU is unavailable to i440fx guests. In light of all of this, adding multiple IOMMU group support so that the existing NOIOMMU code can be used with i440fx guests seems like the path of least resistance for now. |
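For illustration, pre-declaring spare root ports on a Q35 guest is typically a matter of adding extra pcie-root-port controllers to the domain XML; this is a hedged sketch (the count is arbitrary, and libvirt.c would presumably still need to pick a free port at hotplug time):

```xml
<!-- Hedged example: spare root ports reserved for later hotplug,
     e.g. of kvmchand's ivshmem devices. -->
<controller type='pci' model='pcie-root-port'/>
<controller type='pci' model='pcie-root-port'/>
<controller type='pci' model='pcie-root-port'/>
```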
I hate roadblocks :) Thanks for looking into this so quickly.

Are you familiar with the

In regards to Q35, you can add a

I don't mind having to rely on NOIOMMU mode in the short term, but I have concerns about relying on that mode, since it requires a custom kernel to be built (and maintained), as mainstream kernels such as Fedora's do not enable the option. The other concern is how this affects security when also passing through other VFIO devices, such as a network controller or GPU, while NOIOMMU is enabled.

The ultimate goal would be to have a solution that works with the Q35 machine, as i440fx is considered legacy and Q35 adds many performance improvements and features. Even the OVMF team has stated that many BIOS features are targeted towards Q35.

Now that I am aware that multiple IOMMU groups are an issue, I'll also start researching possible solutions tonight and play around with your code some more. Feel free to offload any testing or additional research to me. |
Implemented support for multiple VFIO groups: 8b605c2. Everything seems to work as expected on an i440fx guest!
From what I remember,
This might be the perfect solution - I don't know why I didn't try that! In theory this should match the i440fx behavior (with the added requirement of specifying the bridge in the libvirt.c hotplug code), right? If that's the case then adding support should be trivial. I'll work on this next.
Agreed on both counts - NOIOMMU is a stopgap solution at best. Now that I'm aware of the
If you could let me know how the multiple VFIO group commit works for you, that'd be great. I'd also like to look into a proper way for detecting which PCI(e) bridge ivshmem devices should be hotplugged into in the |
Great, I will start testing it tonight!
I quickly hacked the
I will test it tonight. I was thinking that one |
Just a quick update to let you know that multiple VFIO groups seem to be working nicely, and I was able to get the host to communicate with an i440fx guest using qrexec! I will provide more details over the next few days, since I still need to work on the configurations. |
Just another update... Over the last week I have done many tests and worked some more on packaging. The qubes-builder currently builds all host and template packages for Fedora 32. The template boots and communicates via qubes-db. I have not been able to get qvm-run working from host to guest for some reason, but I admit I have not spent that much time on it, since I wanted to get the initial packaging and build working for further testing. I had issues with
I also had issues within the VM where |
Another quick status update

Some good news: I just finished implementing the

I was also in the process of preparing resources for you to highlight the code that needs to be changed for the KVM GUI when I came across some previous work you had completed within qubes-gui-daemon, and figured you must already have an understanding of the related Qubes internals. I will post them in a separate issue, since the resources may still be useful. If there is anything I can help out with to get this implemented, just let me know what you need. |
Hi guys, any updates for us lurkers? |
Hi, I've recently picked up working on libkvmchan again to fix the outstanding bugs and bring it closer to feature parity with Xen's vchan. I just pushed a fix for #20, which was one of the major outstanding issues. At this point I'm going to look into Qubes-specific bringup work. It seems @nrgaway has already done a lot in this area, so I'll likely begin by basing it off of their work. My initial target will be ppc64le, but x86_64 shouldn't be much extra work. @nrgaway, if you could provide an overview of your development environment (host OS and configuration, how you build your qubes VM images, etc.), that would be greatly appreciated. |
The community is documenting the pros and cons in an architectural discussion on the Qubes forum here: @flflover mentioned this thread there. |
@shawnanastasio
Hello and thanks for the work you have completed so far on libkvmchan and core-qubes-vchan-kvm.
I just wanted to let you know I am currently working on porting all Qubes features and app modules to work on a KVM host. Goals and progress are posted on Google Groups qubes-devel (https://groups.google.com/forum/?oldui=1#!topic/qubes-devel/dIw40asXmEI).
I have packaged your modules for Fedora, Debian, and ArchLinux to allow building within the Qubes builder, using your supplied license and crediting you as the author. I have not committed and pushed the repos, but I plan to this week, after having a chance to complete testing.
If you have any questions or comments you can post them in the thread I listed above or here. I would also be interested in any future development plans you may have in relation to these modules.