PCI (GPU) passthrough hardening: option ROM edition #1087
Comments
The MSI boards already have a global Option ROM disable, but they could not be convinced to offer per-slot granularity. It is either fully on, fully off, or GPU Option ROMs only.
You want to enable Secure Boot and then tell it to skip checking boot loaders, so that it validates Option ROMs only. So, Point 8 here: #929
And how can you possibly do that? As far as I know, VBIOS flashing works through vendor tools that tell the GPU to use its internal I2C/SPI/whatever controller to flash the ROM. If you pass the card through, these vendor tools work as they would on bare metal, and I don't see how you are going to block that.
Sure, you can try asking one of the third-party vendors to add a flash write-disable jumper, which I recall having seen on a few historical PC motherboards when flash ROM was first introduced. But unless you are a buyer big enough to order a few thousand custom cards, it is not going to happen, so I won't bother with it.
This may be enough, if there are no other devices needing Option ROM. In this particular case we are talking about a laptop, so the customization is limited (yes, I know you can still attach almost any PCIe device, but it's much less common in practice).
Yes, exactly.
Well, I want to enable it for Option ROM even if you need it disabled for the OS.
Yes, exactly, that's the problem.
That doesn't help much if the internal flash can still be modified, even if it is not loaded by that VM. The Option ROM could still be changed and will be used by the firmware on the next reboot (unless you do a similar trick in firmware to side-load the Option ROM?).
Yes, it would technically be a better solution (as it covers more than just the Option ROM), but it is not feasible at this scale.
I don't see that like a problem you can actually fix.
Not if you disable loading Option ROMs. It may also be possible to hash the Option ROM (Point 7 of my writeup) so that you know it hasn't changed. And yeah, putting an Option ROM in firmware and loading it for that device instead of its own one, QEMU style, should also be possible.
What I care about is that a reflashed GPU (which we already established is hard to prevent in the first place) cannot attack the host. There are many ideas for how to achieve that: in the issue description, in the comments, and in the other issue.
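To make the hashing idea above concrete, here is a minimal OS-side sketch in Python. It is only an illustration: in the scenario discussed here the check would have to happen in boot firmware before the ROM is executed, and a malicious device could in principle return different contents on different reads. The helper names (`rom_digest`, `rom_unchanged`, `read_rom_from_sysfs`) are made up for this example; the only real interface assumed is the Linux sysfs `rom` attribute of a PCI device, which is enabled by writing `1` to it.

```python
import hashlib
from pathlib import Path

# Every PCI expansion ROM image starts with the bytes 0x55 0xAA.
ROM_SIGNATURE = b"\x55\xaa"


def rom_digest(image: bytes) -> str:
    """Return the SHA-256 hex digest of a ROM dump, rejecting non-ROM data."""
    if image[:2] != ROM_SIGNATURE:
        raise ValueError("missing 0x55AA expansion ROM signature")
    return hashlib.sha256(image).hexdigest()


def rom_unchanged(image: bytes, expected_hex: str) -> bool:
    """Compare a ROM dump against a previously recorded known-good digest."""
    return rom_digest(image) == expected_hex.lower()


def read_rom_from_sysfs(bdf: str) -> bytes:
    """Dump a device ROM through Linux sysfs (requires root).

    `bdf` is the PCI address, e.g. "0000:01:00.0" (hypothetical example).
    """
    rom = Path("/sys/bus/pci/devices") / bdf / "rom"
    rom.write_text("1")  # enable reads of the ROM
    try:
        return rom.read_bytes()
    finally:
        rom.write_text("0")  # disable reads again
```

A caller would record `rom_digest(read_rom_from_sysfs(bdf))` once while the device is trusted and call `rom_unchanged` on later boots. This detects a modified Option ROM but says nothing about other firmware stored on the card.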
The only problem I see here is that it is not validated. We would need grants to enable the DMA attacking tool in the automation process. We have capable hardware. That could confirm in every release that DMA protection is correctly applied.
OptionROMs are typically signed by
This is an exciting part. Do you have any examples of such issues? Because of that, OCP requested a standard update mechanism for GPU firmware, and a document was created.
This was probably already requested and at least partially implemented. Please check #139
And for complex modern devices, that can be the core issue.
To prevent soft-bricking, one would imagine that boot firmware would detect that fact and warn the user or even not allow the user to self-soft-brick.
This and many other improvements could be employed in UEFI Secure Boot. I already have a ton of requirements in that space. I will explore our options as part of my training campaign in 2025. It may not be hard to implement that at least partially.
TPM measurement + event log? There are also UEFI variables dedicated to exposing firmware capabilities to OS like OsIndicationsSupported.
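As a rough illustration of how an OS reads such a capability variable today, the Python sketch below decodes `OsIndicationsSupported` from a Linux efivarfs file. The bit names follow the `EFI_OS_INDICATIONS_*` definitions in the UEFI specification. Note that none of the existing bits describes Option ROM or DMA policy, so any flag for this feature would be a new, hypothetical addition; the function name `decode_os_indications` is also just chosen for this example.

```python
import struct
from pathlib import Path
from typing import List

# GUID of the UEFI global variable namespace.
EFI_GLOBAL_VARIABLE = "8be4df61-93ca-11d2-aa0d-00e098032b8c"
EFIVAR_PATH = Path("/sys/firmware/efi/efivars") / (
    "OsIndicationsSupported-" + EFI_GLOBAL_VARIABLE
)

# EFI_OS_INDICATIONS_* bits as defined by the UEFI specification.
OS_INDICATION_BITS = {
    0x01: "BOOT_TO_FW_UI",
    0x02: "TIMESTAMP_REVOCATION",
    0x04: "FILE_CAPSULE_DELIVERY_SUPPORTED",
    0x08: "FMP_CAPSULE_SUPPORTED",
    0x10: "CAPSULE_RESULT_VAR_SUPPORTED",
    0x20: "START_OS_RECOVERY",
    0x40: "START_PLATFORM_RECOVERY",
    0x80: "JSON_CONFIG_DATA_REFRESH",
}


def decode_os_indications(raw: bytes) -> List[str]:
    """Decode an efivarfs blob: 4 attribute bytes, then a little-endian UINT64."""
    if len(raw) < 12:
        raise ValueError("expected 4 attribute bytes plus an 8-byte value")
    _attributes, value = struct.unpack_from("<IQ", raw)
    return [name for bit, name in OS_INDICATION_BITS.items() if value & bit]


if __name__ == "__main__":
    if EFIVAR_PATH.exists():
        print(decode_os_indications(EFIVAR_PATH.read_bytes()))
    else:
        print("OsIndicationsSupported not available (non-UEFI boot?)")
```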
I guess we should employ guidance from here and expose things in ACPI DMAR table, some information already should be there, but the point is there is no validation of that.
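For reference, this is roughly what reading the existing DMAR information looks like from the OS side. The Python sketch below parses the fixed part of the table as laid out in the Intel VT-d specification: a 36-byte ACPI header, the Host Address Width byte, and the Flags byte. It assumes Linux exposes the table at `/sys/firmware/acpi/tables/DMAR` (root is needed to read it) and an Intel platform; the function name is made up for this example. As the comment above says, the flags are what firmware claims, not a validation that DMA protection was actually in effect during early boot.

```python
from pathlib import Path

DMAR_PATH = Path("/sys/firmware/acpi/tables/DMAR")

ACPI_HEADER_LEN = 36      # standard ACPI table header
DMAR_FIXED_LEN = 48       # header + width byte + flags byte + 10 reserved bytes
HOST_ADDR_WIDTH_OFFSET = 36
FLAGS_OFFSET = 37


def parse_dmar_flags(table: bytes) -> dict:
    """Extract the global flags from a raw ACPI DMAR table."""
    if len(table) < DMAR_FIXED_LEN or table[:4] != b"DMAR":
        raise ValueError("not a DMAR table")
    flags = table[FLAGS_OFFSET]
    return {
        # The field stores N, where the addressable width is N + 1 bits.
        "host_address_width": table[HOST_ADDR_WIDTH_OFFSET] + 1,
        "intr_remap": bool(flags & 0x01),
        "x2apic_opt_out": bool(flags & 0x02),
        # DMA_CTRL_PLATFORM_OPT_IN_FLAG: firmware asks the OS to enable
        # DMA protection. It is a request, not proof of early-boot isolation.
        "dma_ctrl_platform_opt_in": bool(flags & 0x04),
    }


if __name__ == "__main__":
    if DMAR_PATH.exists():
        print(parse_dmar_flags(DMAR_PATH.read_bytes()))
    else:
        print("no DMAR table exposed (non-Intel platform or IOMMU disabled?)")
```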
There are better directions than this. Relying on some custom coreboot files exposed will create technical debt, and appropriate mechanisms already exist in the UEFI world. We should ask what to do with non-UEFI builds. Still, I think we should get back to the question of what the standard behavior OSes use for such capability is, and standard most likely will mean what Windows uses for that. Also, checking the Linux approach would be useful.
@zirblazer It is hard to read your write-up. It should be split, TBH. Every point is separate (it could be linked for better context).
@marmarek I don't think it is possible to make boot firmware responsible for controlling peripheral updates when those peripherals have their closed-source verification mechanism. We cannot handle all possible mocking of buses in the system without affecting correct operation. Unless we reach SPDM and device authentication for the whole system, the feature is unlikely to be implemented. Getting updates only from reasonably trustworthy sources with known paths for escalation, e.g., LVFS, can be done, but that does not prevent malicious actors from gaining privileges in the system and abusing those to deliver the wrong firmware to peripherals if those allow unauthenticated updates. That is on the peripheral vendor to provide the correct update mechanism or on the open-source firmware community to deliver support for a transparent mechanism.
The best thing we can do is to look for best practices regarding peripheral firmware updates, test that on given hardware, and provide advice on what hardware is recommended now. Even together, we do not have enough resources to solve that problem.
P.S. Maybe this is a good discussion for the December DUG?
The problem you're addressing (if any)
Using GPU (or any PCI device, for that matter) passthrough with a less trusted VM may allow that VM to reflash the device's firmware. Just after reboot (during firmware and OS startup) such a device is not isolated in a VM and may try to compromise the whole host. This can be done in at least two ways:
Theoretically, reflashing malicious firmware should not be possible due to (at least) the signature check done by the GPU firmware update mechanism, but history shows that this check is sometimes buggy, ineffective, or in some cases even non-existent.
Describe the solution you'd like
I see two solutions:
In either case, there needs to be a mechanism for the OS to verify if the mechanism was enabled to inform the user if passthrough is safe for a given device. And similarly, OS needs to be informed if early boot DMA was enabled. Maybe there is some ACPI table that can be used to pass this info to the OS? Or maybe OS can inspect coreboot config (cbfs?) to check if the option is enabled?
Where is the value to a user, and who might that user be?
Use GPU passthrough with reduced risk of compromising the whole system.
Describe alternatives you've considered
An alternative solution could be to reliably block the VM from reflashing the dGPU firmware, and to ensure that device reset on reboot works reliably too. In other words, ensure that all VM-controlled state is discarded on reboot.
I think this solution would require changes to the board design, and thus be significantly harder to make in practice.
Additional context
We consider making a feature like this mandatory for allowing Qubes OS certification of systems with dGPU. Without such feature, we don't consider dGPU passthrough safe enough to certify such system, and thus it doesn't make much sense for users to buy systems like this if dGPU would be allowed only in dom0, as it would be mostly wasted.
This is especially relevant for V5x models with nvidia.