IOMMU passthrough mode but only on trusted VMs?

Upstairs_Cycle384 · 2025-08-08T16:33:38+0200

I understand that enabling IOMMU passthrough with `iommu=pt` has security implications. However, in our benchmarks, enabling it gives us a significant performance increase.

We have trusted VMs managed by our admins and untrusted VMs managed by our users. Both would use PCIe passthrough devices.

Setting `iommu=pt` is a global setting for the entire hypervisor, but is it possible to lock down the untrusted VMs so that they effectively run as if `iommu=on` or `iommu=forced` were set, just for those untrusted VMs?

I know using `iommu=pt` is a popular suggestion here, but we are concerned that it opens us up to malware taking over the hypervisor from the guest VMs.

leesteken · 2025-08-08T16:38:58+0200

iommu=pt does not "enable passthrough". It selects the identity mapping (also called "passthrough") for devices that are not passed through to VMs, which can improve performance for those devices. I believe your worries do not really apply here.
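
To see which IOMMU-related options a host actually booted with, the kernel command line can be inspected. The following is a minimal sketch, assuming a Linux host where the boot parameters are readable from `/proc/cmdline`; the function name `iommu_options` and the list of keys it looks for are illustrative choices, not something from this thread.

```python
def iommu_options(cmdline: str) -> dict[str, str]:
    """Collect IOMMU-related key=value options from a kernel command line.

    The set of keys below is an assumption made for illustration; extend it
    for your platform if needed.
    """
    keys = ("iommu", "iommu.passthrough", "intel_iommu", "amd_iommu")
    found: dict[str, str] = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key in keys:
            found[key] = value  # a later occurrence overrides an earlier one
    return found
```

On a running host this could be called as `iommu_options(open("/proc/cmdline").read())`. It only reports what was requested at boot, not what the kernel actually did with each device.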

Upstairs_Cycle384 · 2025-08-08T21:33:29+0200

I think there is still a threat with `iommu=pt` when the following events occur in order:

1. PCIe device is passed through to an untrusted VM.
2. PCIe device firmware is flashed in VM to something malicious.
3. PCIe device can't touch hypervisor memory because of the IOMMU (`iommu=pt`).
4. When the hypervisor is rebooted OR the untrusted VM is shut down, the malicious firmware running on the PCIe device has full access to the hypervisor until the untrusted VM starts again and the device is reassigned to it.

#4 happens because, in passthrough mode, the IOMMU no longer guards the host, so the malicious firmware just has to wait to be released from the VM to get unguarded access to all of the hypervisor's physical memory.

There is so much misinformation out there on PCIe passthrough. People think "ACS override" is the only unsafe thing they can do, but `iommu=pt` is almost as bad.
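
The four steps above can be written down as a toy model. This is only a sketch of the scenario as described in this post, under its own assumptions (a device that is not assigned to a VM is identity-mapped when the host runs in passthrough mode). It is not a statement about what the kernel does on any particular platform, and the names `Device` and `host_memory_exposed` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Device:
    firmware_compromised: bool = False  # step 2: firmware flashed from inside the VM
    assigned_to_vm: bool = False        # step 1: device is passed through to a VM


def host_memory_exposed(dev: Device, host_identity_mapped: bool) -> bool:
    """Return True if, under the post's assumptions, the device could DMA
    into arbitrary host memory."""
    if not dev.firmware_compromised:
        return False
    if dev.assigned_to_vm:
        # Step 3: while assigned, the IOMMU confines DMA to the guest.
        return False
    # Step 4: once released to the host, exposure depends on whether the
    # host-side mapping is identity (passthrough) or translated.
    return host_identity_mapped
```

In this model the only difference between the two host modes shows up after the device is released from the VM, which is exactly the window the post is worried about.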

leesteken · 2025-08-08T21:39:05+0200

iommu=pt is not about passthrough to VMs at all. When the IOMMU is activated, data transfer to PCI(e) devices is managed by the IOMMU (which is why passthrough to VMs can work in a secure way as long as you don't use pcie_acs_override). This can slow down devices that are not passed through (which don't really need the IOMMU). That's where iommu=pt comes in: it sets up direct passthrough for those devices (bypassing the IOMMU?) or, if that is not possible, the identity mapping (1-to-1), which does not have such a performance penalty.

EDIT: If I'm wrong about this, somebody please correct me (and provide references to reliable sources that explain how I'm wrong).
EDIT2: iommu=pt might be a security risk, but only for non-passed-through devices, which might then bypass the IOMMU. You already implicitly trust those devices, since you have them connected to the Proxmox host. This is entirely separate from VMs (unless you use pcie_acs_override, which breaks the IOMMU security completely).

Upstairs_Cycle384 · 2025-08-08T21:56:10+0200

Okay sure, but I don't think that changes anything about the scenario I just described? IOMMU is in place preventing rogue DMA coming from the compromised PCIe device.

With `iommu=pt`, the device, when not assigned to a VM, bypasses the IOMMU. It can directly access memory on the host. This is because DMA operations on the host are no longer governed by an IOMMU when it's operating in pt mode.
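
Whether a given device is currently identity-mapped can be checked rather than assumed. The sketch below assumes a reasonably recent Linux kernel that exposes a `type` file per group under `/sys/kernel/iommu_groups/` (with values such as `identity`, `DMA` or `DMA-FQ`); on kernels without that file it reports `unknown`. The function name `group_domain_types` is made up for this example.

```python
from pathlib import Path


def group_domain_types(root: str = "/sys/kernel/iommu_groups") -> dict[str, dict]:
    """Map each IOMMU group to its reported domain type and member devices."""
    result: dict[str, dict] = {}
    base = Path(root)
    if not base.is_dir():
        return result  # no IOMMU groups exposed (IOMMU disabled or unsupported)
    for group in sorted(base.iterdir(), key=lambda p: p.name):
        type_file = group / "type"
        devices_dir = group / "devices"
        # Older kernels may not provide the "type" attribute at all.
        domain = type_file.read_text().strip() if type_file.is_file() else "unknown"
        devices = sorted(d.name for d in devices_dir.iterdir()) if devices_dir.is_dir() else []
        result[group.name] = {"type": domain, "devices": devices}
    return result
```

Comparing the output while the VM is running and again after it is shut down would show whether the group's reported type changes when the device goes back to the host.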

leesteken · 2025-08-08T22:04:14+0200

Upstairs_Cycle384 said:
Okay sure, but I don't think that changes anything about the scenario I just described? IOMMU is in place preventing rogue DMA coming from the compromised PCIe device.

With `iommu=pt`, the device, when not assigned to a VM, bypasses the IOMMU. It can directly access memory on the host. This is because DMA operations on the host are no longer governed by an IOMMU when it's operating in pt mode.

Sure but this is also the case without VMs at all (and your scenarios kept mentioning VMs).
Yes, IOMMU keeps all devices in check but the devices connected to the host can already read/write all of the host memory. I don't think non-passed through devices are actually limited (by Linux) in this way except that the IOMMU prevents them from doing things outside of the claimed capabilities, AFAIK. IOMMU prevents them from accessing memory the wrong way but not from accessing all the memory. Therefore iommu=pt does not really change anything in this regard.

Upstairs_Cycle384 · 2025-08-08T22:47:22+0200

leesteken said:
Yes, IOMMU keeps all devices in check but the devices connected to the host can already read/write all of the host memory.

A malicious device cannot access memory outside of its assigned region because the hardware IOMMU is what is mediating memory access.

What you are describing is a DMA attack and it's mitigated by the IOMMU: https://en.wikipedia.org/wiki/DMA_attack (see mitigations).

I am arguing that `iommu=pt` mode lessens the security of your system. Here's an academic whitepaper I found:

From this paper, specifically on the topic of setting "IOMMU passthrough mode in Linux" [1]:

IOMMU pass through mode. In pass through mode, device addresses are used directly as CPU physical addresses. In this mode the hardware IOMMU is turned off, so there is no permissions checking for DMA requests. Devices enter pass through mode if it is enabled by a kernel parameter, and if during device discovery, the kernel determines that a device can address all of physical memory. Some devices can be in pass through mode without all devices being in this mode.

Because there is no permissions checking, our driver and microcode attacks work in pass through mode. Pass through mode is intended to use a software TLB [50], but we verified that on our system, the software TLB does not check permissions. In our system, even though GPU device addresses are 40 bits, it identifies as a 32-bit device during its initialization. Therefore, the kernel must boot with less than or equal to 4 GB of memory to enable pass through mode. We verified that regardless of how much physical memory is in the machine, if the kernel boots with a mem=4G option, the kernel defaults to pass through mode where our attacks work.

[1] https://www.cs.utexas.edu/~witchel/pubs/zhu17gpgpu-security.pdf

leesteken · 2025-08-08T22:56:05+0200

Thank you for the references and the information. It seems clear that iommu=pt does lessen the security for devices connected to the host (and not devices passed to VMs). I'm not sure if it matters for devices connected to the host, as I'm not sure whether the Linux kernel actually applies limits to those devices using the IOMMU. Either way, it does not apply to devices passed through to VMs, which can never use the passthrough mode of the IOMMU because they are always monitored by it. Or am I wrong about that?

leesteken · 2025-08-08T22:58:49+0200

For which devices (and on which platform) do you see performance improvements with iommu=pt? I would expect that for modern platforms there is no significant performance difference.
