[SOLVED] VM won't start with PCI passthrough after upgrade to 9.0

This workaround has worked well for several virtual machines.

Can this be set to any value from 39 to 48?
Great that it works. But no, for the intel-iommu it can only take 39 or 48 bits for now (it might allow 52 at a later point once 5-level paging is supported by QEMU, but that's for quite specific use cases and requires hardware that supports it, but I digress).

The only thing that's needed here is that the guest address width (aw-bits) is smaller than the host physical address width (see lscpu for that), so that reads/writes from/to passthrough devices can be properly translated by the hardware.
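
For example, on a host with 46-bit physical addressing, that check could look like this (the exact lscpu output format varies a bit by version):

Code:
# Host physical address width, per lscpu:
lscpu | grep 'Address sizes'
#   Address sizes: 46 bits physical, 48 bits virtual
# On such a host, aw-bits=39 satisfies the constraint (39 < 46),
# while aw-bits=48 would exceed the host width.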
 

Thank you.

Sorry if this is a different issue, but should I combine it with the other flag, guest-phys-bits?

The guest-phys-bits flag doesn't seem to work, and I still can't suppress the following error.

guest-phys-bits is supposed to be patched, but I'm confused by how little information there is about it.

Code:
QEMU[11923]: kvm: vfio_container_dma_map(0x5c9222494280, 0x380000000000, 0x10000, 0x78075ee70000) = -22 (Invalid argument)

https://bugzilla.kernel.org/show_bug.cgi?id=220057#c46

The VFIO_MAP_DMA failures are a VM configuration error and a byproduct that Intel ships platforms where the CPU physical address width is different from the IOMMU address width. QEMU/vBIOS defines the MMIO layout relative to the CPU address width, therefore the vCPU needs to reflect that address width restriction. QEMU makes this configuration available through a guest_phys_bits option, but it doesn't appear that Proxmox provides a way to configure this. The result is these error logs, which indicate P2P DMA mappings are not being created. With the fix we're pursuing above, this should not result in a performance/efficiency loss relative to the page table use though.
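
As a side note, on Intel hosts both widths can be compared directly; a small sketch, where the dmar0 unit name is an assumption and may differ per system:

Code:
# CPU physical address width:
lscpu | grep 'Address sizes'
# Raw Intel IOMMU capability register; per the VT-d spec, the MGAW field
# (bits 21:16) plus one gives the maximum guest address width.
# The unit name may vary (dmar0, dmar1, ...):
cat /sys/class/iommu/dmar0/intel-iommu/cap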
 

Also, I would appreciate it if you could confirm whether this method for setting the guest physical bits is correct.

Code:
qm set vmid -args '-cpu host,hv_passthrough,level=35,+vmx,guest-phys-bits=46 -global intel-iommu.aw-bits=39'
Yes, the guest-phys-bits CPU option is now exposed via the VM config for Proxmox VE too. Setting the CPU in -args overrides the CPU option provided by Proxmox VE, and guest-phys-bits and phys-bits could already be set there before too, even though that approach is more fragile.
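
For reference, a sketch of what that could look like without -args; whether guest-phys-bits is accepted in the cpu property string exactly like this is an assumption on my part, so check man qm for your version:

Code:
# Assumed syntax, analogous to the existing phys-bits cpu option:
qm set vmid --cpu host,guest-phys-bits=39
# The vIOMMU address width itself still goes through args:
qm set vmid --args '-global intel-iommu.aw-bits=39'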

Could you try setting guest-phys-bits to the same address width as the vIOMMU (39)? If that doesn't work, it might also be worth trying to set phys-bits and host-phys-bits-limit to 39, one at a time (via -args '-cpu host,...').
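
Spelled out against the full command from earlier in the thread, those one-at-a-time attempts might look like the following; note that host-phys-bits-limit is only honored when host-phys-bits is enabled:

Code:
# Variant 1: cap the advertised physical address width directly:
qm set vmid -args '-cpu host,hv_passthrough,level=35,+vmx,phys-bits=39 -global intel-iommu.aw-bits=39'
# Variant 2: take the host width but limit it; host-phys-bits-limit only
# takes effect together with host-phys-bits=on:
qm set vmid -args '-cpu host,hv_passthrough,level=35,+vmx,host-phys-bits=on,host-phys-bits-limit=39 -global intel-iommu.aw-bits=39'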
 
Code:
QEMU[11923]: kvm: vfio_container_dma_map(0x5c9222494280, 0x380000000000, 0x10000, 0x78075ee70000) = -22 (Invalid argument)
AFAICT from this error message, this is related to a wrong address width being sent to iommufd (the vIOMMU functionality is provided by iommufd, which allows userspace programs to work directly with the IOMMU and therefore the DMAR): vfio_container_dma_map uses iommufd and sends an IOMMU_IOAS_MAP_FILE ioctl in the background.

I'm not yet sure where the -22 EINVAL error code comes from exactly (a strace and/or ftrace would be helpful there), but I can only guess that, since the fourth argument 0x78075ee70000 needs at least 47 bits to be represented, this might trip up the IOMMU, as it cannot be translated because of the limited address width.
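
To illustrate the guess, a quick bash loop counting the bits each address in the failing call needs:

Code:
# Bits needed to represent the IOVA (2nd argument) and the host virtual
# address (4th argument) of the failing vfio_container_dma_map() call:
for a in 0x380000000000 0x78075ee70000; do
  v=$(( a )); bits=0
  while (( v > 0 )); do v=$(( v >> 1 )); bits=$(( bits + 1 )); done
  echo "$a needs $bits bits"
done
# Prints 46 and 47; both are well beyond a 39-bit address width.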
 
Thank you. It seems to be working properly, and the error output has stopped.

I switched back and forth several times to verify, and confirmed that the errors appear with guest-phys-bits=46 but stop with 39.

Code:
args: -cpu host,hv_passthrough,-hypervisor,level=35,+vmx,guest-phys-bits=46 -global intel-iommu.aw-bits=39

Aug 29 02:54:43 pve1 QEMU[2922]: kvm: warning: vfio_container_dma_map(0x5cecabbace80, 0x380000000000, 0x400000000, 0x72e500000000) = -22 (Invalid argument)
Aug 29 02:54:43 pve1 QEMU[2922]: 0000:04:00.0: PCI peer-to-peer transactions on BARs are not supported.

Code:
args: -cpu host,hv_passthrough,-hypervisor,level=35,+vmx,guest-phys-bits=39 -global intel-iommu.aw-bits=39

No events
 
Will the following be backported to Proxmox VE 8?
As soon as there is a go-to solution to the problem, then yes, this will be backported to Proxmox VE 8 as well.

I am reverting back to Proxmox VE 8 because the virtual machine on Proxmox VE 9 often stops working on my computer.
Is there any trouble with the virtual machine on Proxmox VE 9 with the following configuration? If yes, that would be valuable information for fixing the problem.

Code:
args: -cpu host,hv_passthrough,-hypervisor,level=35,+vmx,guest-phys-bits=39 -global intel-iommu.aw-bits=39
 
I'm sorry for any misunderstanding. There's no problem with this configuration itself.

As a separate issue, using PCI passthrough for the GPU in PVE 9 causes KVM to generate an internal error. Therefore, I want to use this guest-phys-bits setting in a PVE 8 environment, where the same configuration does not produce an internal error.
 