Struggling with PCIe passthru on new PVE build

kernull · Apr 20, 2022

TL;DR: a PCIe card comes up as if it simply belongs to the host system despite clearly being set up for passthrough to a VM.

The situation:

This is an AMD build on an ASRock X570 motherboard running PVE 7.1.

I followed the GRUB instructions from https://pve.proxmox.com/wiki/Pci_passthrough since the boot loader is clearly GRUB (v2.04-20)

I went through the manual and BIOS settings and can't find anything else related to virt to enable...

the grub file:

Code:

GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt"
GRUB_CMDLINE_LINUX=""

(I tried it with and without iommu=pt, and ran update-grub followed by a reboot each time.)
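One way to confirm that an edited GRUB command line actually reached the running kernel is to look at /proc/cmdline after the reboot. A minimal sketch, assuming a POSIX shell; the helper name has_kernel_param is made up for illustration and the optional second argument only exists so it can be tried against a sample file:

```shell
#!/bin/sh
# has_kernel_param PARAM [FILE]   (hypothetical helper, not a Proxmox tool)
# Succeeds if PARAM appears as a whole word in the kernel command line.
# FILE defaults to /proc/cmdline.
has_kernel_param() (
    param="$1"
    file="${2:-/proc/cmdline}"
    set -f                          # do not glob-expand words from the file
    for word in $(cat "$file"); do
        [ "$word" = "$param" ] && exit 0
    done
    exit 1
)
```

For example, `has_kernel_param iommu=pt && echo active` prints "active" only if the parameter is present on the booted kernel.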

Code:

dmesg | grep IOMMU

[    0.851519] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
[    0.852470] pci 0000:00:00.2: AMD-Vi: Found IOMMU cap 0x40
[    0.863068] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).

my /etc/modules

Code:

vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd

(also tried before adding these items since the wiki said it might already be in place with kernels later than 5.4)

per "It will not be possible to use PCI passthrough without interrupt remapping.":

Code:

 dmesg | grep 'remapping'
[    0.591109] x2apic: IRQ remapping doesn't support X2APIC mode
[    0.852473] AMD-Vi: Interrupt remapping enabled

in proxmox:

aaaand I think I may have just rubber-ducked myself here... I just noticed that the HBA I'm trying to pass through is in the same IOMMU group as the video card that I'm using in the host... I guess I'll go do some reading on how IOMMU groups work and maybe try moving the graphics card to another slot (the HBA won't fit elsewhere)

I'll just leave this post here in the chance it ends up helping someone else...

leesteken · Apr 20, 2022

You don't actually need amd_iommu=on because it is on by default. It looks like your IOMMU is working fine and you have multiple IOMMU groups.
Indeed, devices from the same IOMMU group cannot be shared between VMs or a VM and the host (for security reasons). If all else fails, you could consider using pcie_acs_override.
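To see which devices share a group, the kernel exposes the grouping under /sys/kernel/iommu_groups. A minimal sketch that prints one "group, PCI address" pair per line; the function name list_iommu_groups and the optional ROOT argument are made up for illustration:

```shell
#!/bin/sh
# list_iommu_groups [ROOT]   (hypothetical helper)
# Prints "<group><TAB><pci-address>" for every device, sorted by group number.
# ROOT defaults to /sys/kernel/iommu_groups.
list_iommu_groups() (
    root="${1:-/sys/kernel/iommu_groups}"
    for dev in "$root"/*/devices/*; do
        [ -e "$dev" ] || continue       # glob matched nothing: IOMMU off?
        group="${dev%/devices/*}"       # strip "/devices/<address>"
        group="${group##*/}"            # keep only the group number
        printf '%s\t%s\n' "$group" "${dev##*/}"
    done | sort -n
)
```

Feeding each address to `lspci -nns <address>` then gives the device name next to its group.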

Note that devices are initially visible and usable by the Proxmox host (even when you put them in the configuration of a VM for passthrough). The device is taken from the host (and all other devices in the same IOMMU group) when you start the VM. If you don't want the host to touch the devices before starting the VM, you need to either blacklist all possible drivers for those devices or bind them early to the vfio-pci driver.
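Whether the host has already claimed a device can be checked through the driver symlink in sysfs. A minimal sketch, assuming the usual /sys/bus/pci/devices layout; the function name pci_driver is made up for illustration:

```shell
#!/bin/sh
# pci_driver ADDR [ROOT]   (hypothetical helper)
# Prints the kernel driver currently bound to the device at ADDR
# (full form, e.g. 0000:0e:00.0), or "(none)" if no driver is bound.
# ROOT defaults to /sys/bus/pci/devices.
pci_driver() (
    addr="$1"
    root="${2:-/sys/bus/pci/devices}"
    link="$root/$addr/driver"
    if [ -L "$link" ]; then
        basename "$(readlink "$link")"  # symlink target ends in the driver name
    else
        echo "(none)"
    fi
)
```

If this prints vfio-pci right after boot, the early binding worked; if it prints the normal storage or GPU driver, the host grabbed the device first.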

kernull · Apr 20, 2022

Thanks for the info! It is much appreciated!

Just watched a youtube on IOMMU, it explained a lot.

If you don't want the host to touch the devices before starting the VM, you need to either blacklist all possible drivers for those devices or bind them early to the vfio-pci driver.

this is definitely the case even if I were to move the card! which way is easiest or most recommended? the vfio-pci driver method you mentioned looks like whats described under PVIF PCI Passthru section?

I saw this in the wiki, but this section:

blacklist the drivers:

echo "blacklist radeon" >> /etc/modprobe.d/blacklist.conf
echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.conf
echo "blacklist nvidia" >> /etc/modprobe.d/blacklist.conf

left me wondering how I can go about getting the string to add to the blacklist... is it just vendor or device name in the screenshot I posted in the orig post?

Thanks again!

preEdit: just found this too which it looks like you were the MVP again haha- https://forum.proxmox.com/threads/pcie-passthrough-for-hba-blacklist-driver-question.85445/
it seems to be almost exactly what I'm trying to do here... I'm going to try adding a vfio.conf file to /etc/modprobe.d as seen here... is there any risk to trying this? I'm always worried when dealing with settings involving RAIDs...

leesteken · Apr 20, 2022

If mpt3sas is the actual driver for the device, then the vfio.conf looks fine. Note that you can get all necessary information for the device with lspci -nnks 0e:00. vfio-pci can also be written as vfio_pci, but I suggest changing the second occurrence in the file. You need to run update-initramfs -u and reboot to activate changes to the /etc/modprobe.d/ directory (for the current kernel).
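The vfio.conf from the linked thread is not reproduced here, so the following is only a sketch of the usual shape of such a file: one options line handing the device ID to vfio-pci, and one softdep line so vfio-pci loads before the regular driver. The function name vfio_conf is made up for illustration; the ID 1000:0072 and the mpt3sas driver are the ones mentioned in this thread and should be checked against your own lspci -nnk output:

```shell
#!/bin/sh
# vfio_conf IDS DRIVER   (hypothetical helper)
# Prints modprobe.d lines that bind the given vendor:device ID(s) to
# vfio-pci and make DRIVER wait until vfio-pci has been loaded.
# IDS may be a comma-separated list such as 1000:0072,10de:1c03.
vfio_conf() {
    printf 'options vfio-pci ids=%s\n' "$1"
    printf 'softdep %s pre: vfio-pci\n' "$2"
}
```

Something like `vfio_conf 1000:0072 mpt3sas > /etc/modprobe.d/vfio.conf` followed by `update-initramfs -u` and a reboot would then apply it.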

Did you manage to move the GPU? Note that PCI addresses (the bus:device.function values such as 0e:00) can change if you move/add/remove devices: they tend to shift by +1 or -1 if the changed device comes earlier in the enumeration. The numeric vendor:device ID (lspci -n) does not change. This can trip up passthrough unexpectedly, as you can probably imagine.

kernull · Apr 20, 2022

thanks again!!

noted with the nnks options and the ... what is the 0e:00 vs the 1000:0072?

also using those options:

i see the [1028:1f1c]... what is that? just part of dell's id info?

I did not move the GPU yet. so the XXXX:YYYY is the PCI ID... ok. And re: incrementing ids... I will note this somewhere since im SURE I'll be cursing when I add another pci device for passthrough down the line.

vfio-pci / vfio_pci good catch! I missed that...

I will run 'update-initramfs -u' and shutdown, after which I'll move the gpu, reboot and see if the host ignores the HBA (if it doesnt I will double check PCI ID numbers)

Thanks again!!!

edit: just saw this from your post in the other thread: https://pcilookup.com/?ven=1000&dev=0087&action=submit so I feel like that answers that question... <vendorID>:<vendorSpecifiedDeviceID>?
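For reference, the three values that came up can be told apart like this: 0e:00 is the device's address on the bus, [1000:0072] on the device line is the vendor:device ID, and a bracketed pair on the "Subsystem:" line (here [1028:1f1c], where 1028 is Dell's vendor ID) identifies the board vendor's variant. A minimal sketch that pulls the vendor:device pair out of one lspci -nn device line; the function name and the sample line in the usage note are made up for illustration:

```shell
#!/bin/sh
# pci_id_from_line   (hypothetical helper)
# Reads `lspci -nn` device lines on stdin and prints the bracketed
# vendor:device pair of each as "vendor=XXXX device=YYYY".
# The class code (e.g. [0107]) has no colon, so it is not matched.
pci_id_from_line() {
    sed -n 's/.*\[\([0-9a-f]\{4\}\):\([0-9a-f]\{4\}\)\].*/vendor=\1 device=\2/p'
}
```

For example, `echo '0e:00.0 Serial Attached SCSI controller [0107]: Example HBA [1000:0072] (rev 03)' | pci_id_from_line` prints `vendor=1000 device=0072`.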

kernull · Apr 20, 2022

Just following up here to say that the above worked for getting Proxmox to not hijack the HBA at boot, but the VM I was passing the HBA to wouldn't boot until I turned off RAM ballooning and set rombar=0 on the PCIe hardware entry.

Also, if the guest is Ubuntu and disks referenced in your fstab are missing, it will boot into emergency mode... took me a min to figure that out.

leesteken · Apr 21, 2022

kernull said:
Just following up here to say that the above worked for getting Proxmox to not hijack the HBA at boot, but the VM I was passing the HBA to wouldn't boot until I turned off RAM ballooning and set rombar=0 on the PCIe hardware entry.

Ballooning cannot work when using passthrough because PCI(e) devices can read/write any part of the VM memory at any time (DMA) without the CPU knowing about it. All VM memory must therefore be pinned/locked into actual RAM, and cannot be reused (temporarily) by the host for other VMs.
DMA is also the security reason that devices are put into IOMMU groups. Because PCI(e) devices can communicate with each other within a group (without the CPU knowing), they can leak all memory from one VM (or the host) to another. This can happen when one device of a group is in one VM (or the host) and another device of the same group is in another VM.

Probably it was the rombar setting that fixed it for you. This makes me worry if the HBA device still works if you stop and restart the VM (not reboot from within the VM).
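The two settings mentioned above can also be applied from the host shell with qm set. A minimal sketch that only prints the commands for review rather than running them; the function name, the VM ID 100 in the usage note, and the assumption that the HBA is the VM's hostpci0 entry are all made up for illustration:

```shell
#!/bin/sh
# passthrough_fixups VMID ADDR   (hypothetical helper; prints, does not run)
# Prints the qm commands that disable the balloon device and set rombar=0
# on hostpci0. Note: setting hostpci0 replaces the whole entry, so any
# other options on it (for example pcie=1) must be repeated by hand.
passthrough_fixups() {
    printf 'qm set %s --balloon 0\n' "$1"
    printf 'qm set %s --hostpci0 %s,rombar=0\n' "$1" "$2"
}
```

`passthrough_fixups 100 0e:00` prints the two commands for a VM with ID 100 and the HBA at address 0e:00.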

kernull said:
Also, if the guest is Ubuntu and disks referenced in your fstab are missing, it will boot into emergency mode... took me a min to figure that out.

Indeed, if an fstab entry does not use the nofail option and its disk is missing, Linux systems tend to not boot fully.
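To find entries that could block the boot this way, one can list the mount points whose options lack nofail. A minimal sketch using awk; the function name is made up for illustration, and the root filesystem will normally show up in the list too, which is expected:

```shell
#!/bin/sh
# fstab_without_nofail [FILE]   (hypothetical helper)
# Prints the mount point of every fstab entry whose option field
# (4th column) does not contain "nofail". FILE defaults to /etc/fstab.
fstab_without_nofail() {
    awk '
        /^[[:space:]]*#/ || NF < 4 { next }   # skip comments and short lines
        {
            n = split($4, opts, ",")
            found = 0
            for (i = 1; i <= n; i++) if (opts[i] == "nofail") found = 1
            if (!found) print $2
        }
    ' "${1:-/etc/fstab}"
}
```

Adding nofail to the options of the data-disk entries it reports (for example `defaults,nofail`) lets the guest finish booting when those disks are absent.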

kernull · Apr 21, 2022

good to know! I never thought of why it was necessary, just figured with a pcie card needing access to RAM, the hypervisor might not necessarily know it was asking for it like it does the cpu?

regarding rombar being disabled...

This makes me worry if the HBA device still works if you stop and restart the VM (not reboot from within the VM).

I will run this test when I get home, I actually havent had the host stop this guest yet... I assume I need to install the virt-io agent on this ubuntu guest as well...

I'm also trying to figure out exactly which physical PCI slots are in which IOMMU group... Since I don't quite know how to identify the physical slots from the output of `lspci -nn`, I think I will take one card and move it into each slot with a reboot to see where it shows up. I'm worried that all 3 full slots are in the same group, in which case I may look into getting a different motherboard. I bought this board without being aware of the IOMMU grouping constraint, and I have 15 more days to potentially return it. I'm coming from an enterprise mobo with dual Xeons that I never encountered any PCIe passthrough restrictions on, and I want to be able to add PCIe devices for various purposes in the future...

The ACS override that you mentioned- from what I read after googling, this seems like a way to break up IOMMU grouping but I'm concerned about stability. I would probably buy a new mobo before going acs override...

thanks, you've been a HUGE help!

edit: also thanks for the nofail fstab option too!

leesteken · Apr 21, 2022

kernull said:
good to know! I never thought of why it was necessary, just figured with a pcie card needing access to RAM, the hypervisor might not necessarily know it was asking for it like it does the cpu?

I'm not sure what you mean...

kernull said:
regarding rombar being disabled...

I will run this test when I get home, I actually havent had the host stop this guest yet... I assume I need to install the virt-io agent on this ubuntu guest as well...

Either install qemu-guest-agent and enable QEMU Guest Agent in the VM Options or don't enable it and hopefully Ubuntu responds to ACPI commands to turn it off. I suggest installing and enabling it, because it also does a filesystem sync when the VM is being backed up.

kernull said:
I'm also trying to figure out exactly which physical PCI slots are in which IOMMU group... Since I don't quite know how to identify the physical slots from the output of `lspci -nn`, I think I will take one card and move it into each slot with a reboot to see where it shows up. I'm worried that all 3 full slots are in the same group, in which case I may look into getting a different motherboard. I bought this board without being aware of the IOMMU grouping constraint, and I have 15 more days to potentially return it. I'm coming from an enterprise mobo with dual Xeons that I never encountered any PCIe passthrough restrictions on, and I want to be able to add PCIe devices for various purposes in the future...

The X570 is the best Ryzen chipset for passthrough. All other chipsets have (much) worse groups. I don't think you are going to find anything better for Ryzen. Sometimes a newer (or even older) BIOS has better groups, so you might want to try a different BIOS version to see if the groups improve.

kernull said:
The ACS override that you mentioned- from what I read after googling, this seems like a way to break up IOMMU grouping but I'm concerned about stability. I would probably buy a new mobo before going acs override...

It does not come with guarantees but I'm not worried about stability. What's more, with pcie_acs_override you never have IOMMU group issues when changing BIOS versions (because some newer BIOS versions have worse grouping than older ones!). It does come with a potential security issue if you don't control the software and users on VMs that share devices from the same (original) groups with the host or other VMs.
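On a GRUB-booted Proxmox host the override is a kernel parameter, commonly given as pcie_acs_override=downstream,multifunction, appended to GRUB_CMDLINE_LINUX_DEFAULT. A minimal sketch that prints a modified copy of the grub file instead of editing it in place, so the result can be reviewed first; the function name is made up for illustration and the parameter must not contain "/" or "&":

```shell
#!/bin/sh
# grub_with_param FILE PARAM   (hypothetical helper; writes to stdout only)
# Prints FILE with PARAM appended inside the quotes of the
# GRUB_CMDLINE_LINUX_DEFAULT line. Other lines are passed through unchanged.
grub_with_param() {
    sed "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 $2\"/" "$1"
}
```

After reviewing the output of `grub_with_param /etc/default/grub pcie_acs_override=downstream,multifunction`, the change would still need to be written to /etc/default/grub and followed by update-grub and a reboot.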

Personally, I would not bother with a storage VM with passthrough and just provide a (local) network storage via a container from a ZFS pool on the Proxmox host. Then I don't have to waste a lot of memory on ZFS inside that VM (no ballooning or KSM) and all storage can share the large ARC on the host.

kernull · Apr 21, 2022

The X570 is the best Ryzen chipset for passthrough. All other chipsets have (much) worse groups.

I was looking at other motherboards with the x570... but now your other suggestions have me questioning a few things about the planned setup I have been migrating to

Personally, I would not bother with a storage VM with passthrough and just provide a (local) network storage via a container from a ZFS pool on the Proxmox host. Then I don't have to waste a lot of memory on ZFS inside that VM (no ballooning or KSM) and all storage can share the large ARC on the host.

yea I knew from the beginning this is kind of a hacky way to do it, but it was implemented when I was running on ESXi and didn't know how to get mdadm RAID on the host... so passthrough to an Ubuntu VM became the solution. A bunch of years later (like 6 I think?) here I am with 3 RAIDs and a bunch of things running on the Ubuntu NAS that I'd like to keep running (at least for now so I can get all my hosted stuff back up).

You're making me ask myself what services/roles do I want the host to play? My biggest most appreciated feature of the vm based setup I have is that I could totally isolate services, apps, and roles (separately back them up, manage snapshots etc) and also have entirely virtual networks...

provide a (local) network storage via a container from a ZFS pool on the Proxmox host.

I would love to do this! I think containers are the better way to do a lot of the stuff I'm doing, but I have failed countless times to figure out how to get even part of what I have working with vms running with docker containers (I have started this venture recently again with a wee bit of luck, but thats a whole other story). First task will be to figure out what the difference is between LXC and Docker containers...

I think I will leave the NAS just to get back up and running, but will definitely plan to experiment with moving the RAIDs to the host and figure out how to get an LXC container to manage the share and other NAS apps/config. I have a lot of questions about the LXC containers, I've never actually used them - like I said, I'm coming from ESXi which didnt have any lxc support natively... I'm gonna do some googling and reading on this.

Also, I recognize that ZFS is a better storage solution than md, but when I started all this the cost of buying all drives upfront was too great and I have used the expand ability of md to the benefit of more storage.

thanks again!

leesteken · Apr 21, 2022

There are Proxmox container templates for Ubuntu, but a desktop environment is a little more involved in a container than in a VM. A container with a mountpoint to your current md array should be easy, no need to use ZFS right now. As a bonus, multiple containers and the Proxmox host can share (parts of) that storage. There is also a TurnKey file server container template available (in the Proxmox template list), but I have no experience with it. Maybe use pcie_acs_override for your current Ubuntu VM (if you need it because of the groups) and migrate to a container at your own pace later.

kernull · Apr 23, 2022

I am just now looking at all the turnkey options for containers. never knew this existed. gonna do some experimentation. thanks for the tip!!!

kernull · Jun 17, 2022

I figured Id follow up with what appears to be a solution for me regarding the iommu grouping...
the asrock x570 taichi does have an acs enable option... once you enable AER, the ACS enable appears and after selecting that it looks like the iommu groups have changed- the hba card is no longer in the same group as the video card

edit: attaching pic didnt work first time it seems...
