HP MicroServer Gen8 LSI 9211 HBA Passthrough (Proxmox 6.2)

Whitterquick

Active Member
Aug 1, 2020
Hello all, I’m new around here and new to Proxmox. Fantastic work by all those involved, keep it up!

I have been scouring the Internet and in particular this forum and the Proxmox subreddit, but cannot find a definitive answer on how or if PCIe passthrough works on the HP Gen8 MicroServer. I have gone through the guide on the wiki (which is now out of date), I have tried blacklisting drivers, patching the kernel, pretty much everything that I have found suggested, so I was wondering if anyone has got this to work, and if so, how they did it.

Notable specs are as follows:
HP MicroServer Gen8
16GB ECC RAM
Intel Xeon E3-1265Lv2
LSI 9211-4i HBA (IT mode)

I am running the latest version of Proxmox 6.2 and have patched the kernel following this guide.
I should point out that I am fairly new to Linux/command line, and also that Proxmox works fine without the passthrough, and the disks work fine in any other OS (but not tried any other hypervisor). If anything technical is worth a try please list the steps (ELI5) rather than just stating what needs to be done as I may not know how to do it. I understand Xeon E3 has poor isolation capabilities but if this has worked in the past it should be able to work again.

Most threads I have seen where others have said they got it working have been on older versions of Proxmox. If there are no solutions then an option might be to downgrade the kernel or use an older version but I would ideally prefer to not do that.

Let’s try to get a [SOLVED] thread for everyone else to refer to, as I have seen many others are having issues with this and a lot of threads seem to get abandoned without an answer.
 
Please note this thread is specific to HP MicroServer Gen8 as the hardware and BIOS are very specific and solutions for other systems will not always work here (for example other Gen8 or HPE servers).
 
Not sure if I can be of any help, but without any information posted other than hardware specs there isn't much we can help with. Could you attach the output of lspci and find /sys/kernel/iommu_groups/ -type l?
 

Sorry, I just thought there may be something glaring that I'm missing as the issue seems to be common to these MicroServers.

Here is the output from lspci
Code:
root@pve:~# lspci
00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v2/Ivy Bridge DRAM Controller (rev 09)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor PCI Express Root Port (rev 09)
00:06.0 PCI bridge: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor PCI Express Root Port (rev 09)
00:1a.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2 (rev 05)
00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1 (rev b5)
00:1c.4 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 5 (rev b5)
00:1c.6 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 7 (rev b5)
00:1c.7 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 8 (rev b5)
00:1d.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1 (rev 05)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev a5)
00:1f.0 ISA bridge: Intel Corporation C204 Chipset Family LPC Controller (rev 05)
00:1f.2 SATA controller: Intel Corporation 6 Series/C200 Series Chipset Family SATA AHCI Controller (rev 05)
01:00.0 System peripheral: Hewlett-Packard Company Integrated Lights-Out Standard Slave Instrumentation & System Support (rev 05)
01:00.1 VGA compatible controller: Matrox Electronics Systems Ltd. MGA G200EH
01:00.2 System peripheral: Hewlett-Packard Company Integrated Lights-Out Standard Management Processor Support and Messaging (rev 05)
01:00.4 USB controller: Hewlett-Packard Company Integrated Lights-Out Standard Virtual USB Controller (rev 02)
03:00.0 Ethernet controller: Broadcom Limited NetXtreme BCM5720 Gigabit Ethernet PCIe
03:00.1 Ethernet controller: Broadcom Limited NetXtreme BCM5720 Gigabit Ethernet PCIe
04:00.0 USB controller: Renesas Technology Corp. uPD720201 USB 3.0 Host Controller (rev 03)
07:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS2004 PCI-Express Fusion-MPT SAS-2 [Spitfire] (rev 03)
root@pve:~#

and find /sys/kernel/iommu_groups/ -type l
Code:
root@pve:~# find /sys/kernel/iommu_groups/ -type l
/sys/kernel/iommu_groups/7/devices/0000:00:1c.7
/sys/kernel/iommu_groups/5/devices/0000:00:1c.4
/sys/kernel/iommu_groups/13/devices/0000:04:00.0
/sys/kernel/iommu_groups/3/devices/0000:00:1a.0
/sys/kernel/iommu_groups/11/devices/0000:07:00.0
/sys/kernel/iommu_groups/1/devices/0000:00:01.0
/sys/kernel/iommu_groups/8/devices/0000:00:1d.0
/sys/kernel/iommu_groups/6/devices/0000:00:1c.6
/sys/kernel/iommu_groups/14/devices/0000:01:00.4
/sys/kernel/iommu_groups/14/devices/0000:01:00.2
/sys/kernel/iommu_groups/14/devices/0000:01:00.0
/sys/kernel/iommu_groups/14/devices/0000:01:00.1
/sys/kernel/iommu_groups/4/devices/0000:00:1c.0
/sys/kernel/iommu_groups/12/devices/0000:03:00.0
/sys/kernel/iommu_groups/12/devices/0000:03:00.1
/sys/kernel/iommu_groups/2/devices/0000:00:06.0
/sys/kernel/iommu_groups/10/devices/0000:00:1f.2
/sys/kernel/iommu_groups/10/devices/0000:00:1f.0
/sys/kernel/iommu_groups/0/devices/0000:00:00.0
/sys/kernel/iommu_groups/9/devices/0000:00:1e.0
root@pve:~#

Thanks.
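The flat find output above is easier to read when grouped, since what matters for passthrough is which devices share a group with the card. The following is only a sketch: group_iommu_links is a made-up helper name, and it assumes every input line has the form /sys/kernel/iommu_groups/<group>/devices/<address>, as in the listing above.

```shell
# Sketch: collapse `find /sys/kernel/iommu_groups/ -type l` output into one
# line per IOMMU group. group_iommu_links is a hypothetical helper name.
group_iommu_links() {
    # Split each path on "/": field 5 is the group number, field 7 the PCI address.
    awk -F/ '$4 == "iommu_groups" && NF >= 7 {
                 devs[$5] = (devs[$5] == "" ? $7 : devs[$5] " " $7)
             }
             END { for (g in devs) print g ": " devs[g] }' | sort -n
}

# Intended use on the host (not executed here):
#   find /sys/kernel/iommu_groups/ -type l | group_iommu_links
```

Applied to the listing above, 0000:07:00.0 (the LSI card) comes out as the only device in group 11, while the iLO functions at 01:00.x all share group 14.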
 
Are you getting errors when you try to pass through the device?

How did you try to pass it through?

Did you exclude it from the kernel through the grub option "vfio-pci.ids="? If so, what is the output of cat /proc/cmdline?

Is your Proxmox system using UEFI?
 

In the GUI I get the following: Error: start failed: QEMU exited with code 1.

I tried to pass it through using many guides including the wiki, blacklisting the mpt3sas driver, cloning the romfile... nothing works. I have q35 enabled and I have tried with and without 'all functions'.

I believe I have tried excluding from kernel but can you remind me how to do this?
Code:
root@pve:~# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-5.4.44-2-pve-removermrr root=/dev/mapper/pve-root ro quiet intel_iommu=on iommu=pt pcie_acs_override=downstream vfio_iommu_type1.allow_unsafe_interrupts=1 intremap=no_x2apic_optout

If I remove this pass-through the VM works fine.
 
To exclude it from the kernel you need to find the PCI ID of the card. In your case run lspci -n -s 07:00.0 (the address of the LSI card in your lspci output), which should return something of the form 0c:00.0 0106: 1022:7901 (rev 51) (that example is from a different card). The text you are looking for will look like XXXX:XXXX and is what you need for the next step. Edit your grub config and add vfio-pci.ids=XXXX:XXXX, with the PCI ID from the previous command, to the kernel arguments, then update grub. Reboot and then try passing the card in like you normally would through the GUI.
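As a rough illustration of the step described above, the vendor:device pair is the third whitespace-separated field of an lspci -n line. The helper names below are made up for this sketch, and the remark about /etc/default/grub assumes a GRUB-booted install with the usual Debian layout.

```shell
# Sketch: pull the vendor:device pair out of one line of `lspci -n` output.
# pci_vendor_device and vfio_ids_arg are hypothetical helper names.
pci_vendor_device() {
    # Input format: "<address> <class>: <vendor>:<device> (rev xx)"
    awk '{ print $3 }'
}

# Turn that pair into the kernel argument mentioned in the reply.
vfio_ids_arg() {
    printf 'vfio-pci.ids=%s\n' "$(pci_vendor_device)"
}

# Intended use on the host (not executed here):
#   lspci -n -s 07:00.0 | vfio_ids_arg
# Assumption: the resulting argument is appended to GRUB_CMDLINE_LINUX_DEFAULT
# in /etc/default/grub, followed by update-grub and a reboot.
```

Fed the example line from the reply, vfio_ids_arg prints vfio-pci.ids=1022:7901. On the real card the numbers will differ.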
 

Yes, I believe I did try this. I will double-check whether I did it correctly.

Also I did not answer earlier that my system is using BIOS, not UEFI, and it’s a bit of a rigid BIOS too.
 

Just tried this and it didn't work. :(
From the example you gave, the correct string would be vfio-pci.ids=1022:7901, is that right? (With my own numbers, obviously.)
 
The format looks correct. Do you see it when you cat /proc/cmdline?

Do you have the passthrough option "VT-d" enabled in the BIOS? Do you get any output when you run dmesg | grep -e DMAR -e IOMMU?
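Checking /proc/cmdline by eye gets error-prone once the line is long. A minimal sketch of the same check follows. has_kernel_arg is a hypothetical helper that reads a command line on stdin and succeeds if the named argument is present, either bare or as name=value.

```shell
# Sketch: test whether a kernel command line (read from stdin) contains a
# given argument. has_kernel_arg is a hypothetical helper name.
has_kernel_arg() {
    # One argument per line, then match "name" exactly or "name=..." as a prefix.
    tr -s ' ' '\n' | awk -v want="$1" '
        $0 == want || index($0, want "=") == 1 { found = 1 }
        END { exit !found }'
}

# Intended use on the host (not executed here):
#   has_kernel_arg vfio-pci.ids < /proc/cmdline && echo "vfio-pci.ids is set"
#   has_kernel_arg intel_iommu  < /proc/cmdline && echo "intel_iommu is set"
```

The match is on the whole argument name, so a value such as iommu=pt does not satisfy a check for intel_iommu.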
 

Yes I see it at the end of the string in cat /proc/cmdline. The only option for VT-d in the BIOS is to enable/disable. There are no passthrough options in the BIOS.
Code:
root@pve:~# dmesg | grep -e DMAR -e IOMMU
[    0.000000] Warning: PCIe ACS overrides enabled; This may allow non-IOMMU protected peer-to-peer DMA
[    0.006686] ACPI: DMAR 0x00000000F1DE4A80 0003B4 (v01 HP     ProLiant 00000001 \xd2?   0000162E)
[    0.055403] DMAR: IOMMU enabled
[    0.121616] DMAR: Host address width 39
[    0.121617] DMAR: DRHD base: 0x000000fed90000 flags: 0x1
[    0.121621] DMAR: dmar0: reg_base_addr fed90000 ver 1:0 cap c9008020660262 ecap f010da
[    0.121621] DMAR: RMRR base: 0x000000f1ffd000 end: 0x000000f1ffffff
[    0.121622] DMAR: RMRR base: 0x000000f1ff6000 end: 0x000000f1ffcfff
[    0.121623] DMAR: RMRR base: 0x000000f1f93000 end: 0x000000f1f94fff
[    0.121623] DMAR: RMRR base: 0x000000f1f8f000 end: 0x000000f1f92fff
[    0.121625] DMAR: RMRR base: 0x000000f1f7f000 end: 0x000000f1f8efff
[    0.121626] DMAR: RMRR base: 0x000000f1f7e000 end: 0x000000f1f7efff
[    0.121626] DMAR: RMRR base: 0x000000000f4000 end: 0x000000000f4fff
[    0.121627] DMAR: RMRR base: 0x000000000e8000 end: 0x000000000e8fff
[    0.121627] DMAR: RMRR base: 0x000000f1dee000 end: 0x000000f1deefff
[    0.121629] DMAR-IR: IOAPIC id 8 under DRHD base  0xfed90000 IOMMU 0
[    0.121630] DMAR-IR: HPET id 0 under DRHD base 0xfed90000
[    0.121630] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
[    0.121861] DMAR-IR: Enabled IRQ remapping in x2apic mode
[    0.813194] DMAR: No ATSR found
[    0.813375] DMAR: dmar0: Using Queued invalidation
[    0.814183] DMAR: Intel(R) Virtualization Technology for Directed I/O
[    1.026915] ehci-pci 0000:00:1a.0: DMAR: 32bit DMA uses non-identity mapping
[    1.037958] mpt3sas 0000:07:00.0: DMAR: 32bit DMA uses non-identity mapping
[    1.046414] uhci_hcd 0000:01:00.4: DMAR: Setting identity map [0xf1dee000 - 0xf1deefff]
[    1.046426] uhci_hcd 0000:01:00.4: DMAR: Setting identity map [0xf1ff6000 - 0xf1ffcfff]
[    1.046435] uhci_hcd 0000:01:00.4: DMAR: 32bit DMA uses non-identity mapping
[    1.069225] ehci-pci 0000:00:1d.0: DMAR: 32bit DMA uses non-identity mapping
[   94.198484] vfio-pci 0000:07:00.0: DMAR: Device is ineligible for IOMMU domain attach due to platform RMRR requirement.  Patch is in effect.
[  300.562769] vfio-pci 0000:07:00.0: DMAR: Device is ineligible for IOMMU domain attach due to platform RMRR requirement.  Patch is in effect.
root@pve:~#
 
So it looks like your kernel patch is in place and working. Can you verify that cat /sys/module/vfio_iommu_type1/parameters/allow_unsafe_interrupts shows it is being set properly? It seems Proxmox has recently switched from that being built into the kernel to being a module.

Also, can you post the results of dmesg | tail -100 and tail -100 /var/log/messages, run immediately after you get the error trying to start the VM?
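When going through a long kernel log it can help to pull out only the RMRR messages and the devices they name. This is a sketch only: rmrr_blocked_devices is a made-up helper, and it assumes the message wording matches the lines posted earlier in this thread.

```shell
# Sketch: list the PCI addresses that the kernel log flags with the
# "platform RMRR requirement" message. rmrr_blocked_devices is a hypothetical name.
rmrr_blocked_devices() {
    # Keep only the RMRR messages, extract the dddd:bb:dd.f address, de-duplicate.
    grep 'ineligible for IOMMU domain attach due to platform RMRR requirement' |
        grep -o '[0-9a-f]\{4\}:[0-9a-f]\{2\}:[0-9a-f]\{2\}\.[0-7]' |
        sort -u
}

# Intended use on the host (not executed here):
#   dmesg | rmrr_blocked_devices
```

Run against the dmesg output posted above, this yields a single address, 0000:07:00.0.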
 
I did a small BIOS modification last night as advised by this link, but still not working. This does give me hope that it is possible and will get back with those outputs you requested soon.
 

Code:
root@pve:~# cat /sys/module/vfio_iommu_type1/parameters/allow_unsafe_interrupts
Y
root@pve:~# dmesg | tail -100
[ 3.471236] vfio_pci: add [1000:0070[ffffffff:ffffffff]] class 0x000000/00000000
[ 3.478310] Loading iSCSI transport class v2.0-870.
[ 3.487851] iscsi: registered transport (tcp)
[ 3.502721] iscsi: registered transport (iser)
[ 3.508976] spl: loading out-of-tree module taints kernel.
[ 3.511751] znvpair: module license 'CDDL' taints kernel.
[ 3.511752] Disabling lock debugging due to kernel taint
[ 3.553081] systemd-journald[422]: Received request to flush runtime journal from PID 1
[ 3.593988] power_meter ACPI000D:00: Found ACPI power meter.
[ 3.594025] power_meter ACPI000D:00: Ignoring unsafe software power cap!
[ 3.594030] power_meter ACPI000D:00: hwmon_device_register() is deprecated. Please convert the driver to use hwmon_device_register_with_info().
[ 3.599188] IPMI message handler: version 39.2
[ 3.605776] ipmi device interface
[ 3.611172] ipmi_si: IPMI System Interface driver
[ 3.611196] ipmi_si dmi-ipmi-si.0: ipmi_platform: probing via SMBIOS
[ 3.611198] ipmi_platform: ipmi_si: SMBIOS: io 0xca2 regsize 1 spacing 1 irq 0
[ 3.611200] ipmi_si: Adding SMBIOS-specified kcs state machine
[ 3.611262] ipmi_si IPI0001:00: ipmi_platform: probing via ACPI
[ 3.611307] ipmi_si IPI0001:00: ipmi_platform: [io 0x0ca2-0x0ca3] regsize 1 spacing 1 irq 0
[ 3.611310] ipmi_si dmi-ipmi-si.0: Removing SMBIOS-specified kcs state machine in favor of ACPI
[ 3.611311] ipmi_si: Adding ACPI-specified kcs state machine
[ 3.611361] ipmi_si: Trying ACPI-specified kcs state machine at i/o address 0xca2, slave address 0x20, irq 0
[ 3.613099] input: PC Speaker as /devices/platform/pcspkr/input/input4
[ 3.614373] EDAC MC0: Giving out device to module ie31200_edac controller IE31200: DEV 0000:00:00.0 (POLLED)
[ 3.662151] mgag200 0000:01:00.1: remove_conflicting_pci_framebuffers: bar 0: 0xf9000000 -> 0xf9ffffff
[ 3.662154] mgag200 0000:01:00.1: remove_conflicting_pci_framebuffers: bar 1: 0xfbde0000 -> 0xfbde3fff
[ 3.662156] mgag200 0000:01:00.1: remove_conflicting_pci_framebuffers: bar 2: 0xfb000000 -> 0xfb7fffff
[ 3.662158] mgag200 0000:01:00.1: vgaarb: deactivate vga console
[ 3.669838] Console: switching to colour dummy device 80x25
[ 3.679701] [TTM] Zone kernel: Available graphics memory: 8206630 KiB
[ 3.679703] [TTM] Zone dma32: Available graphics memory: 2097152 KiB
[ 3.679704] [TTM] Initializing pool allocator
[ 3.679708] [TTM] Initializing DMA pool allocator
[ 3.719241] fbcon: mgag200drmfb (fb0) is primary device
[ 3.833590] Console: switching to colour frame buffer device 128x48
[ 3.835749] mgag200 0000:01:00.1: fb0: mgag200drmfb frame buffer device
[ 3.957590] ipmi_si IPI0001:00: The BMC does not support clearing the recv irq bit, compensating, but the BMC needs to be fixed.
[ 4.020870] RAPL PMU: API unit is 2^-32 Joules, 2 fixed counters, 163840 ms ovfl timer
[ 4.020871] RAPL PMU: hw unit of domain pp0-core 2^-16 Joules
[ 4.020872] RAPL PMU: hw unit of domain package 2^-16 Joules
[ 4.027699] cryptd: max_cpu_qlen set to 1000
[ 4.034603] AVX version of gcm_enc/dec engaged.
[ 4.034605] AES CTR mode by8 optimization enabled
[ 4.050878] Adding 8388604k swap on /dev/mapper/pve-swap. Priority:-2 extents:1 across:8388604k SSFS
[ 4.058607] [drm] Initialized mgag200 1.0.0 20110418 for 0000:01:00.1 on minor 0
[ 4.124782] ipmi_si IPI0001:00: IPMI message handler: Found new BMC (man_id: 0x00000b, prod_id: 0x2000, dev_id: 0x13)
[ 4.201618] ipmi_si IPI0001:00: IPMI kcs interface initialized
[ 4.202860] ipmi_ssif: IPMI SSIF Interface driver
[ 4.821692] intel_rapl_common: Found RAPL domain package
[ 4.821693] intel_rapl_common: Found RAPL domain core
[ 5.160913] ZFS: Loaded module v0.8.4-pve1, ZFS pool version 5000, ZFS filesystem version 5
[ 5.254981] audit: type=1400 audit(1596831693.888:2): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/bin/lxc-start" pid=686 comm="apparmor_parser"
[ 5.255173] audit: type=1400 audit(1596831693.888:3): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=685 comm="apparmor_parser"
[ 5.255177] audit: type=1400 audit(1596831693.888:4): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=685 comm="apparmor_parser"
[ 5.255741] audit: type=1400 audit(1596831693.888:5): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/bin/man" pid=688 comm="apparmor_parser"
[ 5.255744] audit: type=1400 audit(1596831693.888:6): apparmor="STATUS" operation="profile_load" profile="unconfined" name="man_filter" pid=688 comm="apparmor_parser"
[ 5.255746] audit: type=1400 audit(1596831693.888:7): apparmor="STATUS" operation="profile_load" profile="unconfined" name="man_groff" pid=688 comm="apparmor_parser"
[ 5.259188] audit: type=1400 audit(1596831693.892:8): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/sbin/tcpdump" pid=684 comm="apparmor_parser"
[ 5.262599] audit: type=1400 audit(1596831693.896:9): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lxc-container-default" pid=687 comm="apparmor_parser"
[ 5.262602] audit: type=1400 audit(1596831693.896:10): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lxc-container-default-cgns" pid=687 comm="apparmor_parser"
[ 5.262605] audit: type=1400 audit(1596831693.896:11): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lxc-container-default-with-mounting" pid=687 comm="apparmor_parser"
[ 5.285834] softdog: initialized. soft_noboot=0 soft_margin=60 sec soft_panic=0 (nowayout=0)
[ 5.308954] new mount options do not match the existing superblock, will be ignored
[ 5.352653] vmbr0: port 1(eno1) entered blocking state
[ 5.352654] vmbr0: port 1(eno1) entered disabled state
[ 5.352984] device eno1 entered promiscuous mode
[ 5.652326] bpfilter: Loaded bpfilter_umh pid 824
[ 5.652533] Started bpfilter
[ 9.092764] tg3 0000:03:00.0 eno1: Link is up at 1000 Mbps, full duplex
[ 9.092779] tg3 0000:03:00.0 eno1: Flow control is on for TX and on for RX
[ 9.092792] tg3 0000:03:00.0 eno1: EEE is enabled
[ 9.092814] vmbr0: port 1(eno1) entered blocking state
[ 9.092816] vmbr0: port 1(eno1) entered forwarding state
[ 9.092891] IPv6: ADDRCONF(NETDEV_CHANGE): vmbr0: link becomes ready
[ 17.832008] L1TF CPU bug present and SMT on, data leak possible. See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html for details.
[ 67.437722] mpt2sas_cm0: sending message unit reset !!
[ 67.439245] mpt2sas_cm0: message unit reset: SUCCESS
[ 68.597409] device tap201i0 entered promiscuous mode
[ 68.626657] fwbr201i0: port 1(fwln201i0) entered blocking state
[ 68.626660] fwbr201i0: port 1(fwln201i0) entered disabled state
[ 68.626780] device fwln201i0 entered promiscuous mode
[ 68.626860] fwbr201i0: port 1(fwln201i0) entered blocking state
[ 68.626862] fwbr201i0: port 1(fwln201i0) entered forwarding state
[ 68.631033] vmbr0: port 2(fwpr201p0) entered blocking state
[ 68.631034] vmbr0: port 2(fwpr201p0) entered disabled state
[ 68.631107] device fwpr201p0 entered promiscuous mode
[ 68.631171] vmbr0: port 2(fwpr201p0) entered blocking state
[ 68.631172] vmbr0: port 2(fwpr201p0) entered forwarding state
[ 68.634314] fwbr201i0: port 2(tap201i0) entered blocking state
[ 68.634315] fwbr201i0: port 2(tap201i0) entered disabled state
[ 68.634409] fwbr201i0: port 2(tap201i0) entered blocking state
[ 68.634410] fwbr201i0: port 2(tap201i0) entered forwarding state
[ 68.680262] vfio-pci 0000:07:00.0: DMAR: Device is ineligible for IOMMU domain attach due to platform RMRR requirement. Patch is in effect.
[ 69.133423] fwbr201i0: port 2(tap201i0) entered disabled state
[ 69.153154] fwbr201i0: port 1(fwln201i0) entered disabled state
[ 69.153197] vmbr0: port 2(fwpr201p0) entered disabled state
[ 69.153325] device fwln201i0 left promiscuous mode
[ 69.153326] fwbr201i0: port 1(fwln201i0) entered disabled state
[ 69.171417] device fwpr201p0 left promiscuous mode
[ 69.171419] vmbr0: port 2(fwpr201p0) entered disabled state
root@pve:~#
 
root@pve:~# tail -100 /var/log/messages
Aug 7 21:32:57 pve kernel: [ 3.614430] RPC: Registered tcp transport module.
Aug 7 21:32:57 pve kernel: [ 3.614431] RPC: Registered tcp NFSv4.1 backchannel transport module.
Aug 7 21:32:57 pve kernel: [ 3.615039] vfio_pci: add [1000:0070[ffffffff:ffffffff]] class 0x000000/00000000
Aug 7 21:32:57 pve kernel: [ 3.622593] Loading iSCSI transport class v2.0-870.
Aug 7 21:32:57 pve kernel: [ 3.625573] iscsi: registered transport (tcp)
Aug 7 21:32:57 pve kernel: [ 3.636384] iscsi: registered transport (iser)
Aug 7 21:32:57 pve kernel: [ 3.647693] spl: loading out-of-tree module taints kernel.
Aug 7 21:32:57 pve kernel: [ 3.650037] znvpair: module license 'CDDL' taints kernel.
Aug 7 21:32:57 pve kernel: [ 3.650038] Disabling lock debugging due to kernel taint
Aug 7 21:32:57 pve kernel: [ 3.740842] power_meter ACPI000D:00: Found ACPI power meter.
Aug 7 21:32:57 pve kernel: [ 3.740876] power_meter ACPI000D:00: Ignoring unsafe software power cap!
Aug 7 21:32:57 pve kernel: [ 3.740879] power_meter ACPI000D:00: hwmon_device_register() is deprecated. Please convert the driver to use hwmon_device_register_with_info().
Aug 7 21:32:57 pve kernel: [ 3.743986] IPMI message handler: version 39.2
Aug 7 21:32:57 pve kernel: [ 3.749342] ipmi device interface
Aug 7 21:32:57 pve kernel: [ 3.755862] input: PC Speaker as /devices/platform/pcspkr/input/input4
Aug 7 21:32:57 pve kernel: [ 3.756195] ipmi_si: IPMI System Interface driver
Aug 7 21:32:57 pve kernel: [ 3.756209] ipmi_si dmi-ipmi-si.0: ipmi_platform: probing via SMBIOS
Aug 7 21:32:57 pve kernel: [ 3.756212] ipmi_platform: ipmi_si: SMBIOS: io 0xca2 regsize 1 spacing 1 irq 0
Aug 7 21:32:57 pve kernel: [ 3.756213] ipmi_si: Adding SMBIOS-specified kcs state machine
Aug 7 21:32:57 pve kernel: [ 3.756760] ipmi_si IPI0001:00: ipmi_platform: probing via ACPI
Aug 7 21:32:57 pve kernel: [ 3.756812] ipmi_si IPI0001:00: ipmi_platform: [io 0x0ca2-0x0ca3] regsize 1 spacing 1 irq 0
Aug 7 21:32:57 pve kernel: [ 3.756814] ipmi_si dmi-ipmi-si.0: Removing SMBIOS-specified kcs state machine in favor of ACPI
Aug 7 21:32:57 pve kernel: [ 3.756815] ipmi_si: Adding ACPI-specified kcs state machine
Aug 7 21:32:57 pve kernel: [ 3.756871] ipmi_si: Trying ACPI-specified kcs state machine at i/o address 0xca2, slave address 0x20, irq 0
Aug 7 21:32:57 pve kernel: [ 3.763077] EDAC MC0: Giving out device to module ie31200_edac controller IE31200: DEV 0000:00:00.0 (POLLED)
Aug 7 21:32:57 pve kernel: [ 3.804492] mgag200 0000:01:00.1: remove_conflicting_pci_framebuffers: bar 0: 0xf9000000 -> 0xf9ffffff
Aug 7 21:32:57 pve kernel: [ 3.804494] mgag200 0000:01:00.1: remove_conflicting_pci_framebuffers: bar 1: 0xfbde0000 -> 0xfbde3fff
Aug 7 21:32:57 pve kernel: [ 3.804496] mgag200 0000:01:00.1: remove_conflicting_pci_framebuffers: bar 2: 0xfb000000 -> 0xfb7fffff
Aug 7 21:32:57 pve kernel: [ 3.804498] mgag200 0000:01:00.1: vgaarb: deactivate vga console
Aug 7 21:32:57 pve kernel: [ 3.812639] Console: switching to colour dummy device 80x25
Aug 7 21:32:57 pve kernel: [ 3.822417] [TTM] Zone kernel: Available graphics memory: 8206630 KiB
Aug 7 21:32:57 pve kernel: [ 3.822419] [TTM] Zone dma32: Available graphics memory: 2097152 KiB
Aug 7 21:32:57 pve kernel: [ 3.822420] [TTM] Initializing pool allocator
Aug 7 21:32:57 pve kernel: [ 3.822426] [TTM] Initializing DMA pool allocator
Aug 7 21:32:57 pve kernel: [ 3.863249] fbcon: mgag200drmfb (fb0) is primary device
Aug 7 21:32:57 pve kernel: [ 3.977682] Console: switching to colour frame buffer device 128x48
Aug 7 21:32:57 pve kernel: [ 3.978918] mgag200 0000:01:00.1: fb0: mgag200drmfb frame buffer device
Aug 7 21:32:57 pve kernel: [ 4.089665] ipmi_si IPI0001:00: The BMC does not support clearing the recv irq bit, compensating, but the BMC needs to be fixed.
 
Aug 7 21:32:57 pve kernel: [ 4.158997] RAPL PMU: API unit is 2^-32 Joules, 2 fixed counters, 163840 ms ovfl timer
Aug 7 21:32:57 pve kernel: [ 4.158998] RAPL PMU: hw unit of domain pp0-core 2^-16 Joules
Aug 7 21:32:57 pve kernel: [ 4.158998] RAPL PMU: hw unit of domain package 2^-16 Joules
Aug 7 21:32:57 pve kernel: [ 4.167737] cryptd: max_cpu_qlen set to 1000
Aug 7 21:32:57 pve kernel: [ 4.174621] AVX version of gcm_enc/dec engaged.
Aug 7 21:32:57 pve kernel: [ 4.174623] AES CTR mode by8 optimization enabled
Aug 7 21:32:57 pve kernel: [ 4.202836] [drm] Initialized mgag200 1.0.0 20110418 for 0000:01:00.1 on minor 0
Aug 7 21:32:57 pve kernel: [ 4.264793] ipmi_si IPI0001:00: IPMI message handler: Found new BMC (man_id: 0x00000b, prod_id: 0x2000, dev_id: 0x13)
Aug 7 21:32:57 pve kernel: [ 4.305774] Adding 8388604k swap on /dev/mapper/pve-swap. Priority:-2 extents:1 across:8388604k SSFS
Aug 7 21:32:57 pve kernel: [ 4.344380] ipmi_si IPI0001:00: IPMI kcs interface initialized
Aug 7 21:32:57 pve kernel: [ 4.514432] ipmi_ssif: IPMI SSIF Interface driver
Aug 7 21:32:57 pve kernel: [ 5.143824] intel_rapl_common: Found RAPL domain package
Aug 7 21:32:57 pve kernel: [ 5.143826] intel_rapl_common: Found RAPL domain core
Aug 7 21:32:57 pve kernel: [ 5.244954] ZFS: Loaded module v0.8.4-pve1, ZFS pool version 5000, ZFS filesystem version 5
Aug 7 21:32:57 pve kernel: [ 5.364972] audit: type=1400 audit(1596832376.983:2): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/bin/lxc-start" pid=685 comm="apparmor_parser"
Aug 7 21:32:57 pve kernel: [ 5.365655] audit: type=1400 audit(1596832376.983:3): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=684 comm="apparmor_parser"
Aug 7 21:32:57 pve kernel: [ 5.365658] audit: type=1400 audit(1596832376.983:4): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=684 comm="apparmor_parser"
Aug 7 21:32:57 pve kernel: [ 5.366060] audit: type=1400 audit(1596832376.987:5): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/bin/man" pid=687 comm="apparmor_parser"
Aug 7 21:32:57 pve kernel: [ 5.366065] audit: type=1400 audit(1596832376.987:6): apparmor="STATUS" operation="profile_load" profile="unconfined" name="man_filter" pid=687 comm="apparmor_parser"
Aug 7 21:32:57 pve kernel: [ 5.366068] audit: type=1400 audit(1596832376.987:7): apparmor="STATUS" operation="profile_load" profile="unconfined" name="man_groff" pid=687 comm="apparmor_parser"
Aug 7 21:32:57 pve kernel: [ 5.368023] audit: type=1400 audit(1596832376.987:8): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/sbin/tcpdump" pid=683 comm="apparmor_parser"
Aug 7 21:32:57 pve kernel: [ 5.370351] audit: type=1400 audit(1596832376.991:9): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lxc-container-default" pid=686 comm="apparmor_parser"
Aug 7 21:32:57 pve kernel: [ 5.370356] audit: type=1400 audit(1596832376.991:10): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lxc-container-default-cgns" pid=686 comm="apparmor_parser"
Aug 7 21:32:57 pve kernel: [ 5.370359] audit: type=1400 audit(1596832376.991:11): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lxc-container-default-with-mounting" pid=686 comm="apparmor_parser"
Aug 7 21:32:57 pve kernel: [ 5.426540] softdog: initialized. soft_noboot=0 soft_margin=60 sec soft_panic=0 (nowayout=0)
Aug 7 21:32:57 pve kernel: [ 5.451914] new mount options do not match the existing superblock, will be ignored
Aug 7 21:32:57 pve kernel: [ 5.470754] vmbr0: port 1(eno1) entered blocking state
Aug 7 21:32:57 pve kernel: [ 5.470756] vmbr0: port 1(eno1) entered disabled state
Aug 7 21:32:57 pve kernel: [ 5.470829] device eno1 entered promiscuous mode
Aug 7 21:32:57 pve kernel: [ 5.781677] bpfilter: Loaded bpfilter_umh pid 828
Aug 7 21:33:00 pve kernel: [ 9.271176] tg3 0000:03:00.0 eno1: Link is up at 1000 Mbps, full duplex
Aug 7 21:33:00 pve kernel: [ 9.271197] tg3 0000:03:00.0 eno1: Flow control is on for TX and on for RX
Aug 7 21:33:00 pve kernel: [ 9.271198] tg3 0000:03:00.0 eno1: EEE is enabled
Aug 7 21:33:00 pve kernel: [ 9.271219] vmbr0: port 1(eno1) entered blocking state
Aug 7 21:33:00 pve kernel: [ 9.271222] vmbr0: port 1(eno1) entered forwarding state
Aug 7 21:33:00 pve kernel: [ 9.271300] IPv6: ADDRCONF(NETDEV_CHANGE): vmbr0: link becomes ready
Aug 7 21:33:09 pve kernel: [ 18.006404] L1TF CPU bug present and SMT on, data leak possible. See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html for details.
Aug 7 21:43:22 pve kernel: [ 629.811959] mpt2sas_cm0: sending message unit reset !!
Aug 7 21:43:22 pve kernel: [ 629.813687] mpt2sas_cm0: message unit reset: SUCCESS
Aug 7 21:43:23 pve kernel: [ 630.954274] device tap201i0 entered promiscuous mode
Aug 7 21:43:23 pve kernel: [ 630.981950] fwbr201i0: port 1(fwln201i0) entered blocking state
Aug 7 21:43:23 pve kernel: [ 630.981951] fwbr201i0: port 1(fwln201i0) entered disabled state
Aug 7 21:43:23 pve kernel: [ 630.982027] device fwln201i0 entered promiscuous mode
Aug 7 21:43:23 pve kernel: [ 630.982120] fwbr201i0: port 1(fwln201i0) entered blocking state
Aug 7 21:43:23 pve kernel: [ 630.982122] fwbr201i0: port 1(fwln201i0) entered forwarding state
Aug 7 21:43:23 pve kernel: [ 630.986491] vmbr0: port 2(fwpr201p0) entered blocking state
Aug 7 21:43:23 pve kernel: [ 630.986493] vmbr0: port 2(fwpr201p0) entered disabled state
Aug 7 21:43:23 pve kernel: [ 630.986603] device fwpr201p0 entered promiscuous mode
Aug 7 21:43:23 pve kernel: [ 630.986683] vmbr0: port 2(fwpr201p0) entered blocking state
Aug 7 21:43:23 pve kernel: [ 630.986685] vmbr0: port 2(fwpr201p0) entered forwarding state
Aug 7 21:43:23 pve kernel: [ 630.989724] fwbr201i0: port 2(tap201i0) entered blocking state
Aug 7 21:43:23 pve kernel: [ 630.989726] fwbr201i0: port 2(tap201i0) entered disabled state
Aug 7 21:43:23 pve kernel: [ 630.989831] fwbr201i0: port 2(tap201i0) entered blocking state
Aug 7 21:43:23 pve kernel: [ 630.989832] fwbr201i0: port 2(tap201i0) entered forwarding state
Aug 7 21:43:23 pve kernel: [ 631.029718] vfio-pci 0000:07:00.0: DMAR: Device is ineligible for IOMMU domain attach due to platform RMRR requirement. Patch is in effect.
Aug 7 21:43:23 pve kernel: [ 631.483253] fwbr201i0: port 2(tap201i0) entered disabled state
Aug 7 21:43:23 pve kernel: [ 631.506183] fwbr201i0: port 1(fwln201i0) entered disabled state
Aug 7 21:43:23 pve kernel: [ 631.506227] vmbr0: port 2(fwpr201p0) entered disabled state
Aug 7 21:43:23 pve kernel: [ 631.506535] device fwln201i0 left promiscuous mode
Aug 7 21:43:23 pve kernel: [ 631.506537] fwbr201i0: port 1(fwln201i0) entered disabled state
Aug 7 21:43:23 pve kernel: [ 631.528426] device fwpr201p0 left promiscuous mode
Aug 7 21:43:23 pve kernel: [ 631.528428] vmbr0: port 2(fwpr201p0) entered disabled state
root@pve:~#
 
Hello jtracy,
Do you see anything in any of my logs that looks like it could be causing VMs with PCIe passthrough to fail?
 
