Latest update (5.13.19-4-pve) broke my QEMU PCIe passthrough. Works with 5.13.19-3

bluepr0

Mar 1, 2019
Hello!

My TrueNAS VM no longer starts. I had a PCIe card passed through, but now I'm getting the error below. I have tried going back one QEMU version, but the result is the same. How can I check which package versions were installed before today's updates? (I was fully up to date yesterday, so the packages installed today must have been pushed today.)

Thanks a lot!
[Screenshot of the error, 2022-02-03 16:42]
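To answer the question about what changed today: dpkg records each package action in /var/log/dpkg.log, one line per action in the form date, time, action, package, old version, new version. The sketch below (the helper name upgraded_on is hypothetical) filters that log for installs and upgrades on a single day. It assumes the default log location and does not read rotated logs such as dpkg.log.1 or the compressed .gz files.

```bash
# Hypothetical helper: list packages installed or upgraded on a given day.
# dpkg.log lines look like: DATE TIME ACTION PACKAGE OLD-VERSION NEW-VERSION
# Assumes the default log path; rotated logs (dpkg.log.1, *.gz) are not read.
upgraded_on() {
  local day="$1"
  local log="${2:-/var/log/dpkg.log}"
  # Keep only "install" and "upgrade" actions; skip the "status ..." lines.
  awk -v d="$day" '$1 == d && ($3 == "upgrade" || $3 == "install") {
    print $4, $5, "->", $6
  }' "$log"
}
```

Running upgraded_on 2022-02-03 on the host should list each package with its old and new version; a newly installed package, such as a new kernel, is logged with <none> as its old version. /var/log/apt/history.log gives a per-run summary of the same changes.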
 
It looks like 5.13.19-4-pve has something wrong with it.

During boot, when the Proxmox GRUB menu briefly appears, quickly select 'Advanced options for Proxmox VE GNU/Linux' and then 5.13.19-3-pve from the list of kernels.

After doing that, my device properly passed through.

For the Proxmox folks: I'm passing through two motherboard devices and four physical PCI cards. Only one device failed after the 5.13.19-4-pve update, and I don't know why. I have it blocked via vfio, but PVE was accessing it anyway, throwing kernel crash errors and strange messages on shutdown. I don't have the messages at hand, because getting the system back up was critical.

The hardware in question is the following:

Bash:
IOMMU Group 32 06:00.0 Serial Attached SCSI controller [0107]: Intel Corporation C602 chipset 4-Port SATA Storage Control Unit [8086:1d6b] (rev 06)


lspci -n -s 06:00

00:1f.2 0104: 8086:2826 (rev 06)


cat /etc/modprobe.d/vfio.conf | grep 8086:1d6b

options vfio-pci ids=10de:13c2,10de:0fbb,10de:1b80,10de:10f0,1b4b:9215,8086:1d6b,8086:2826,1b73:1100 disable_vga=1


I'm not an expert, but I don't think this is a Linux support issue with the C602 chipset, since the device should be reserved for vfio-pci and left untouched by my PVE host. I'm not sure what is going on. The device is also in its own IOMMU group.
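Grepping vfio.conf for an ID also matches substrings of longer entries. A slightly stricter check, sketched here with a hypothetical helper name, splits the ids= list and looks for an exact match. A listed ID does not by itself prove that vfio-pci bound the device at boot; the Proxmox VE documentation also asks for the initramfs to be refreshed (update-initramfs -u -k all) after module-related changes.

```bash
# Hypothetical helper: succeed only if VENDOR:DEVICE is an exact entry in an
# uncommented "options vfio-pci ids=..." line of the given modprobe config.
vfio_has_id() {
  local id="$1"
  local conf="${2:-/etc/modprobe.d/vfio.conf}"
  grep -E '^[[:space:]]*options[[:space:]]+vfio[-_]pci[[:space:]]' "$conf" \
    | grep -oE 'ids=[^[:space:]]+' \
    | cut -d= -f2 \
    | tr ',' '\n' \
    | grep -qxF "$id"   # -x: whole-line match, -F: fixed string
}
```

For example, vfio_has_id 8086:1d6b && echo listed prints "listed" only if that exact ID appears in the ids= list.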

Thank you for this awesome software that I couldn't live without.
 
I noticed this during the update. Am I reading it wrong, or were two kernels distributed on top of each other?

[Screenshot taken during the update]

Maybe it's normal.
 
I am also experiencing the same issue with PCIe passthrough using pve-kernel-5.13.19-4.

When starting a VM from the terminal:

Code:
# qm start 101
got unexpected control message:
Cannot bind 0000:47:00.0 to vfio

# lspci | grep 47:00
47:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 51)

PCIe passthrough works fine if I boot back into pve-kernel-5.13.19-3.

I'm using a Gigabyte TRX40 Aorus Pro WiFi motherboard with the latest firmware (F5) with an AMD Threadripper 3960X processor.
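When qm start fails with "Cannot bind ... to vfio", it can help to check which host driver owns the device before the VM starts. This sketch reads the driver symlink that sysfs exposes for each PCI device; the helper name pci_driver and the optional root argument (there only so it can be tried against a copy of the tree) are assumptions for illustration.

```bash
# Hypothetical helper: print the kernel driver bound to a PCI device, taken
# from the sysfs "driver" symlink; prints "none" if no driver is bound.
# ROOT defaults to /sys.
pci_driver() {
  local bdf="$1"
  local root="${2:-/sys}"
  local link="$root/bus/pci/devices/$bdf/driver"
  if [ -L "$link" ]; then
    basename "$(readlink "$link")"
  else
    echo none
  fi
}
```

For example, pci_driver 0000:47:00.0 printing ahci rather than vfio-pci would suggest the host SATA driver still holds the controller. lspci -nnk -s 47:00.0 shows the same information on its "Kernel driver in use" line.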
 
I can confirm the same issue with PCIe passthrough to a TrueNAS VM. I had to reboot and select pve-kernel-5.13.19-3 to fix it.
 
FYI, I fixed the issue by updating to the latest kernel (5.15):
Code:
apt update
apt install pve-kernel-5.15
 
Can also confirm the pve-kernel update broke my PCIe passthrough.

I moved to the 5.15 kernel because I can't easily attach a monitor and use the GRUB menu. 5.15 works slightly better: my LSI card works, but my GPU doesn't. Edit: reverting to 5.13.19-3 fixed all issues.

If you are running headless like me, this post was the most helpful for changing the kernel.
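For headless hosts that boot through GRUB, one common approach (a sketch, not taken from the linked post) is to make the older kernel the default via GRUB_DEFAULT in /etc/default/grub and then run update-grub. The menu titles used below are assumptions based on the entries named earlier in the thread; confirm them with grep menuentry /boot/grub/grub.cfg. Hosts that boot via systemd-boot (for example some ZFS-on-UEFI installs) do not use this file. The helper name is hypothetical.

```bash
# Hypothetical helper: point GRUB_DEFAULT at one kernel inside the
# "Advanced options" submenu. The entry titles are assumed; verify them in
# /boot/grub/grub.cfg. Run update-grub afterwards to apply the change.
set_grub_default() {
  local ver="$1"
  local file="${2:-/etc/default/grub}"
  local entry="Advanced options for Proxmox VE GNU/Linux>Proxmox VE GNU/Linux, with Linux $ver"
  if grep -q '^GRUB_DEFAULT=' "$file"; then
    # Replace the existing setting in place ("|" avoids clashing with "/").
    sed -i "s|^GRUB_DEFAULT=.*|GRUB_DEFAULT=\"$entry\"|" "$file"
  else
    printf 'GRUB_DEFAULT="%s"\n' "$entry" >> "$file"
  fi
}
```

Usage would be set_grub_default 5.13.19-3-pve followed by update-grub; setting GRUB_DEFAULT=0 again (and rerunning update-grub) returns to booting the newest kernel.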

If anyone sees this before running their updates: don't update.
 
I can also confirm that updating to 5.13.19-4-pve broke the PCI passthrough of my LSI SAS2008 for TrueNAS.
The task window shows this error:
Code:
TASK ERROR: Cannot bind 0000:01:00.0 to vfio
Running dmesg | grep -i -e DMAR -e IOMMU only returns the "Modules linked in" line:
Code:
 root@pvemain:~# dmesg | grep -i -e DMAR -e IOMMU
[   46.609470] Modules linked in: tcp_diag inet_diag veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter nf_tables bonding tls softdog nfnetlink_log nfnetlink intel_rapl_msr intel_rapl_common isst_if_common skx_edac nfit x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ipmi_ssif kvm ast crct10dif_pclmul drm_vram_helper ghash_clmulni_intel drm_ttm_helper aesni_intel ttm crypto_simd drm_kms_helper cryptd cec rapl rc_core i2c_algo_bit intel_cstate fb_sys_fops syscopyarea sysfillrect sysimgblt pcspkr joydev input_leds mei_me ioatdma mei dca intel_pch_thermal acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad mac_hid zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) vhost_net vhost vhost_iotlb tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi vfio_pci vfio_virqfd irqbypass vfio_iommu_type1 vfio drm sunrpc ip_tables x_tables
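The IOMMU match in that output comes from the vfio_iommu_type1 module name, not from a DMAR/IOMMU initialization message. A more direct check, sketched here with a hypothetical helper name, is to look at /sys/kernel/iommu_groups, which the kernel typically populates only when an IOMMU is active. Passing each address to lspci -nns adds the device description, as in the IOMMU group listing earlier in the thread.

```bash
# Hypothetical helper: print "IOMMU Group N BDF" for every device the kernel
# has placed in an IOMMU group. No output usually means no active IOMMU.
# ROOT defaults to /sys. Groups are listed in glob (lexical) order.
iommu_groups() {
  local root="${1:-/sys}" dev grp
  for dev in "$root"/kernel/iommu_groups/*/devices/*; do
    # Skip the literal pattern left behind when the glob matches nothing.
    [ -e "$dev" ] || [ -L "$dev" ] || continue
    grp="${dev%/devices/*}"   # .../iommu_groups/N
    grp="${grp##*/}"          # N
    printf 'IOMMU Group %s %s\n' "$grp" "${dev##*/}"
  done
}
```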
 
Also confirmed: 5.13.19-4-pve broke my system as well. I get extremely high IO delays and the system hangs HARD! I can't even perform a proper shutdown through KVM; the console is totally frozen!! I went back to the previous kernel and all is well...
 
Confirmed: the latest kernel broke PCI passthrough of my TrueNAS HBA card. I booted back into 5.13.19-3-pve and all is well.
 
Is it just drive controller PCI(e) passthrough that is breaking? My GPUs and USB controller work fine with this version (they break with 5.15.17). Or is it just that only people with drive controllers are responding to this thread?
 
Same here. I have GPU passthrough that works fine on 5.13.19-4. Only my onboard SATA controller breaks.