Need help: kernel bug or a misconfiguration in my PCI passthrough setup?

gzbenson

New Member
Oct 25, 2022
I am trying to set up a TrueNAS (Core 13.0-U3.1) VM on Proxmox, and I want to pass through two PCIe devices to this VM (one SATA controller and one NVMe drive) for the storage pool.

1 x 0000:02:00.0 SATA controller: ASMedia Technology Inc. Device 1166 (rev 02) -> everything works fine when I pass through only this 6-port SATA controller (installed in an onboard M.2 PCIe 4.0 x4 slot).

1 x 0000:01:00.0 Non-Volatile memory controller: Sandisk Corp Device 5017 (rev 01) (installed in the other M.2 PCIe 4.0 x4 slot, near the CPU socket) -> the VM boots normally and the device works; I also created a pool in TrueNAS on this NVMe disk. However, when this PCIe device is passed through, I cannot reboot or shut down the VM normally; the VM appears to hang on some error. I have to run "qm stop vmid" and start the VM again, or reboot the host and start the VM again.

My host: MAXSUN B660M + 12th-gen i3-12100 + 64 GB RAM
pve-manager/7.2-14/65898fbc (running kernel: 5.15.74-1-pve); I use systemd-boot and a ZFS rpool.

My VM config:
agent: 1
balloon: 0
boot: order=scsi0
cores: 4
cpu: host
hostpci0: 0000:01:00,pcie=1
hostpci1: 0000:02:00,pcie=1

ide2: local:iso/TrueNAS-13.0-U3.1.iso,media=cdrom,size=1017510K
machine: q35
memory: 30720
meta: creation-qemu=7.1.0,ctime=1668954355
name: truenas
net0: virtio=D6:30:B8:62:37:BF,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
scsi0: local-zfs:vm-110-disk-0,iothread=1,size=16G
scsihw: virtio-scsi-single
sockets: 1

Code:
lspci

00:00.0 Host bridge: Intel Corporation Device 4630 (rev 05)
00:02.0 VGA compatible controller: Intel Corporation Device 4692 (rev 0c)
00:06.0 PCI bridge: Intel Corporation Device 464d (rev 05)
00:0a.0 Signal processing controller: Intel Corporation Device 467d (rev 01)
00:14.0 USB controller: Intel Corporation Device 7ae0 (rev 11)
00:14.2 RAM memory: Intel Corporation Device 7aa7 (rev 11)
00:16.0 Communication controller: Intel Corporation Device 7ae8 (rev 11)
00:17.0 SATA controller: Intel Corporation Device 7ae2 (rev 11)
00:1a.0 PCI bridge: Intel Corporation Device 7ac8 (rev 11)
00:1c.0 PCI bridge: Intel Corporation Device 7ab8 (rev 11)
00:1d.0 PCI bridge: Intel Corporation Device 7ab6 (rev 11)
00:1d.7 PCI bridge: Intel Corporation Device 7ab7 (rev 11)
00:1f.0 ISA bridge: Intel Corporation Device 7a86 (rev 11)
00:1f.4 SMBus: Intel Corporation Device 7aa3 (rev 11)
00:1f.5 Serial bus controller [0c80]: Intel Corporation Device 7aa4 (rev 11)
01:00.0 Non-Volatile memory controller: Sandisk Corp Device 5017 (rev 01)
02:00.0 SATA controller: ASMedia Technology Inc. Device 1166 (rev 02)
04:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)
05:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller (rev 05)

PVE kernel errors logged when the VM starts (journal shown newest first):

Nov 21 10:22:57 pve systemd[1]: Started 110.scope.
Nov 21 10:22:57 pve kernel: pcieport 0000:00:06.0: [12] Timeout
Nov 21 10:22:57 pve kernel: pcieport 0000:00:06.0: [ 8] Rollover
Nov 21 10:22:57 pve kernel: pcieport 0000:00:06.0: device [8086:464d] error status/mask=00001100/00002000
Nov 21 10:22:57 pve kernel: pcieport 0000:00:06.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
Nov 21 10:22:57 pve kernel: pcieport 0000:00:06.0: AER: Multiple Corrected error received: 0000:00:06.0

Nov 21 10:22:57 pve pvedaemon[625872]: start VM 110: UPID:pve:00098CD0:00441B04:637AE101:qmstart:110:root@pam:
Nov 21 10:22:57 pve pvedaemon[165878]: <root@pam> starting task UPID:pve:00098CD0:00441B04:637AE101:qmstart:110:root@pam:
Nov 21 10:17:01 pve CRON[621896]: pam_unix(cron:session): session closed for user root
Nov 21 10:17:01 pve CRON[621897]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Nov 21 10:17:01 pve CRON[621896]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)


Is this a PCIe ASPM problem on the PCI bridge
00:06.0 PCI bridge: Intel Corporation Device 464d (rev 05)?

Code:
/sys/kernel/iommu_groups/7/devices/0000:00:1a.0
/sys/kernel/iommu_groups/15/devices/0000:05:00.0
/sys/kernel/iommu_groups/5/devices/0000:00:16.0
/sys/kernel/iommu_groups/13/devices/0000:02:00.0
/sys/kernel/iommu_groups/3/devices/0000:00:0a.0
/sys/kernel/iommu_groups/11/devices/0000:00:1f.0
/sys/kernel/iommu_groups/11/devices/0000:00:1f.5
/sys/kernel/iommu_groups/11/devices/0000:00:1f.4
/sys/kernel/iommu_groups/1/devices/0000:00:02.0
/sys/kernel/iommu_groups/8/devices/0000:00:1c.0
/sys/kernel/iommu_groups/6/devices/0000:00:17.0
/sys/kernel/iommu_groups/14/devices/0000:04:00.0
/sys/kernel/iommu_groups/4/devices/0000:00:14.2
/sys/kernel/iommu_groups/4/devices/0000:00:14.0
/sys/kernel/iommu_groups/12/devices/0000:01:00.0
/sys/kernel/iommu_groups/2/devices/0000:00:06.0
/sys/kernel/iommu_groups/10/devices/0000:00:1d.7
/sys/kernel/iommu_groups/0/devices/0000:00:00.0
/sys/kernel/iommu_groups/9/devices/0000:00:1d.0
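
The listing above is easier to read when sorted by group number. Here is a minimal Python sketch that does this; the names `group_devices` and `SAMPLE` are made up for illustration, and the sample repeats only a few of the paths from the listing. Going by the listing, the NVMe (0000:01:00.0) and the SATA controller (0000:02:00.0) each sit alone in their group (12 and 13).

```python
import re
from collections import defaultdict

# A few paths copied from the listing above (not the full list).
SAMPLE = """\
/sys/kernel/iommu_groups/13/devices/0000:02:00.0
/sys/kernel/iommu_groups/11/devices/0000:00:1f.0
/sys/kernel/iommu_groups/11/devices/0000:00:1f.5
/sys/kernel/iommu_groups/11/devices/0000:00:1f.4
/sys/kernel/iommu_groups/12/devices/0000:01:00.0
/sys/kernel/iommu_groups/2/devices/0000:00:06.0
"""

# Captures the group number and the PCI address from each path.
PATH_RE = re.compile(r"/sys/kernel/iommu_groups/(\d+)/devices/(\S+)")


def group_devices(listing):
    """Return {group number: sorted list of PCI addresses}, ordered by group."""
    groups = defaultdict(list)
    for match in PATH_RE.finditer(listing):
        groups[int(match.group(1))].append(match.group(2))
    return {group: sorted(devs) for group, devs in sorted(groups.items())}


if __name__ == "__main__":
    for group, devices in group_devices(SAMPLE).items():
        print(f"group {group:2d}: {', '.join(devices)}")
```

A passed-through device that shares its group with unrelated devices is the usual thing to look for in such a listing.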
 
Code:
    lspci -nnks 0000:01:00
01:00.0 Non-Volatile memory controller [0108]: Sandisk Corp Device [15b7:5017] (rev 01)
    Subsystem: Sandisk Corp Device [15b7:5017]
    Kernel driver in use: vfio-pci
    Kernel modules: nvme
    
    
    lspci -nnks 0000:00:06
00:06.0 PCI bridge [0604]: Intel Corporation Device [8086:464d] (rev 05)
    Kernel driver in use: pcieport
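
One way to check whether 00:06.0 is the port directly upstream of 01:00.0 is to resolve the device's sysfs link and look at its parent directory. The following is only a sketch: the helper names `upstream_port` and `upstream_port_of` are made up, and the example path in the comment is hypothetical, not taken from this host.

```python
import os
import re
from typing import Optional

# Matches a full PCI address such as 0000:00:06.0 (domain:bus:device.function).
BDF_RE = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def upstream_port(resolved_path: str) -> Optional[str]:
    """Return the parent PCI address from a resolved sysfs device path.

    Hypothetical example input:
    /sys/devices/pci0000:00/0000:00:06.0/0000:01:00.0
    Returns None when the parent directory is not a PCI device
    (for example when the device sits directly on the root bus).
    """
    parent = os.path.basename(os.path.dirname(resolved_path.rstrip("/")))
    return parent if BDF_RE.match(parent) else None


def upstream_port_of(bdf: str) -> Optional[str]:
    """Resolve /sys/bus/pci/devices/<bdf> on the running host and return its parent."""
    return upstream_port(os.path.realpath(f"/sys/bus/pci/devices/{bdf}"))
```

On the host, `upstream_port_of("0000:01:00.0")` would then show which bridge the NVMe hangs off.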
 
When I reboot or shut down the VM (TrueNAS Core), it seems to get stuck.
[Attached screenshot: 1669001146770.png]


-- Journal begins at Mon 2022-09-26 17:47:05 CST, ends at Mon 2022-11-21 11:24:05 CST. --
Nov 21 11:24:05 pve QEMU[625882]: kvm: vfio_err_notifier_handler(0000:01:00.0) Unrecoverable error detected. Please collect any data possible and then kill the guest
Nov 21 11:24:05 pve kernel: pcieport 0000:00:06.0: AER: device recovery successful
Nov 21 11:24:05 pve kernel: pcieport 0000:00:06.0: [21] ACSViol (First)
Nov 21 11:24:05 pve kernel: pcieport 0000:00:06.0: device [8086:464d] error status/mask=00200000/00010000
Nov 21 11:24:05 pve kernel: pcieport 0000:00:06.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Receiver ID)

Nov 21 11:24:05 pve kernel: pcieport 0000:00:06.0: AER: Uncorrected (Non-Fatal) error received: 0000:00:06.0
Nov 21 11:21:24 pve kernel: vmbr0: port 2(enp5s0) entered forwarding state
Nov 21 11:21:24 pve kernel: vmbr0: port 2(enp5s0) entered blocking state
Nov 21 11:21:24 pve kernel: r8169 0000:05:00.0 enp5s0: Link is Up - 10Mbps/Full (downshifted) - flow control off
Nov 21 11:21:24 pve kernel: RTL8226B_RTL8221B 2.5Gbps PHY r8169-0-500:00: Downshift occurred from negotiated speed 2.5Gbps to actual speed 10Mbps, check cabling!
Nov 21 11:21:22 pve kernel: vmbr0: port 2(enp5s0) entered disabled state
Nov 21 11:21:22 pve kernel: r8169 0000:05:00.0 enp5s0: Link is Down
Nov 21 11:20:42 pve kernel: vmbr0: port 2(enp5s0) entered forwarding state
Nov 21 11:20:42 pve kernel: vmbr0: port 2(enp5s0) entered blocking state
Nov 21 11:20:42 pve kernel: r8169 0000:05:00.0 enp5s0: Link is Up - 2.5Gbps/Full - flow control rx/tx
Nov 21 11:20:35 pve kernel: vmbr0: port 2(enp5s0) entered disabled state
Nov 21 11:20:35 pve kernel: r8169 0000:05:00.0 enp5s0: Link is Down
Nov 21 11:19:32 pve pvedaemon[666600]: starting vnc proxy UPID:pve:000A2BE8:00494901:637AEE44:vncproxy:110:root@pam:
Nov 21 11:19:32 pve pvedaemon[613121]: <root@pam> starting task UPID:pve:000A2BE8:00494901:637AEE44:vncproxy:110:root@pam:
Nov 21 11:19:32 pve pvedaemon[638737]: <root@pam> end task UPID:pve:0009C120:0045B495:637AE51A:vncproxy:110:root@pam: OK
Nov 21 11:18:18 pve pveproxy[665728]: worker exit
Nov 21 11:18:17 pve pveproxy[665728]: got inotify poll request in wrong process - disabling inotify
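
The `error status` values in the two journal excerpts above are bit fields, and the bracketed numbers the kernel prints (`[12] Timeout`, `[ 8] Rollover`, `[21] ACSViol`) are the set bit positions. Here is a small Python sketch of that decoding. The three labels seen in the logs come straight from the output; the other table entries are my understanding of the AER status registers and the kernel's short names, so treat them as assumptions. The function name `decode` is made up.

```python
# Correctable error status bits. "Rollover" and "Timeout" appear in the
# log above; the other names are assumptions to double-check.
CORRECTABLE = {
    0: "RxErr", 6: "BadTLP", 7: "BadDLLP", 8: "Rollover",
    12: "Timeout", 13: "NonFatalErr", 14: "CorrIntErr", 15: "HeaderOF",
}

# Uncorrectable error status bits. "ACSViol" appears in the log above;
# the other names are assumptions to double-check.
UNCORRECTABLE = {
    4: "DLP", 5: "SDES", 12: "TLP", 13: "FCP", 14: "CmpltTO",
    15: "CmpltAbrt", 16: "UnxCmplt", 17: "RxOF", 18: "MalfTLP",
    19: "ECRC", 20: "UnsupReq", 21: "ACSViol",
}


def decode(status, names):
    """List the names of all bits set in a 32-bit AER status value."""
    return [
        names.get(bit, f"bit{bit}")
        for bit in range(32)
        if (status >> bit) & 1
    ]


if __name__ == "__main__":
    # Values copied from the "error status/mask=" lines in the journal.
    print(decode(0x00001100, CORRECTABLE))    # corrected error at VM start
    print(decode(0x00200000, UNCORRECTABLE))  # uncorrected error at shutdown
```

So the error at VM start is a corrected Data Link Layer replay problem, while the one at shutdown is an uncorrected ACS violation reported by the same bridge.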
 
Could I get some guidance or information about the IOMMU groups? I am not sure what is wrong in my config.
 
