Hi There,
I've got an issue with PCI passthrough on Proxmox 7.4, regardless of which VM I try it with, and I've tried every kernel variant I could get installed (5.15.x through 6.x) to make it work.
Whenever I try to pass my LSI 9201-8i (flashed to IT mode) through to any VM I create (Rockstor, TrueNAS Core/Scale, generic Linux, ...), the server crashes/locks up with very little output in the logs.
The only kernel crash I was able to capture was when I blacklisted `mpt2sas`, either on the kernel cmdline or in `/etc/modprobe.d/pve-blacklist.conf`.
As soon as it boots up, it proceeds to load the system and then kernel panics (attached).
I've tried, to the best of my knowledge, to enable serial tty, netconsole, and kdump, but I couldn't get solid output from any of them either.
I'm out of clues...
BTW: passing through any other device works properly; a Quadro P400 was tested to isolate whether the motherboard or the HBA is at fault.
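For reference, this is roughly how I set up netconsole to try to catch the panic over the network; a minimal sketch, where the interface name, IPs, and MAC are placeholders for my actual LAN values:
Code:
# stream kernel messages to a second machine over UDP
# format: netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@[tgt-ip]/[tgt-mac]
modprobe netconsole netconsole=6665@192.168.1.10/eno1,6666@192.168.1.20/aa:bb:cc:dd:ee:ff
# raise the console loglevel so the panic messages actually get sent
dmesg -n 8

# on the receiving machine, listen for the stream
nc -u -l -p 6666    # or: nc -u -l 6666, depending on your netcat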
Configuration:
Code:
Proxmox VE: proxmox-ve: 7.4-1 (running kernel: 6.2.9-1-pve)
Intel 11th Gen @ 2.20GHz
32GB DDR4 (NON ECC)
Motherboard: 1x PCIe x1, 1x PCIe x16, 2x NVMe slots, 1x WiFi slot (empty)
RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (built-in)
RTL8125 2.5GbE Controller (PCIe x1)
Broadcom / LSI SAS2008 PCI-Express Fusion-MPT SAS-2 (PCIe x16) "Genuine" - Thank you Art Of Server :cool:
Boot/System Drive: nvme0n1 238.5G SK hynix BC501 HFM256GDJTNG-8310A

pveversion -v
proxmox-ve: 7.4-1 (running kernel: 6.2.9-1-pve)
pve-manager: 7.4-3 (running version: 7.4-3/9002ab8a)
pve-kernel-5.15: 7.4-2
pve-kernel-6.2.9-1-pve: 6.2.9-1
pve-kernel-5.15.107-1-pve: 5.15.107-1
pve-kernel-5.15.102-1-pve: 5.15.102-1
ceph-fuse: 15.2.17-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4-2
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-4
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-3
libpve-rs-perl: 0.7.5
libpve-storage-perl: 7.4-2
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.4.1-1
proxmox-backup-file-restore: 2.4.1-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.6.5
pve-cluster: 7.3-3
pve-container: 4.4-3
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-2
pve-firewall: 4.3-1
pve-firmware: 3.6-5
pve-ha-manager: 3.6.1
pve-i18n: 2.12-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-1
qemu-server: 7.4-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.11-pve1
Code:
VFIO checklist
cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-6.2.6-1-pve root=/dev/mapper/pve-root ro intel_iommu=on iommu=pt modprobe.blacklist=mpt3sas net.ifnames=0 biosdevname=0
# dmesg | grep -e DMAR -e IOMMU
[ 0.036408] ACPI: DMAR 0x000000003639B000 000088 (v02 INTEL EDK2 00000002 01000013)
[ 0.036432] ACPI: Reserving DMAR table memory at [mem 0x3639b000-0x3639b087]
[ 0.073674] DMAR: IOMMU enabled
[ 0.139545] DMAR: Host address width 39
[ 0.139546] DMAR: DRHD base: 0x000000fed90000 flags: 0x0
[ 0.139551] DMAR: dmar0: reg_base_addr fed90000 ver 4:0 cap 1c0000c40660462 ecap 29a00f0505e
[ 0.139554] DMAR: DRHD base: 0x000000fed91000 flags: 0x1
[ 0.139559] DMAR: dmar1: reg_base_addr fed91000 ver 1:0 cap d2008c40660462 ecap f050da
[ 0.139562] DMAR: RMRR base: 0x0000003f000000 end: 0x0000004f7fffff
[ 0.139565] DMAR-IR: IOAPIC id 2 under DRHD base 0xfed91000 IOMMU 1
[ 0.139567] DMAR-IR: HPET id 0 under DRHD base 0xfed91000
[ 0.139568] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
[ 0.141131] DMAR-IR: Enabled IRQ remapping in x2apic mode
[ 0.317923] pci 0000:00:02.0: DMAR: Skip IOMMU disabling for graphics
[ 0.393003] DMAR: No ATSR found
[ 0.393004] DMAR: No SATC found
[ 0.393005] DMAR: IOMMU feature fl1gp_support inconsistent
[ 0.393006] DMAR: IOMMU feature pgsel_inv inconsistent
[ 0.393007] DMAR: IOMMU feature nwfs inconsistent
[ 0.393008] DMAR: IOMMU feature dit inconsistent
[ 0.393009] DMAR: IOMMU feature sc_support inconsistent
[ 0.393010] DMAR: IOMMU feature dev_iotlb_support inconsistent
[ 0.393011] DMAR: dmar0: Using Queued invalidation
[ 0.393014] DMAR: dmar1: Using Queued invalidation
[ 0.393338] DMAR: Intel(R) Virtualization Technology for Directed I/O
cat /etc/modules
# /etc/modules: kernel modules to load at boot time.
#
# This file contains the names of kernel modules that should be loaded
# at boot time, one per line. Lines beginning with "#" are ignored.
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
cat /etc/modprobe.d/pve-blacklist.conf
# This file contains a list of modules which are not supported by Proxmox VE
# nidiafb see bugreport https://bugzilla.proxmox.com/show_bug.cgi?id=701
blacklist nvidiafb
#blacklist mpt2sas
blacklist mpt3sas
# bash iommu.sh
IOMMU Group 0:
00:02.0 VGA compatible controller [0300]: Intel Corporation Device [8086:9a60]
IOMMU Group 1:
00:00.0 Host bridge [0600]: Intel Corporation Device [8086:9a36] (rev 04)
IOMMU Group 2:
00:01.0 PCI bridge [0604]: Intel Corporation Device [8086:9a01] (rev 04)
IOMMU Group 3:
00:04.0 Signal processing controller [1180]: Intel Corporation Device [8086:9a03] (rev 04)
IOMMU Group 4:
00:08.0 System peripheral [0880]: Intel Corporation Device [8086:9a11] (rev 04)
IOMMU Group 5:
00:0a.0 Signal processing controller [1180]: Intel Corporation Device [8086:9a0d] (rev 01)
IOMMU Group 6:
00:0d.0 USB controller [0c03]: Intel Corporation Tiger Lake-H Thunderbolt 4 USB Controller [8086:9a17] (rev 04)
IOMMU Group 7:
00:14.0 USB controller [0c03]: Intel Corporation Device [8086:43ed] (rev 11)
00:14.2 RAM memory [0500]: Intel Corporation Device [8086:43ef] (rev 11)
IOMMU Group 8:
00:15.0 Serial bus controller [0c80]: Intel Corporation Device [8086:43e8] (rev 11)
00:15.1 Serial bus controller [0c80]: Intel Corporation Device [8086:43e9] (rev 11)
00:15.2 Serial bus controller [0c80]: Intel Corporation Device [8086:43ea] (rev 11)
00:15.3 Serial bus controller [0c80]: Intel Corporation Device [8086:43eb] (rev 11)
IOMMU Group 9:
00:16.0 Communication controller [0780]: Intel Corporation Device [8086:43e0] (rev 11)
IOMMU Group 10:
00:17.0 SATA controller [0106]: Intel Corporation Device [8086:43d3] (rev 11)
IOMMU Group 11:
00:19.0 Serial bus controller [0c80]: Intel Corporation Device [8086:43ad] (rev 11)
00:19.1 Serial bus controller [0c80]: Intel Corporation Device [8086:43ae] (rev 11)
IOMMU Group 12:
00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:43bc] (rev 11)
IOMMU Group 13:
00:1d.0 PCI bridge [0604]: Intel Corporation Device [8086:43b2] (rev 11)
IOMMU Group 14:
00:1d.3 PCI bridge [0604]: Intel Corporation Device [8086:43b3] (rev 11)
IOMMU Group 15:
00:1e.0 Communication controller [0780]: Intel Corporation Device [8086:43a8] (rev 11)
00:1e.3 Serial bus controller [0c80]: Intel Corporation Device [8086:43ab] (rev 11)
IOMMU Group 16:
00:1f.0 ISA bridge [0601]: Intel Corporation Device [8086:438b] (rev 11)
00:1f.3 Audio device [0403]: Intel Corporation Device [8086:43c8] (rev 11)
00:1f.4 SMBus [0c05]: Intel Corporation Device [8086:43a3] (rev 11)
00:1f.5 Serial bus controller [0c80]: Intel Corporation Device [8086:43a4] (rev 11)
IOMMU Group 17:
01:00.0 Serial Attached SCSI controller [0107]: Broadcom / LSI SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] [1000:0072] (rev 03)
IOMMU Group 18:
02:00.0 Non-Volatile memory controller [0108]: SK hynix BC501 NVMe Solid State Drive 512GB [1c5c:1327]
IOMMU Group 19:
03:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 15)
IOMMU Group 20:
04:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller [10ec:8125] (rev 05)
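One approach I understand is standard, as an alternative to letting PVE detach the driver at VM start, is to pre-bind the HBA to vfio-pci at boot via a modprobe options file; a sketch, using the [1000:0072] vendor:device ID from the listing above:
Code:
# /etc/modprobe.d/vfio.conf (sketch)
# claim the SAS2008 HBA [1000:0072] with vfio-pci at boot
options vfio-pci ids=1000:0072
# make sure vfio-pci is loaded before the storage driver can grab the card
softdep mpt3sas pre: vfio-pci
# afterwards: update-initramfs -u -k all, then reboot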
Going the manual route gives the same output:
Code:
root@nas:~# cat /sys/bus/pci/devices/0000:01:00.0/driver_override
(null)
root@nas:~# echo "vfio-pci" > /sys/bus/pci/devices/0000:01:00.0/driver_override
root@nas:~# cat /sys/bus/pci/devices/0000:01:00.0/driver_override
vfio-pci
root@nas:~# modprobe -r vfio-pci
root@nas:~# cat /sys/bus/pci/devices/0000:01:00.0/driver_override
vfio-pci
root@nas:~# lspci -nnk
01:00.0 Serial Attached SCSI controller [0107]: Broadcom / LSI SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] [1000:0072] (rev 03)
Subsystem: Broadcom / LSI SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] [1000:0072]
Kernel modules: mpt3sas
root@nas:~# modprobe -i --first-time vfio-pci
root@nas:~# modprobe -i --first-time vfio-pci
modprobe: ERROR: could not insert 'vfio_pci': Module already in kernel
root@nas:~# client_loop: send disconnect: Broken pipe
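In case I missed a step, this is the full sysfs rebind sequence I understand to be standard (0000:01:00.0 is the HBA from above); a sketch, not taken from any official doc:
Code:
modprobe vfio-pci
# release the device from whichever driver currently owns it (if any)
echo 0000:01:00.0 > /sys/bus/pci/devices/0000:01:00.0/driver/unbind
# restrict future driver matching to vfio-pci only
echo vfio-pci > /sys/bus/pci/devices/0000:01:00.0/driver_override
# ask the PCI core to probe drivers for this device again
echo 0000:01:00.0 > /sys/bus/pci/drivers_probe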
The output below was taken without blacklisting mpt2sas/mpt3sas, since my understanding is that this part is automated and "should work unless there is a need for manual work". PVE tries to detach the device from the loaded driver in order to pass it to the VM, which results in a lock-up!
Code:
# dmesg -w (ssh)
[ 47.981207] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[ 47.981226] sd 0:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 48.109253] sd 0:0:1:0: [sdb] Synchronizing SCSI cache
[ 48.109272] sd 0:0:1:0: [sdb] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 48.177267] sd 0:0:2:0: [sdc] Synchronizing SCSI cache
[ 48.177288] sd 0:0:2:0: [sdc] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 48.245242] sd 0:0:3:0: [sdd] Synchronizing SCSI cache
[ 48.245261] sd 0:0:3:0: [sdd] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 48.245910] mpt2sas_cm0: mpt3sas_transport_port_remove: removed: sas_addr(0x4433221100000000)
[ 48.245914] mpt2sas_cm0: removing handle(0x0009), sas_addr(0x4433221100000000)
[ 48.245916] mpt2sas_cm0: enclosure logical id(0x500605b00973f5f0), slot(3)
[ 48.245917] mpt2sas_cm0: mpt3sas_transport_port_remove: removed: sas_addr(0x4433221103000000)
[ 48.245918] mpt2sas_cm0: removing handle(0x000a), sas_addr(0x4433221103000000)
[ 48.245920] mpt2sas_cm0: enclosure logical id(0x500605b00973f5f0), slot(0)
[ 48.245921] mpt2sas_cm0: mpt3sas_transport_port_remove: removed: sas_addr(0x4433221101000000)
[ 48.245922] mpt2sas_cm0: removing handle(0x000b), sas_addr(0x4433221101000000)
[ 48.245923] mpt2sas_cm0: enclosure logical id(0x500605b00973f5f0), slot(2)
[ 48.245924] mpt2sas_cm0: mpt3sas_transport_port_remove: removed: sas_addr(0x4433221102000000)
[ 48.245925] mpt2sas_cm0: removing handle(0x000c), sas_addr(0x4433221102000000)
[ 48.245926] mpt2sas_cm0: enclosure logical id(0x500605b00973f5f0), slot(1)
[ 48.246025] mpt2sas_cm0: sending message unit reset !!
[ 48.247671] mpt2sas_cm0: message unit reset: SUCCESS
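The log ends right after the message unit reset reports SUCCESS; the host locks up at that point. A quick sanity check that I understand helps here, run right before starting the VM, to see which driver actually holds the card and whether the host still uses the disks behind it:
Code:
# which driver currently owns the HBA?
lspci -nnk -s 01:00.0
# are the disks behind it still mounted/in use by the host?
lsblk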