Intel Arc passthrough to Linux VM works, SATA HBA passthrough to TrueNAS works, running both at the same time crashes host

BenDrankin

Member
Mar 23, 2023
I have been attempting hardware passthrough: my Intel Arc GPU to an Ubuntu Server VM, and two SATA HBAs to a separate TrueNAS SCALE VM. Either one works fine on its own, but if I start both VMs with the passthroughs enabled, the whole host locks up. This also happens if I pass through just one of the two SATA HBAs, so it does not appear to be specific to the HBA model.

The GPU is in PCI_E1, which is the CPU-driven PCIe slot. The HBAs are in one PCIe x1 slot and one M.2 slot (which should be chipset-driven).

This is an install I upgraded from 8 to 9. I have been tempted to roll back to 8 to see if that helps, but I know 8 is EOL soon. The GPU passthrough worked on 8, but I did not attempt the HBA passthrough until 9.

All three devices (including both Arc device IDs) are using VFIO, each is in its own IOMMU group, and each passthrough works without issue individually.
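
One way to double-check the VFIO binding before starting either VM is to read each device's `driver` symlink in sysfs. This is only a rough sketch: the function name `pci_driver` and its optional sysfs-root argument are made up for illustration, and the addresses in the usage comment are taken from the group listing posted later in this thread.

```bash
#!/usr/bin/env bash
# pci_driver ADDR [SYSFS_ROOT]
# Print the kernel driver currently bound to a PCI device, or "none".
# SYSFS_ROOT is a hypothetical override so the function can be tried
# against a fake directory tree; it defaults to the real /sys.
pci_driver() {
    local addr=$1 root=${2:-/sys}
    local link="$root/bus/pci/devices/$addr/driver"
    if [ -L "$link" ]; then
        # The symlink points at .../drivers/<name>; keep only <name>.
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}

# Usage on the host (full addresses include the 0000: domain prefix):
#   for a in 0000:03:00.0 0000:04:00.0 0000:07:00.0 0000:0b:00.0; do
#       echo "$a -> $(pci_driver "$a")"
#   done
```

If the devices are bound at boot, each of them would be expected to report `vfio-pci`; anything else (for example `ahci` for an HBA) means the host driver still owns that device at the time of the check.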

The setup:
MSI - Pro Z690-A Motherboard
Intel i5 12400F
48 GB DDR4 memory
Intel Arc A380 GPU
512 GB NVME drive for host install
SATA HBA1 - Marvell 88SE9215 PCIe 2.0 x1 4-port SATA
SATA HBA2 - ASMedia Technology Inc. ASM1166 Serial ATA Controller

Proxmox Version 9.2.3
Linux Kernel 7.0.6-2
 
Hi, can you share the journal and the vm config?
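
In case it helps, here is a rough sketch of how both could be collected in one go. It assumes the journal is persistent, so that the boot that ended in the lockup is still available as the previous boot (`-b -1`); a hard lockup may still lose the last few messages. The function name `collect_passthrough_info` and the VM IDs in the usage comment are placeholders, not anything taken from your system.

```bash
#!/usr/bin/env bash
# collect_passthrough_info VMID...
# Print the kernel messages of the previous boot, followed by the
# configuration of each given VM.
collect_passthrough_info() {
    local vmid
    echo "== kernel log of the previous boot =="
    # -k: kernel messages only, -b -1: the boot before the current one
    journalctl -k -b -1 --no-pager
    for vmid in "$@"; do
        echo "== qm config $vmid =="
        qm config "$vmid"
    done
}

# Usage (100 and 101 are placeholder VM IDs):
#   collect_passthrough_info 100 101 > passthrough-info.txt
```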
 
Can you share the IOMMU groups as well?
Bash:
{
shopt -s nullglob
for g in $(find /sys/kernel/iommu_groups/* -maxdepth 0 -type d | sort -V); do
    echo "IOMMU Group ${g##*/}:"
    for d in $g/devices/*; do
        echo -e "\t$(lspci -nns ${d##*/})"
    done;
done;
}
Code blocks like these are generally preferred over attachments.
 
Understood. Here are the groups:


Code:
 shopt -s nullglob
for g in $(find /sys/kernel/iommu_groups/* -maxdepth 0 -type d | sort -V); do
    echo "IOMMU Group ${g##*/}:"
    for d in $g/devices/*; do
        echo -e "\t$(lspci -nns ${d##*/})"
    done;
done;
IOMMU Group 0:
        00:00.0 Host bridge [0600]: Intel Corporation 12th Gen Core Processor Host Bridge [8086:4650] (rev 02)
IOMMU Group 1:
        00:01.0 PCI bridge [0604]: Intel Corporation 12th Gen Core Processor PCI Express x16 Controller #1 [8086:460d] (rev 02)
IOMMU Group 2:
        00:06.0 PCI bridge [0604]: Intel Corporation 12th Gen Core Processor PCI Express x4 Controller #0 [8086:464d] (rev 02)
IOMMU Group 3:
        00:08.0 System peripheral [0880]: Intel Corporation 12th Gen Core Processor Gaussian & Neural Accelerator [8086:464f] (rev 02)
IOMMU Group 4:
        00:14.0 USB controller [0c03]: Intel Corporation Alder Lake-S PCH USB 3.2 Gen 2x2 XHCI Controller [8086:7ae0] (rev 11)
        00:14.2 RAM memory [0500]: Intel Corporation Alder Lake-S PCH Shared SRAM [8086:7aa7] (rev 11)
IOMMU Group 5:
        00:14.3 Network controller [0280]: Intel Corporation Alder Lake-S PCH CNVi WiFi [8086:7af0] (rev 11)
IOMMU Group 6:
        00:16.0 Communication controller [0780]: Intel Corporation Alder Lake-S
PCH HECI Controller #1 [8086:7ae8] (rev 11)
IOMMU Group 7:
        00:17.0 SATA controller [0106]: Intel Corporation Alder Lake-S PCH SATA
Controller [AHCI Mode] [8086:7ae2] (rev 11)
IOMMU Group 8:
        00:1b.0 PCI bridge [0604]: Intel Corporation Device [8086:7ac0] (rev 11)IOMMU Group 9:
        00:1b.4 PCI bridge [0604]: Intel Corporation Alder Lake-S PCH PCI Express Root Port #21 [8086:7ac4] (rev 11)
IOMMU Group 10:
        00:1c.0 PCI bridge [0604]: Intel Corporation Alder Lake-S PCH PCI Express Root Port #1 [8086:7ab8] (rev 11)
IOMMU Group 11:
        00:1c.1 PCI bridge [0604]: Intel Corporation Alder Lake-S PCH PCI Express Root Port #2 [8086:7ab9] (rev 11)
IOMMU Group 12:
        00:1c.2 PCI bridge [0604]: Intel Corporation Alder Lake-S PCH PCI Express Root Port #3 [8086:7aba] (rev 11)
IOMMU Group 13:
        00:1c.4 PCI bridge [0604]: Intel Corporation Alder Lake-S PCH PCI Express Root Port #5 [8086:7abc] (rev 11)
IOMMU Group 14:
        00:1f.0 ISA bridge [0601]: Intel Corporation Z690 Chipset LPC/eSPI Controller [8086:7a84] (rev 11)
        00:1f.3 Audio device [0403]: Intel Corporation Alder Lake-S HD Audio Controller [8086:7ad0] (rev 11)
        00:1f.4 SMBus [0c05]: Intel Corporation Alder Lake-S PCH SMBus Controller [8086:7aa3] (rev 11)
        00:1f.5 Serial bus controller [0c80]: Intel Corporation Alder Lake-S PCH SPI Controller [8086:7aa4] (rev 11)
IOMMU Group 15:
        01:00.0 PCI bridge [0604]: Intel Corporation Device [8086:4fa1] (rev 01)IOMMU Group 16:
        02:01.0 PCI bridge [0604]: Intel Corporation Device [8086:4fa4]
IOMMU Group 17:
        02:04.0 PCI bridge [0604]: Intel Corporation Device [8086:4fa4]
IOMMU Group 18:
        03:00.0 VGA compatible controller [0300]: Intel Corporation DG2 [Arc A380] [8086:56a5] (rev 05)
IOMMU Group 19:
        04:00.0 Audio device [0403]: Intel Corporation DG2 Audio Controller [8086:4f92]
IOMMU Group 20:
        05:00.0 Non-Volatile memory controller [0108]: Micron/Crucial Technology P2 [Nick P2] / P3 / P3 Plus NVMe PCIe SSD (DRAM-less) [c0a9:540a] (rev 01)
IOMMU Group 21:
        07:00.0 SATA controller [0106]: ASMedia Technology Inc. ASM1166 Serial ATA Controller [1b21:1166] (rev 02)
IOMMU Group 22:
        09:00.0 Ethernet controller [0200]: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection [8086:10fb] (rev 01)
IOMMU Group 23:
        09:00.1 Ethernet controller [0200]: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection [8086:10fb] (rev 01)
IOMMU Group 24:
        0a:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller I225-V [8086:15f3] (rev 03)
IOMMU Group 25:
        0b:00.0 SATA controller [0106]: Marvell Technology Group Ltd. 88SE9215 PCIe 2.0 x1 4-port SATA 6 Gb/s Controller [1b4b:9215] (rev 11)
 
Please show the groups without using pcie_acs_override=downstream,multifunction. Pretending (which is what this override does) that devices are in separate groups does not make them independent.
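
A quick way to confirm whether the override is active on the running kernel is to look for it in /proc/cmdline, which reflects what the kernel was actually booted with rather than what is currently in the grub file. A minimal sketch follows; the function name `has_acs_override` and its optional file argument are only for illustration.

```bash
#!/usr/bin/env bash
# has_acs_override [CMDLINE_FILE]
# Print the pcie_acs_override option and return 0 if the kernel command
# line contains it; print nothing and return non-zero otherwise.
# CMDLINE_FILE defaults to /proc/cmdline; the argument exists only so
# the function can be tried against a saved copy.
has_acs_override() {
    local file=${1:-/proc/cmdline}
    grep -o 'pcie_acs_override=[^ ]*' "$file"
}

# Usage:
#   has_acs_override || echo "no ACS override active"
```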
 
OK, I removed that line, ran update-grub, and rebooted. Here are the updated grub file and the IOMMU group output:

Code:
cat /etc/default/grub
# If you change this file or any /etc/default/grub.d/*.cfg file,
# run 'update-grub' afterwards to update /boot/grub/grub.cfg.
# For full documentation of the options in these files, see:
#   info -f grub -n 'Simple configuration'

GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR=`( . /etc/os-release && echo ${NAME} )`
GRUB_CMDLINE_LINUX_DEFAULT="amd_iommu=on iommu=pt video=efifb:off video=vesafb:off"
GRUB_CMDLINE_LINUX=""

# If your computer has multiple operating systems installed, then you
# probably want to run os-prober. However, if your computer is a host
# for guest OSes installed via LVM or raw disk devices, running
# os-prober can cause damage to those guest OSes as it mounts
# filesystems to look for things.
#GRUB_DISABLE_OS_PROBER=false

# Uncomment to enable BadRAM filtering, modify to suit your needs
# This works with Linux (no patch required) and with any kernel that obtains
# the memory map information from GRUB (GNU Mach, kernel of FreeBSD ...)
#GRUB_BADRAM="0x01234567,0xfefefefe,0x89abcdef,0xefefefef"

# Uncomment to disable graphical terminal
#GRUB_TERMINAL=console

# The resolution used on graphical terminal
# note that you can use only modes which your graphic card supports via VBE/GOP/UGA
# you can see them in real GRUB with the command `videoinfo'
#GRUB_GFXMODE=640x480

# Uncomment if you don't want GRUB to pass "root=UUID=xxx" parameter to Linux
#GRUB_DISABLE_LINUX_UUID=true

# Uncomment to disable generation of recovery mode menu entries
#GRUB_DISABLE_RECOVERY="true"

# Uncomment to get a beep at grub start
#GRUB_INIT_TUNE="480 440 1"
{
shopt -s nullglob
for g in $(find /sys/kernel/iommu_groups/* -maxdepth 0 -type d | sort -V); do
    echo "IOMMU Group ${g##*/}:"
    for d in $g/devices/*; do
        echo -e "\t$(lspci -nns ${d##*/})"
    done;
done;
}
IOMMU Group 0:
        00:00.0 Host bridge [0600]: Intel Corporation 12th Gen Core Processor Host Bridge [8086:4650] (rev 02)
IOMMU Group 1:
        00:01.0 PCI bridge [0604]: Intel Corporation 12th Gen Core Processor PCI Express x16 Controller #1 [8086:460d] (rev 02)
IOMMU Group 2:
        00:06.0 PCI bridge [0604]: Intel Corporation 12th Gen Core Processor PCI Express x4 Controller #0 [8086:464d] (rev 02)
IOMMU Group 3:
        00:08.0 System peripheral [0880]: Intel Corporation 12th Gen Core Processor Gaussian & Neural Accelerator [8086:464f] (rev 02)
IOMMU Group 4:
        00:14.0 USB controller [0c03]: Intel Corporation Alder Lake-S PCH USB 3.2 Gen 2x2 XHCI Controller [8086:7ae0] (rev 11)
        00:14.2 RAM memory [0500]: Intel Corporation Alder Lake-S PCH Shared SRAM [8086:7aa7] (rev 11)
IOMMU Group 5:
        00:14.3 Network controller [0280]: Intel Corporation Alder Lake-S PCH CNVi WiFi [8086:7af0] (rev 11)
IOMMU Group 6:
        00:16.0 Communication controller [0780]: Intel Corporation Alder Lake-S PCH HECI Controller #1 [8086:7ae8] (rev 11)
IOMMU Group 7:
        00:17.0 SATA controller [0106]: Intel Corporation Alder Lake-S PCH SATA Controller [AHCI Mode] [8086:7ae2] (rev 11)
IOMMU Group 8:
        00:1b.0 PCI bridge [0604]: Intel Corporation Device [8086:7ac0] (rev 11)
IOMMU Group 9:
        00:1b.4 PCI bridge [0604]: Intel Corporation Alder Lake-S PCH PCI Express Root Port #21 [8086:7ac4] (rev 11)
IOMMU Group 10:
        00:1c.0 PCI bridge [0604]: Intel Corporation Alder Lake-S PCH PCI Express Root Port #1 [8086:7ab8] (rev 11)
IOMMU Group 11:
        00:1c.1 PCI bridge [0604]: Intel Corporation Alder Lake-S PCH PCI Express Root Port #2 [8086:7ab9] (rev 11)
IOMMU Group 12:
        00:1c.2 PCI bridge [0604]: Intel Corporation Alder Lake-S PCH PCI Express Root Port #3 [8086:7aba] (rev 11)
IOMMU Group 13:
        00:1c.4 PCI bridge [0604]: Intel Corporation Alder Lake-S PCH PCI Express Root Port #5 [8086:7abc] (rev 11)
IOMMU Group 14:
        00:1f.0 ISA bridge [0601]: Intel Corporation Z690 Chipset LPC/eSPI Controller [8086:7a84] (rev 11)
        00:1f.3 Audio device [0403]: Intel Corporation Alder Lake-S HD Audio Controller [8086:7ad0] (rev 11)
        00:1f.4 SMBus [0c05]: Intel Corporation Alder Lake-S PCH SMBus Controller [8086:7aa3] (rev 11)
        00:1f.5 Serial bus controller [0c80]: Intel Corporation Alder Lake-S PCH SPI Controller [8086:7aa4] (rev 11)
IOMMU Group 15:
        01:00.0 PCI bridge [0604]: Intel Corporation Device [8086:4fa1] (rev 01)
IOMMU Group 16:
        02:01.0 PCI bridge [0604]: Intel Corporation Device [8086:4fa4]
IOMMU Group 17:
        02:04.0 PCI bridge [0604]: Intel Corporation Device [8086:4fa4]
IOMMU Group 18:
        03:00.0 VGA compatible controller [0300]: Intel Corporation DG2 [Arc A380] [8086:56a5] (rev 05)
IOMMU Group 19:
        04:00.0 Audio device [0403]: Intel Corporation DG2 Audio Controller [8086:4f92]
IOMMU Group 20:
        05:00.0 Non-Volatile memory controller [0108]: Micron/Crucial Technology P2 [Nick P2] / P3 / P3 Plus NVMe PCIe SSD (DRAM-less) [c0a9:540a] (rev 01)
IOMMU Group 21:
        07:00.0 SATA controller [0106]: ASMedia Technology Inc. ASM1166 Serial ATA Controller [1b21:1166] (rev 02)
IOMMU Group 22:
        09:00.0 Ethernet controller [0200]: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection [8086:10fb] (rev 01)
IOMMU Group 23:
        09:00.1 Ethernet controller [0200]: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection [8086:10fb] (rev 01)
IOMMU Group 24:
        0a:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller I225-V [8086:15f3] (rev 03)
IOMMU Group 25:
        0b:00.0 SATA controller [0106]: Marvell Technology Group Ltd. 88SE9215 PCIe 2.0 x1 4-port SATA 6 Gb/s Controller [1b4b:9215] (rev 11)
 
The IOMMU groups look the same, so my hypothesis is entirely wrong. But they look suspiciously exactly the same (except for some line breaks), so why would you use the pcie_acs_override at all? Can you please show the output of cat /proc/cmdline, as it is when you produce(d) the groups list, to make sure?
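
If you want to check that two saved listings really are identical apart from line breaks, one option is to strip all whitespace before comparing. This is just a sketch; the function name and the file names in the usage comment are placeholders.

```bash
#!/usr/bin/env bash
# same_ignoring_whitespace FILE_A FILE_B
# Return 0 if the two files are identical once every whitespace
# character (spaces, tabs, line breaks) has been removed.
same_ignoring_whitespace() {
    [ "$(tr -d '[:space:]' < "$1")" = "$(tr -d '[:space:]' < "$2")" ]
}

# Usage (file names are placeholders for two saved listings):
#   same_ignoring_whitespace groups-with-override.txt groups-without.txt \
#       && echo "identical" || echo "different"
```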

PS: amd_iommu=on and video=efifb:off and video=vesafb:off do nothing (on Proxmox).
 
The ACS override line was something I found on Reddit and in the forums that a lot of people mentioned fixed their issue. I forgot to remove it after it ultimately did not fix the problem. Thanks for the tip on the other parameters that do nothing on Proxmox; those will be removed. Here is the output you requested.

Code:
cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-7.0.6-2-pve root=/dev/mapper/pve-root ro amd_iommu=on iommu=pt video=efifb:off video=vesafb:off