[SOLVED] PVE crashes with any passthrough

dany-

New Member
Sep 6, 2024
I am trying out an ASRock X470D4U + 3700X CPU combo to build a Proxmox-based hyperconverged server.

As the title says, whenever I pass through the SATA AHCI or NVMe controllers (both AMD and ASMedia) in PVE, the machine crashes. Symptoms:
  • the terminal screen goes blank immediately in a live tty session
  • the SSH session may stay alive, but any command results in input/output errors
However, passing through the graphics card, NIC and USB PCIe devices works fine. I've tried everything mentioned on https://pve.proxmox.com/wiki/PCI(e)_Passthrough but still no luck.

Some output to confirm that the IOMMU is enabled and VFIO is loaded:

root:~# dmesg | grep -i vfio
[ 6.981863] VFIO - User Level meta-driver version: 0.3
root:~# dmesg | grep -e DMAR -e IOMMU -e AMD-Vi
[ 0.000000] Warning: PCIe ACS overrides enabled; This may allow non-IOMMU protected peer-to-peer DMA
[ 0.143803] AMD-Vi: Using global IVHD EFR:0x58f77ef22294ade, EFR2:0x0
[ 1.412512] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
[ 1.416660] AMD-Vi: Extended features (0x58f77ef22294ade, 0x0): PPR X2APIC NX GT IA GA PC GA_vAPIC
[ 1.416671] AMD-Vi: Interrupt remapping enabled
[ 1.416672] AMD-Vi: X2APIC enabled
[ 1.416862] AMD-Vi: Virtual APIC enabled
[ 1.420366] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).

This is to confirm that the ahci driver is not in use on either controller (there is no "Kernel driver in use" line):

root:~# lspci -nnk | egrep -i "ahci|sata"
03:00.1 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset SATA Controller [1022:43c8] (rev 01)
Subsystem: ASMedia Technology Inc. 400 Series Chipset SATA Controller [1b21:1062]
Kernel modules: ahci
25:00.0 SATA controller [0106]: ASMedia Technology Inc. ASM1062 Serial ATA Controller [1b21:0612] (rev 02)
Kernel modules: ahci
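
For anyone repeating this check: in lspci -k output, a bound driver shows up as a "Kernel driver in use:" line, so its absence above suggests that nothing has claimed the controllers. The same check can be sketched directly against sysfs. The function name bound_driver and its optional second argument (an alternative sysfs root, only useful for trying the function on a fake tree) are illustrative assumptions, not part of any Proxmox tooling.

```shell
#!/bin/sh
# Print the driver currently bound to a PCI device, or "none" if unbound.
# $1: full PCI address, e.g. 0000:03:00.1
# $2: optional sysfs root (hypothetical helper argument; defaults to the real one)
bound_driver() {
    link="${2:-/sys/bus/pci/devices}/$1/driver"
    if [ -L "$link" ]; then
        # The driver entry is a symlink ending in the driver name, e.g. .../drivers/vfio-pci
        basename "$(readlink "$link")"
    else
        echo none
    fi
}
```

On the host, bound_driver 0000:03:00.1 would be expected to print vfio-pci once the controller has been claimed for passthrough, and none while no driver is attached.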

My modprobe configuration:

root:~# cat /etc/modprobe.d/*
options vfio_iommu_type1 allow_unsafe_interrupts=1
options vfio-pci ids=03:00.1,25:00.0
blacklist ahci
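
One detail in this configuration is worth double-checking. As far as I know, the ids= option of vfio-pci expects vendor:device ID pairs (the values in square brackets at the end of each lspci -nn line, here 1022:43c8 and 1b21:0612), not bus addresses such as 03:00.1. The sketch below derives such a line from lspci -nn output. The helper names extract_ids and vfio_options_line are made up for illustration, and nothing is written to disk. This would not change the IOMMU grouping of the devices.

```shell
#!/bin/sh
# Build a vfio-pci options line from `lspci -nn` output on stdin.

# Extract the [vendor:device] pair (e.g. 1022:43c8) from each lspci -nn line.
extract_ids() {
    sed -n 's/.*\[\([0-9a-f]\{4\}:[0-9a-f]\{4\}\)\].*/\1/p'
}

# Join the IDs read from stdin with commas into one options line.
vfio_options_line() {
    printf 'options vfio-pci ids=%s\n' "$(paste -sd, -)"
}

# Example (run on the host; the addresses are the two controllers from this thread):
#   for a in 03:00.1 25:00.0; do lspci -nn -s "$a"; done | extract_ids | vfio_options_line
```

Note that an ID-based match applies to every device with that vendor:device pair, so identical controllers elsewhere in the system would be claimed as well.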
 
I'm unsure whether this is PVE's problem or the motherboard's; I tend to believe it's the latter, since other devices pass through fine. So I got myself an HBA and bypassed the motherboard's storage controllers, and everything has worked well so far. The problem is not directly resolved, but I'll mark it as solved so others can pick something up from it.

I'm still unsure whether there is some untouched corner of the PVE or BIOS configuration that would magically fix this problem. But I'll just go with the HBA and one less PCIe slot for now.
 
An X470 motherboard can only pass through two PCIe slots and one M.2 slot (all connected to the CPU). All other PCIe and M.2 slots and all onboard devices are in one big "chipset" IOMMU group and cannot be passed through without the Proxmox host losing all devices in that group (which causes Proxmox to crash).
Only X570(S) motherboards have everything in separate groups. More information about IOMMU groups: https://pve.proxmox.com/wiki/PCI_Passthrough#Verify_IOMMU_isolation
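
The isolation check behind that link can be sketched as a small loop over /sys/kernel/iommu_groups, where the kernel exposes one directory per group. The function name list_iommu_groups and its optional root argument are illustrative assumptions; only the sysfs layout comes from the kernel.

```shell
#!/bin/sh
# Print "<group> <pci-address>" for every device in every IOMMU group.
# $1: optional root directory (hypothetical, for trying it on a fake tree)
list_iommu_groups() {
    root="${1:-/sys/kernel/iommu_groups}"
    for dev in "$root"/*/devices/*; do
        [ -e "$dev" ] || continue      # glob did not match: no IOMMU groups exposed
        group="${dev%/devices/*}"      # strip "/devices/<address>"
        group="${group##*/}"           # keep only the group number
        printf '%s %s\n' "$group" "${dev##*/}"
    done
}
```

Run without arguments on the host, this prints one line per device. A device is only a reasonable passthrough candidate if everything sharing its group number can leave the host together with it.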
 
@leesteken you're absolutely right! There is no way PVE can survive having so many critical devices taken away along with the rest of the IOMMU group. I neglected this important check... I wish I'd had your help earlier. All the storage devices are in IOMMU group 18. It seems I should switch to X570.


Class     Device  PCI address   IOMMU group  Vendor  Device name
0x010601  0x43c8  0000:03:00.1  18           0x1022  400 Series Chipset SATA Controller
0x010601  0x0612  0000:25:00.0  18           0x1b21  ASM1062 Serial ATA Controller
0x010802  0x5030  0000:26:00.0  18           0x15b7  Western Digital WD Black NVMe SSD (my boot device)
0x010802  0x5006  0000:2a:00.0  18           0x15b7  WD NVMe SSD
0x020000  0x1533  0000:23:00.0  18           0x8086  I210 Gigabit Network Connection
0x020000  0x1533  0000:24:00.0  18           0x8086  I210 Gigabit Network Connection
0x030000  0x2000  0000:22:00.0  18           0x1a03  ASPEED Graphics Family
0x060400  0x43c6  0000:03:00.2  18           0x1022  400 Series Chipset PCIe Bridge
0x060400  0x43c7  0000:20:00.0  18           0x1022  400 Series Chipset PCIe Port (many)
0x060400  0x43c7  0000:20:01.0  18           0x1022  400 Series Chipset PCIe Port
0x060400  0x1150  0000:21:00.0  18           0x1a03  AST1150 PCI-to-PCI Bridge
0x0c0330  0x43d0  0000:03:00.0  18           0x1022  ...
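
To see what would be dragged along with one specific controller, it can help to list only the devices that share its group. This is a rough sketch, assuming the usual sysfs layout in which each PCI device directory has an iommu_group symlink. The function name group_mates and its optional second argument are made up for illustration.

```shell
#!/bin/sh
# Print every other PCI device that shares an IOMMU group with the given one.
# $1: full PCI address, e.g. 0000:03:00.1
# $2: optional sysfs root (hypothetical, for trying it on a fake tree)
group_mates() {
    dev="$1"
    root="${2:-/sys/bus/pci/devices}"
    for peer in "$root/$dev"/iommu_group/devices/*; do
        [ -e "$peer" ] || continue             # device unknown or no IOMMU group
        name="${peer##*/}"
        [ "$name" = "$dev" ] || printf '%s\n' "$name"
    done
}
```

Empty output for a known device means it sits alone in its group. On this board, group_mates 0000:03:00.1 would presumably list the boot NVMe, both NICs and the BMC graphics, which matches the crash symptoms.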

However, in case you have any clue: what are these two SATA controllers? They're in separate IOMMU groups, but a SATA drive attached to either of them did not show up in the guest.

Class     Device  PCI address   IOMMU group  Vendor  Device name
0x010601  0x7901  0000:30:00.0  26           0x1022  FCH SATA Controller [AHCI mode]
0x010601  0x7901  0000:31:00.0  27           0x1022  FCH SATA Controller [AHCI mode]

This is the board's schematic drawing:
[attachment: 1725834542652.png]
And the manual: https://download.asrock.com/Manual/X470D4U.pdf
 
All of this has been discussed on this forum many times over, back in the day when AM4 was the latest tech.
Those two FCH SATA controllers are probably part of an M.2 slot, which can provide either PCIe or SATA and USB: https://en.wikipedia.org/wiki/M.2
I use an M.2-to-SATA-port converter to connect an optical drive to such a SATA controller, which I can then pass through to a VM.
All X470 and X370 boards are the same w.r.t. IOMMU groups, as are B450, B550, etc. Your motherboard has no M.2 slot from the CPU but an x4 PCIe slot instead.
 
Many thanks for the helpful info.
 
