[SOLVED] Passthrough of onboard SATA controller locks up system

Aluveitie

Member
Sep 21, 2022
I am trying to pass through the two onboard SATA controllers to VMs. Proxmox itself runs on an NVMe SSD and does not use those controllers.
Each controller is in its own IOMMU group:
Code:
IOMMU Group 28 83:00.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901] (rev 51)
IOMMU Group 29 84:00.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901] (rev 51)

What I did so far
Code:
root@server:~# more /etc/modules
amd_iommu_v2

vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
Code:
root@server:~# more /etc/modprobe.d/vfio.conf
options vfio-pci ids=1022:7901
softdep ahci pre: vfio-pci
Code:
root@server:~# more /etc/modprobe.d/blacklist.conf
blacklist ahci
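
For the early-binding configuration in /etc/modprobe.d to take effect at the next boot, the initramfs typically needs to be refreshed afterwards (a step not shown here; on Proxmox/Debian this is usually):
Code:
root@server:~# update-initramfs -u -k all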

Which results in
Code:
root@server:~# lspci -k -s 83:00
83:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 51)
        Subsystem: Gigabyte Technology Co., Ltd FCH SATA Controller [AHCI mode]
        Kernel driver in use: vfio-pci
        Kernel modules: ahci
root@server:~# lspci -k -s 84:00
84:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 51)
        Subsystem: Gigabyte Technology Co., Ltd FCH SATA Controller [AHCI mode]
        Kernel driver in use: vfio-pci
        Kernel modules: ahci

I have one SATA disk currently attached, and Proxmox no longer lists it under Disks.
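For reference, the same can be seen from the shell (a hypothetical check, not in the original post; once vfio-pci claims the controllers, ahci never attaches and no block device is created):
Code:
root@server:~# lsblk -d -o NAME,TRAN,MODEL
# Any SATA disk behind 83:00.0 or 84:00.0 should no longer be listed here.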

The VM:
Code:
root@server:~# more /etc/pve/qemu-server/100.conf
bios: ovmf
boot: order=scsi0
cores: 8
cpu: EPYC-Rome,flags=+aes
efidisk0: local-lvm:vm-100-disk-0,efitype=4m,size=4M
hostpci0: 0000:83:00.0,rombar=0
hostpci1: 0000:84:00.0,rombar=0
machine: q35
memory: 12288
meta: creation-qemu=6.2.0,ctime=1662755808
name: truenas
net0: virtio=5A:A2:90:3C:C7:0B,bridge=vmbr0
net1: virtio=7A:B4:AD:7C:C8:F7,bridge=vmbr1,firewall=1
numa: 0
ostype: l26
scsi0: local-lvm:vm-100-disk-1,size=32G
scsi1: local-lvm:vm-100-disk-2,size=4G
scsi2: local-lvm:vm-100-disk-3,size=12G
scsi21: local-lvm:vm-100-disk-4,size=6G
scsi22: local-lvm:vm-100-disk-5,size=6G
scsi23: local-lvm:vm-100-disk-6,size=6G
scsi24: local-lvm:vm-100-disk-9,size=6G
scsi25: local-lvm:vm-100-disk-7,size=4G
scsihw: virtio-scsi-pci
smbios1: uuid=4585d1cc-0603-496b-9e7e-803418c40743
sockets: 1

Starting the VM hangs, and I can find these logs:
Code:
[  651.761767] vfio-pci 0000:83:00.0: not ready 1023ms after FLR; waiting
[  653.809887] vfio-pci 0000:83:00.0: not ready 2047ms after FLR; waiting
[  657.105863] vfio-pci 0000:83:00.0: not ready 4095ms after FLR; waiting

Soon after, Proxmox itself becomes unresponsive and I have to hard-reset the server...
Code:
Message from syslogd@server at Sep 22 14:52:18 ...
 kernel:[  752.435321] watchdog: BUG: soft lockup - CPU#3 stuck for 26s! [task UPID:serve:3913]
 
Check whether the controller is alone in its IOMMU group. Devices in the same group cannot be shared between VMs, or between a VM and the Proxmox host. Maybe the Proxmox host loses its drive controller and can no longer write its logs, and maybe also its network device.
This is a common problem for Ryzen motherboards (except X570). What is the make and model of your motherboard?

EDIT: What is the output of for d in /sys/kernel/iommu_groups/*/devices/*; do n=${d#*/iommu_groups/*}; n=${n%%/*}; printf 'IOMMU group %s ' "$n"; lspci -nns "${d##*/}"; done?

EDIT2: I'm stupid and did not read the first couple of lines of the first post, sorry!
 
@leesteken They each have their own group.
Code:
root@server:~# for d in /sys/kernel/iommu_groups/*/devices/*; do n=${d#*/iommu_groups/*}; n=${n%%/*}; printf 'IOMMU group %s ' "$n"; lspci -nns "${d##*/}"; done | grep -e "28\|29"
IOMMU group 28 83:00.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901] (rev 51)
IOMMU group 29 84:00.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901] (rev 51)

For some context: it is an EPYC 7232P on a Gigabyte MZ31-AR0 rev 2.0.
 
Starting the VM hangs, and I can find these logs:
Code:
[  651.761767] vfio-pci 0000:83:00.0: not ready 1023ms after FLR; waiting
[  653.809887] vfio-pci 0000:83:00.0: not ready 2047ms after FLR; waiting
[  657.105863] vfio-pci 0000:83:00.0: not ready 4095ms after FLR; waiting
Looks like the SATA controller is not resetting properly.

The work-around to use it once for a VM (stopping the VM and starting it again will always run into this) is to early-bind the numeric [vendor:device] ID (as shown by lspci -nns 83:00.0) to vfio-pci. Unfortunately, there are probably other SATA controllers with the same ID that you do not want to take away from the Proxmox host.

Since kernel 5.15, it is possible to choose the reset mechanism for each PCIe device. What is the output of cat '/sys/bus/pci/devices/0000:83:00.0/reset_method'? Maybe it has more than one reset method. You can try writing one of the other possible values to reset_method and see if that helps (it does not persist after a reboot).

Or figure out how to reset that particular piece of hardware and write a quirk for the driver in the Linux kernel. Or buy a different SATA controller PCIe card that is known to work with passthrough (as reported on this forum or elsewhere).

EDIT: You do not appear to be alone with this issue.
 
@leesteken I have those two SATA controllers onboard, both of which I plan to pass through (to different VMs if possible). I don't need any SATA controller for Proxmox.

Passing through the controller would make things easier, but I would just pass through single disks if necessary.

Here is the output:
Code:
root@server:~# cat '/sys/bus/pci/devices/0000:83:00.0/reset_method'
flr bus
 
@leesteken I have those two SATA controllers onboard, both of which I plan to pass through (to different VMs if possible). I don't need any SATA controller for Proxmox.
Still, it would not allow for a restart of the VM.
Here is the output:
Code:
root@server:~# cat '/sys/bus/pci/devices/0000:83:00.0/reset_method'
flr bus
Maybe doing an echo bus >'/sys/bus/pci/devices/0000:83:00.0/reset_method' before starting the VM would fix this for you?
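Spelled out as a minimal sketch (assuming both controllers need the same treatment, and using the VM ID 100 from the config above):
Code:
# Select the "bus" reset method for both onboard SATA controllers,
# then start the VM they are passed through to.
echo bus > /sys/bus/pci/devices/0000:83:00.0/reset_method
echo bus > /sys/bus/pci/devices/0000:84:00.0/reset_method
qm start 100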
 
Still, it would not allow for a restart of the VM.
Since it is for my personal use I could live with that; I'd just like Proxmox to boot and then start the VM automatically.
Maybe doing an echo bus >'/sys/bus/pci/devices/0000:83:00.0/reset_method' before starting the VM would fix this for you?
Genius, that did work!!
What would be the best way to apply that across reboots of Proxmox?
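
One possible way to apply it across reboots (a sketch, not from the thread: it uses the Proxmox hookscript mechanism, with a hypothetical script name and the VM ID 100 from the config above):
Code:
#!/bin/bash
# /var/lib/vz/snippets/sata-reset-method.sh (hypothetical name; make it executable)
# Proxmox invokes hookscripts as: <script> <vmid> <phase>
vmid="$1"
phase="$2"
if [ "$phase" = "pre-start" ]; then
    # Force the "bus" reset method on both onboard SATA controllers
    # before the VM starts, mirroring the manual echo above.
    echo bus > /sys/bus/pci/devices/0000:83:00.0/reset_method
    echo bus > /sys/bus/pci/devices/0000:84:00.0/reset_method
fi
exit 0
Attach it with qm set 100 --hookscript local:snippets/sata-reset-method.sh (the storage must have the "snippets" content type enabled). Combined with onboot: 1 in the VM config, Proxmox would then boot and start the VM automatically.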
 
