PCI Passthrough SATA controller turns into softlock

D3D_Tray

May 13, 2021
Hello Specialist,

I'm struggling to pass through my SATA controller with my HDDs to OMV (the OS is installed on an M.2 drive, on a separate PCI device). I'm kinda confused, since it all worked fine before with XCP-ng, and I really don't want to switch back: Proxmox is much faster and easier to maintain (relative to the resources it uses) than XCP-ng (in my case!).

My "Server":
CPU: AMD Ryzen 5 Pro 4650G with iGPU
RAM: 2x16GB
M.2 HDD0: Micron 1100 256GB, (MTFDDAV256TBN-1AR1ZABYY)
bay SATA HDD1: Seagate Barracuda 500 GB (ST3500418AS)
usb HDD1: HGST generic 1TB Drive
usb HDD2: HGST generic 1TB Drive
Motherboard: Gigabyte A520M S2H (BIOS F13h)


Basically, I followed the internal documentation: I added "amd_iommu=on" to GRUB and loaded the vfio modules.

Confirmation here:
Code:
root@pve:~# lsmod | grep vfio
vfio_pci               53248  0
vfio_virqfd            16384  1 vfio_pci
irqbypass              16384  2 vfio_pci,kvm
vfio_iommu_type1       32768  0
vfio                   32768  2 vfio_iommu_type1,vfio_pci

root@pve:~# dmesg |grep -e DMAR -e IOMMU -e AMD-Vi
[    1.062137] pci 0000:00:00.2: AMD-Vi: Unable to read/write to IOMMU perf counter.
[    1.064408] pci 0000:00:00.2: AMD-Vi: Found IOMMU cap 0x40
[    1.064409] pci 0000:00:00.2: AMD-Vi: Extended features (0x206d73ef22254ade):
[    1.064412] AMD-Vi: Interrupt remapping enabled
[    1.064412] AMD-Vi: Virtual APIC enabled
[    1.064412] AMD-Vi: X2APIC enabled
[    1.064570] AMD-Vi: Lazy IO/TLB flushing enabled
[    7.304927] AMD-Vi: AMD IOMMUv2 driver by Joerg Roedel <jroedel@suse.de>

The internet says "Unable to read/write to IOMMU perf counter." should be okay... (I don't really believe it...)
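
As an extra check alongside the lsmod and dmesg output above, the following minimal sketch tests whether a parameter such as amd_iommu=on actually reached the running kernel. The function name has_kernel_param and its optional second argument (a file to read instead of /proc/cmdline) are made up for illustration.

```shell
#!/bin/sh
# has_kernel_param PARAM [FILE]
# Succeeds if PARAM appears as a whole word on the kernel command line.
# FILE defaults to /proc/cmdline; the override exists only to ease testing.
has_kernel_param() {
    param=$1
    file=${2:-/proc/cmdline}
    # Put every space-separated word on its own line, then match exactly.
    tr ' ' '\n' < "$file" | grep -qxF -- "$param"
}

# Example (on the host):
#   has_kernel_param amd_iommu=on && echo "amd_iommu=on is active"
```

If the parameter is missing, the bootloader configuration was probably not regenerated after editing it.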

Anyway, according to section 10.9.2 of the documentation, I should create a .conf file in /etc/modprobe.d/ (I called mine vfio.conf) and add the device ID to it.

Code:
root@pve:~# lspci -nn
00:00.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir Root Complex [1022:1630]
00:00.2 IOMMU [0806]: Advanced Micro Devices, Inc. [AMD] Renoir IOMMU [1022:1631]
00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge [1022:1632]
00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge [1022:1632]
00:02.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe GPP Bridge [1022:1634]
00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge [1022:1632]
00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Renoir Internal PCIe GPP Bridge to Bus [1022:1635]
00:08.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Renoir Internal PCIe GPP Bridge to Bus [1022:1635]
00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b] (rev 51)
00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e] (rev 51)
00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 0 [1022:1448]
00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 1 [1022:1449]
00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 2 [1022:144a]
00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 3 [1022:144b]
00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 4 [1022:144c]
00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 5 [1022:144d]
00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 6 [1022:144e]
00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 7 [1022:144f]
01:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device [1022:43ec]
01:00.1 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] Device [1022:43eb]
01:00.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43e9]
02:03.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43ea]
03:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 16)
04:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Renoir [1002:1636] (rev d9)
04:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Device [1002:1637]
04:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) Platform Security Processor [1022:15df]
04:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Renoir USB 3.1 [1022:1639]
04:00.4 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Renoir USB 3.1 [1022:1639]
04:00.6 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) HD Audio Controller [1022:15e3]
05:00.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901] (rev 81)

Unsure which SATA controller I should pick, I traced my HDD (/dev/sdc, because sda and sdb are the USB HDDs) back to its controller with "ls -al /sys/block/sdc":

Code:
root@pve:~# ls -al /sys/block/sdc
lrwxrwxrwx 1 root root 0 May 13 07:40 /sys/block/sdc -> ../devices/pci0000:00/0000:00:02.1/0000:01:00.1/ata1/host0/target0:0:0/0:0:0:0/block/sdc
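
The lookup above can be scripted. This is a small sketch, not an official tool: the helper name pci_addr_of_path is hypothetical, and it simply takes the last PCI address (domain:bus:device.function) that appears in a sysfs path, which for a SATA disk is the controller it hangs off.

```shell
#!/bin/sh
# pci_addr_of_path SYSFS_PATH
# Prints the last PCI address (e.g. 0000:01:00.1) found in SYSFS_PATH.
# Earlier addresses in the path are the bridges leading to the device.
pci_addr_of_path() {
    printf '%s\n' "$1" \
        | grep -oE '[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]' \
        | tail -n 1
}

# Example (on the host):
#   pci_addr_of_path "$(readlink -f /sys/block/sdc)"
```

For a USB disk the same call would report the USB controller rather than a SATA controller.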

Therefore my vfio.conf has only one line: options vfio-pci ids=1022:43eb
I rebuilt/updated everything as it says, rebooted, and checked with lspci -nnk:

Code:
01:00.1 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] Device [1022:43eb]
        Subsystem: ASMedia Technology Inc. Device [1b21:1062]
        Kernel driver in use: ahci
        Kernel modules: ahci

Duh! It didn't work! The controller is still in use by ahci! Not good...
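
One possible reason, offered as an assumption rather than a diagnosis: ahci may claim the controller before vfio-pci gets a chance to. A common approach is to add a softdep line so that vfio-pci is loaded first, then rebuild the initramfs. The sketch below only writes such a file; the helper name write_vfio_conf is hypothetical. Even with the driver bound correctly, passthrough still depends on the IOMMU grouping.

```shell
#!/bin/sh
# write_vfio_conf FILE IDS
# Writes a modprobe config that (1) tells vfio-pci which vendor:device IDs
# to claim and (2) asks modprobe to load vfio-pci before ahci.
write_vfio_conf() {
    conf=$1
    ids=$2
    {
        printf 'options vfio-pci ids=%s\n' "$ids"
        printf 'softdep ahci pre: vfio-pci\n'
    } > "$conf"
}

# Example (as root, on the host):
#   write_vfio_conf /etc/modprobe.d/vfio.conf 1022:43eb
#   update-initramfs -u -k all   # then reboot and re-check with lspci -nnk
```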

Okay, maybe, just maybe, it's still usable... so I just added it as a PCI device to my OMV VM and tried...
The server became unreachable (connection timeout), but it was still running and the monitor was still on!
The USB ports went offline, so I had to power it down via ACPI. I couldn't ping the server anymore either, so it was completely softlocked (sort of: the USB keyboard didn't work, but I guess a PS/2 one would have).

I remember the documentation says something about "separate IOMMU groups":

Code:
root@pve:~# find /sys/kernel/iommu_groups/ -type l | grep "/6/"
/sys/kernel/iommu_groups/6/devices/0000:03:00.0
/sys/kernel/iommu_groups/6/devices/0000:01:00.2
/sys/kernel/iommu_groups/6/devices/0000:02:03.0
/sys/kernel/iommu_groups/6/devices/0000:01:00.0
/sys/kernel/iommu_groups/6/devices/0000:01:00.1

root@pve:~# lspci -nn | grep '03:00.0\|01:00.2\|02:03.0\|01:00.0\|01:00.1'
01:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device [1022:43ec]
01:00.1 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] Device [1022:43eb]
01:00.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43e9]
02:03.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43ea]
03:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 16)

Oops... I guess my SATA controller shares its IOMMU group with USB and Ethernet... that's bad...
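
The same check can be done per device instead of per group number. A minimal sketch, assuming the usual sysfs layout where each PCI device has an iommu_group link; the helper name iommu_peers and its optional sysfs-root argument are made up for illustration.

```shell
#!/bin/sh
# iommu_peers PCI_ADDR [SYSFS_ROOT]
# Prints the PCI address of every device in the same IOMMU group as
# PCI_ADDR (including PCI_ADDR itself), one per line, sorted.
# SYSFS_ROOT defaults to /sys; the override is only there for testing.
iommu_peers() {
    addr=$1
    root=${2:-/sys}
    group_dir=$root/bus/pci/devices/$addr/iommu_group/devices
    [ -d "$group_dir" ] || return 1
    ls -1 "$group_dir" | sort
}

# Example (on the host): describe everything grouped with the SATA controller
#   for a in $(iommu_peers 0000:01:00.1); do lspci -nns "$a"; done
```

Anything listed next to the device would have to move into the VM together with it, which is why a group containing the host's USB and network controllers takes the host offline.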


My current workaround:
Via the web UI, I added my USB HDD to OMV as a USB device, and I attached the regular HDD as a regular disk:

Code:
root@pve:~# qm set 201 -scsi2 /dev/disk/by-id/ata-ST3500418AS_9VMXF57H
update VM 201: -scsi2 /dev/disk/by-id/ata-ST3500418AS_9VMXF57H
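
The by-id path is used because /dev/sdX letters can change between boots. As a rough sketch, the stable name for a given disk can be looked up like this; the helper name by_id_for and its optional directory argument are hypothetical.

```shell
#!/bin/sh
# by_id_for DEVICE [DIR]
# Prints the first symlink in DIR (default /dev/disk/by-id) that resolves
# to DEVICE, or fails if there is none.
by_id_for() {
    dev=$(readlink -f "$1")
    dir=${2:-/dev/disk/by-id}
    for link in "$dir"/*; do
        [ -L "$link" ] || continue
        if [ "$(readlink -f "$link")" = "$dev" ]; then
            printf '%s\n' "$link"
            return 0
        fi
    done
    return 1
}

# Example (on the host; 201 is the VM ID used above):
#   qm set 201 -scsi2 "$(by_id_for /dev/sdc)"
```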

And yes, it works but...
Code:
[   29.384954] blk_update_request: I/O error, dev sdc, sector 2048 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[   29.385295] sd 8:0:0:0: [sdb] tag#7 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[   29.385302] sd 8:0:0:0: [sdb] tag#7 CDB: Read(10) 28 00 00 00 28 00 00 00 08 00
[   29.385305] blk_update_request: I/O error, dev sdb, sector 10240 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[   29.385636] blk_update_request: I/O error, dev sdb, sector 10240 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[   29.385891] Buffer I/O error on dev sdb1, logical block 8192, async page read
[   29.386078] blk_update_request: I/O error, dev sdb, sector 10241 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[   29.386329] Buffer I/O error on dev sdb1, logical block 8193, async page read
[   29.386509] blk_update_request: I/O error, dev sdb, sector 10242 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[   29.386760] Buffer I/O error on dev sdb1, logical block 8194, async page read
[   29.386941] blk_update_request: I/O error, dev sdb, sector 10243 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[   29.387184] Buffer I/O error on dev sdb1, logical block 8195, async page read
[   29.387358] blk_update_request: I/O error, dev sdb, sector 10244 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[   29.387602] Buffer I/O error on dev sdb1, logical block 8196, async page read
[   29.387773] Buffer I/O error on dev sdb1, logical block 8197, async page read
[   29.387943] Buffer I/O error on dev sdb1, logical block 8198, async page read
[   29.388076] sd 8:0:0:0: [sdb] Synchronizing SCSI cache
[   29.388104] Buffer I/O error on dev sdb1, logical block 8199, async page read
[   29.457323] sd 8:0:0:1: [sdc] tag#10 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
[   29.457326] sd 8:0:0:1: [sdc] tag#10 CDB: Read(10) 28 00 00 00 08 00 00 00 01 00
[   29.457329] blk_update_request: I/O error, dev sdc, sector 2048 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[   29.457502] Buffer I/O error on dev sdc1, logical block 0, async page read
[   29.473363] sd 8:0:0:1: [sdc] tag#10 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
[   29.473366] sd 8:0:0:1: [sdc] tag#10 CDB: Read(10) 28 00 00 00 08 01 00 00 07 00
[   29.473368] blk_update_request: I/O error, dev sdc, sector 2049 op 0x0:(READ) flags 0x0 phys_seg 7 prio class 0
[   29.473541] Buffer I/O error on dev sdc1, logical block 1, async page read

TBH it only affects my USB-HDD but still, it kinda worries me a bit...
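
To see at a glance which devices are affected, the log can be summarised per device. This is a small sketch that assumes the kernel message format shown above ("blk_update_request: I/O error, dev X, ..."); other kernel versions may word the line differently, and the function name count_io_errors is made up.

```shell
#!/bin/sh
# count_io_errors
# Reads kernel log text on stdin and prints "<device> <count>" for every
# device named in a "blk_update_request: I/O error" line, sorted by device.
count_io_errors() {
    awk '/blk_update_request: I\/O error, dev / {
             for (i = 1; i < NF; i++) {
                 if ($i == "dev") {
                     d = $(i + 1)
                     sub(/,$/, "", d)   # strip the trailing comma
                     n[d]++
                     break
                 }
             }
         }
         END { for (d in n) print d, n[d] }' | sort
}

# Example (inside the VM):
#   dmesg | count_io_errors
```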

I also wanted to pass through my iGPU to my Plex server for HW encoding, but if I can't even pass through my SATA controller, then I guess IOMMU isn't working properly...

The IOMMU setting in my BIOS is on "auto", but tbh "auto" and "on" give the same results...
 
IOMMU appears to be working fine; it's just that the SATA, USB and network controllers are all provided by the A520 chipset and, according to your motherboard BIOS, cannot be securely separated from each other. Have you considered passing all of them to the VM? Have you checked the IOMMU group of the other SATA controller (or is that one shared with the M.2)?
Especially on AMD, different BIOS versions give different IOMMU groups, and sometimes an older version works better than the latest. However, I don't think that will be the case for the A520 chipset. An X570 is much better for PCI passthrough, as it is the only chipset that does separate its devices.
I don't think anyone has been successful in passing through an AMD integrated GPU, but you might be able to pass through a discrete GPU (and use the integrated GPU for the Proxmox host). Or maybe you can run Plex in a container and give it the host device using bind mounts and device permissions?
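
To make the container idea concrete, here is a hedged sketch that prints the kind of lines people commonly add to an LXC container config for this. It assumes the host exposes the iGPU as /dev/dri/card0 and /dev/dri/renderD128 (check with ls -l /dev/dri), that the cgroup v2 key applies (older setups use lxc.cgroup.devices.allow instead), and the helper name lxc_dri_lines and container ID 101 are hypothetical.

```shell
#!/bin/sh
# lxc_dri_lines [CGROUP_KEY]
# Prints LXC config lines that allow access to the DRM devices and
# bind-mount /dev/dri from the host into the container.
lxc_dri_lines() {
    key=${1:-lxc.cgroup2.devices.allow}
    # DRM character devices use major number 226; card0 is usually minor 0
    # and renderD128 minor 128. Verify on the host before relying on this.
    printf '%s: c 226:0 rwm\n' "$key"
    printf '%s: c 226:128 rwm\n' "$key"
    printf 'lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir\n'
}

# Example (as root, on the host; 101 is a hypothetical container ID):
#   lxc_dri_lines >> /etc/pve/lxc/101.conf
```

The user running Plex inside the container still needs permission on the device nodes, so group IDs may have to be matched as well.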

Sorry to bring bad news, but your hardware selection happens to be not very suitable for PCI passthrough.
I'm not sure about the sdb and sdc errors. Are those both external USB drives? Are you sure they have no bad sectors?
 
Thank you for your reply ♥

Sadly yes, the other SATA controller belongs to the M.2 slot and is separate... and its IOMMU group (group 3) is much bigger in my case:


Code:
root@pve:~# ls -al /sys/block/sdd
lrwxrwxrwx 1 root root 0 May 13 08:27 /sys/block/sdd -> ../devices/pci0000:00/0000:00:08.2/0000:05:00.0/ata8/host7/target7:0:0/7:0:0:0/block/sdd
root@pve:~# find /sys/kernel/iommu_groups/ -type l | grep "/3/"
/sys/kernel/iommu_groups/3/devices/0000:00:08.0
/sys/kernel/iommu_groups/3/devices/0000:04:00.3
/sys/kernel/iommu_groups/3/devices/0000:04:00.1
/sys/kernel/iommu_groups/3/devices/0000:00:08.1
/sys/kernel/iommu_groups/3/devices/0000:04:00.6
/sys/kernel/iommu_groups/3/devices/0000:04:00.4
/sys/kernel/iommu_groups/3/devices/0000:05:00.0
/sys/kernel/iommu_groups/3/devices/0000:04:00.2
/sys/kernel/iommu_groups/3/devices/0000:04:00.0
/sys/kernel/iommu_groups/3/devices/0000:00:08.2
root@pve:~# lspci -nn | grep '00:08.0\|04:00.3\|04:00.1\|00:08.1\|04:00.6\|04:00.4\|05:00.0\|04:00.2\|04:00.0\|00:08.2'
00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge [1022:1632]
00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Renoir Internal PCIe GPP Bridge to Bus [1022:1635]
00:08.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Renoir Internal PCIe GPP Bridge to Bus [1022:1635]
04:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Renoir [1002:1636] (rev d9)
04:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Device [1002:1637]
04:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) Platform Security Processor [1022:15df]
04:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Renoir USB 3.1 [1022:1639]
04:00.4 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Renoir USB 3.1 [1022:1639]
04:00.6 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) HD Audio Controller [1022:15e3]
05:00.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901] (rev 81)
root@pve:~#

But I think I could bypass this problem if I buy a PCIe SATA controller and pass it through to the VM, so it's fine :)

My USB drives shouldn't have any bad sectors, but maybe they do have some now; I'll have them checked for faults.


Thank you very much for your help, I really appreciate your time and knowledge ♥
 
"But I think I could bypass this problem if I buy a PCIe SATA controller and pass it through to the VM, so it's fine :)"
Please be aware that you can only pass through PCI slots that are connected to the CPU, and most likely not those that are connected to the A520 chipset. In your case, the CPU-connected slot is the x16 slot. If you use that for a SATA controller, you cannot use it for a discrete GPU anymore.
If you want a nice overview of the IOMMU groups and devices, you can use this command:

Code:
for d in /sys/kernel/iommu_groups/*/devices/*; do n=${d#*/iommu_groups/*}; n=${n%%/*}; printf 'IOMMU group %s ' "$n"; lspci -nns "${d##*/}"; done

If it is not a problem with the drive itself, maybe it's the controller or connection? Please try other USB cables and other ports on the motherboard.
 
