PCIe passthrough attempt results in rpool encountering uncorrectable I/O failure and crashing PVE

Huecuva

New Member
May 26, 2021
Hey guys. I have a PCIe SATA controller that I'm trying to pass through to a Debian VM on my Proxmox node. The machine is running a Ryzen 7 3700X on an ASRock B450 Pro4 motherboard (updated to the latest BIOS) with 64GB of RAM. Proxmox is installed on a RAIDZ2 pool of four 250GB WD Blue SATA SSDs that hosts both PVE and the VMs. Connected to the SATA controller I'm trying to pass through are two 6TB WD Red hard drives in an mdadm RAID that was created under Debian.

I added the following lines as required to /etc/modules:
Code:
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
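(For anyone following along: changes to /etc/modules and /etc/modprobe.d only take effect after the initramfs is rebuilt and the host rebooted. A minimal sketch of that step:)
Code:
update-initramfs -u -k all
reboot
# after the reboot the modules should show up:
lsmod | grep vfio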

Here are the results of lspci -nnk as it pertains to the device in question:
Code:
04:00.0 SATA controller [0106]: ASMedia Technology Inc. ASM1062 Serial ATA Controller [1b21:0612] (rev 02)
        Subsystem: ASRock Incorporation Motherboard [1849:0612]
        Kernel driver in use: ahci
        Kernel modules: ahci
07:00.0 SATA controller [0106]: ASMedia Technology Inc. ASM1062 Serial ATA Controller [1b21:0612] (rev 02)
        Subsystem: ASRock Incorporation Motherboard [1849:0612]
        Kernel driver in use: ahci
        Kernel modules: ahci
I don't know why it is two devices. It is only one PCIe SATA controller with two SATA ports on it. Maybe that's why?

I have IOMMU enabled in the BIOS, I have added the appropriate line to the GRUB config, and I have verified that the IOMMU is enabled:
Code:
# cat /etc/default/grub | grep iommu
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt"
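(Note for readers: editing /etc/default/grub only helps if GRUB is actually the bootloader in use and the config is regenerated afterwards; on a ZFS/UEFI install of Proxmox the kernel command line may instead live in /etc/kernel/cmdline and be applied with the EFI boot tool. A sketch of the apply step, depending on setup:)
Code:
update-grub                 # when booting via GRUB
# pve-efiboot-tool refresh  # when booting via systemd-boot (ZFS on UEFI)
reboot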
Code:
# dmesg | grep -e DMAR -e IOMMU -e AMD-Vi
[    1.190979] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
[    1.194154] pci 0000:00:00.2: AMD-Vi: Found IOMMU cap 0x40
[    1.194154] pci 0000:00:00.2: AMD-Vi: Extended features (0x58f77ef22294a5a):
[    1.194156] AMD-Vi: Interrupt remapping enabled
[    1.194223] AMD-Vi: Lazy IO/TLB flushing enabled
[    1.195100] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).

I have the two devices added to the VM, and it was created as a q35 machine, as indicated in the VM config:
Code:
hostpci0: 04:00.0,pcie=1
hostpci1: 07:00.0,pcie=1
machine: q35

One can also see that my PCIe card is not just in a separate group, but each part of it is in fact in its own separate group:
Code:
# find /sys/kernel/iommu_groups/ -type l | grep groups/6
/sys/kernel/iommu_groups/6/devices/0000:00:07.1
# find /sys/kernel/iommu_groups/ -type l | grep groups/3
/sys/kernel/iommu_groups/3/devices/0000:00:04.0


I created a file in /etc/modprobe.d, and I also followed the directions in a forum thread that mentioned adding aliases to the /etc/modprobe.d/vfio.conf file:
Code:
# cat /etc/modprobe.d/vfio.conf
options vfio-pci ids=1b21:0612
alias pci:v00001B21d00000612sv00001849sd00000612bc01sc06i01 vfio-pci
alias pci:v00001B21d00000612sv00001849sd00000612bc01sc06i01 vfio-pci
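(A sketch of something many passthrough guides also suggest here, unverified on this board: a softdep line so vfio-pci is loaded before ahci and gets a chance to claim the listed IDs. Note that with a device ID shared by an onboard controller, this would grab both devices:)
Code:
# /etc/modprobe.d/vfio.conf
options vfio-pci ids=1b21:0612
softdep ahci pre: vfio-pci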

I blacklisted the driver:
Code:
# cat /etc/modprobe.d/pve-blacklist.conf
# This file contains a list of modules which are not supported by Proxmox VE

# nidiafb see bugreport https://bugzilla.proxmox.com/show_bug.cgi?id=701
blacklist nvidiafb
blacklist asmedia

I don't really know what the DRIVERNAME referenced in the directions is, so asmedia is just a guess. Also, the PCI(e) directions don't mention a filename for the blacklist, but the PCI directions name /etc/modprobe.d/blacklist.conf. However, the file I found in /etc/modprobe.d/ was pve-blacklist.conf, so that's where I put my blacklist. Should it have gone in blacklist.conf? I can't get the ASMedia device to show either the vfio-pci driver or no driver at all in use.

The problem is that every time I try to run my VM, rpool encounters an uncorrectable I/O failure error and gets suspended, and the entire PVE crashes and has to be hard booted.

I cannot figure out what is wrong and I can't do anything else with my server until this card is passed through. Any help is greatly appreciated.

EDIT: I've since realized that I was looking at the wrong device groups before and my PCIe SATA card is in fact in group 0 with a whole lot of other things:
Code:
# find /sys/kernel/iommu_groups/ -type l | grep groups/0
/sys/kernel/iommu_groups/0/devices/0000:02:07.0
/sys/kernel/iommu_groups/0/devices/0000:02:00.0
/sys/kernel/iommu_groups/0/devices/0000:00:01.0
/sys/kernel/iommu_groups/0/devices/0000:08:00.0
/sys/kernel/iommu_groups/0/devices/0000:01:00.2
/sys/kernel/iommu_groups/0/devices/0000:01:00.0
/sys/kernel/iommu_groups/0/devices/0000:02:06.0
/sys/kernel/iommu_groups/0/devices/0000:07:00.0
/sys/kernel/iommu_groups/0/devices/0000:00:01.3
/sys/kernel/iommu_groups/0/devices/0000:02:05.0
/sys/kernel/iommu_groups/0/devices/0000:01:00.1
/sys/kernel/iommu_groups/0/devices/0000:04:00.0
/sys/kernel/iommu_groups/0/devices/0000:02:01.0
/sys/kernel/iommu_groups/0/devices/0000:02:04.0
It appears that it shares a group with the rest of my pci bridges and the onboard LAN. I don't know how to fix this.

EDIT AGAIN: I managed to get ACS enabled in my BIOS. All it did was move them all into group 13, minus a few of the Starship/Matisse PCIe Dummy Host Bridge devices. I have an M.2 PCIe adapter. Would it be worth installing that and putting my PCIe SATA card in an M.2 slot?
 
On B450 motherboards there is usually one PCIe slot (typically x16) and one M.2 slot connected directly to the CPU; those are in isolated groups and can be passed through. The other M.2 and PCIe (typically x1) slots are in the large chipset group, and passthrough won't always work even if you break the isolation/security with pcie_acs_override.

If you use the M.2 slot on the motherboard closest to the CPU, it is almost guaranteed to work with PCI passthrough. Or use the x16 PCIe slot next to it if it is not used by a GPU. Otherwise you have to use pcie_acs_override to break up the IOMMU groups anyway, and you might as well try using one of the other PCIe slots directly.

Looking at the specification of that motherboard, it looks like the PCIE4 slot shares its PCIe lanes with the M2_1 slot. If you are not using that M.2 slot (probably the one closest to the CPU), you should definitely try that x4 PCIe slot (which physically looks like a x16).
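(For reference, the pcie_acs_override mentioned above is a kernel command-line parameter, not a BIOS setting; a sketch of what the line could look like, keeping in mind that it deliberately weakens the isolation between devices:)
Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt pcie_acs_override=downstream,multifunction"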
 
Excellent. Thank you for the response. I will give that a try forthwith.
 
I moved the PCIe SATA controller into the PCIe4 x8 slot, and PVE started having some weird issues: it took an eternity to boot, with start jobs hanging forever, and then had no network connection once it finally booted. I left the card installed in the x8 slot and reinstalled Proxmox. It installed without issue and the network connection worked. When I check the IOMMU groups, however, I discover this:
Code:
01:00.0 SATA controller [0106]: ASMedia Technology Inc. ASM1062 Serial ATA Controller [1b21:0612] (rev 02)
        Subsystem: ASMedia Technology Inc. ASM1062 Serial ATA Controller [1b21:1060]
        Kernel driver in use: ahci
        Kernel modules: ahci
08:00.0 SATA controller [0106]: ASMedia Technology Inc. ASM1062 Serial ATA Controller [1b21:0612] (rev 02)
        Subsystem: ASRock Incorporation Motherboard [1849:0612]
        Kernel driver in use: ahci
        Kernel modules: ahci
# find /sys/kernel/iommu_groups/ -type l | grep groups/15
/sys/kernel/iommu_groups/15/devices/0000:03:00.0
/sys/kernel/iommu_groups/15/devices/0000:09:00.0
/sys/kernel/iommu_groups/15/devices/0000:02:00.2
/sys/kernel/iommu_groups/15/devices/0000:02:00.0
/sys/kernel/iommu_groups/15/devices/0000:03:06.0
/sys/kernel/iommu_groups/15/devices/0000:08:00.0
/sys/kernel/iommu_groups/15/devices/0000:03:05.0
/sys/kernel/iommu_groups/15/devices/0000:02:00.1
/sys/kernel/iommu_groups/15/devices/0000:03:01.0
/sys/kernel/iommu_groups/15/devices/0000:03:04.0
/sys/kernel/iommu_groups/15/devices/0000:03:07.0
# find /sys/kernel/iommu_groups/ -type l | grep groups/14
/sys/kernel/iommu_groups/14/devices/0000:01:00.0

It looks like one of the two devices has been put into an isolated group, but the other is still in a group with a bunch of the onboard PCI bridges. I get the feeling that if I use the M.2 adapter and put the card into M2_1, it would still end up in those same groups. There is a GPU in the x16 slot, but it's only a Radeon 5450 and isn't expected to do any heavy work. I can easily move it to the x8 slot if putting the SATA controller in the x16 slot means it gets its own group or, ideally, both parts of it end up in the same group. Do you think the x16 slot would be completely isolated by itself?

Well, I put the GPU in the x8 slot and the SATA controller in the x16 slot, and it had the start-job problems again, making it take an unacceptably long time to boot and shut down, so I reinstalled PVE again with the devices in the correct slots. It's still having the issues with the start and stop jobs even after a fresh install. I've attached some screenshots of what's happening.

[Attached screenshots: "Other weirdness", "More weirdness", "Start jobs", "More start jobs"]
It keeps repeating this over and over even after it boots and I log in.

I'm going to try reinstalling again when I get time, but I have no idea why this is doing this just because I moved some PCIe cards around. Do you have any ideas?
 
I don't want to be pedantic, but I don't see an x8 PCIe slot on that board. Did I get the right manual? I'm assuming you meant PCIE4, which is the (electrically) x4 PCIe slot (in a physically x16 slot), and not PCIE2, which is the (electrically and physically) x16 PCIe slot that works as x8 when you have a Ryzen with integrated GPU.

I cannot explain the weirdness unless you were using PCIE4 and M2_1 at the same time (which is not supported), or you accidentally loosened a memory DIMM, or a cable is making a bad connection. Booting can be very slow when some but not all data from the drives reaches the system and a lot of retries are therefore necessary. Please check your SATA cables and connectors. Unplugging and reconnecting them might help.

I think that those two devices at 01:00.0 and 08:00.0 are not from the same thing at all. I think the first one is the motherboard chipset SATA controller and the other the PCIe card you moved, which just happens to use the same (very common) SATA controller chip. This might become a bit of a problem if you need the same Linux driver to not touch one of them for passthrough. I agree that using the M.2-adapter would give the same results, because the M2_1 and PCIE4 use the same 4 PCIe-lanes (the adapter just changes the shape of the electrical contacts). Both the PCIE2 and PCIE4 give you a separate group, so I suggest leaving the GPU in PCIE2 if you don't need M2_1.

Can you show me all the groups using for d in /sys/kernel/iommu_groups/*/devices/*; do n=${d#*/iommu_groups/*}; n=${n%%/*}; printf 'IOMMU group %s ' "$n"; lspci -nns "${d##*/}"; done? I find it much easier to read, it shows PCI bridges, and gives me some context of known devices.
 
I guess it's not an x8 slot; it looks like it's only an x4 slot. I am referring to PCIE4. I don't have any M.2 drives installed at all. I reinstalled PVE again and it seems to be booting a lot faster now. I had checked my cables, but maybe one of the SATA cables on one of my 6TB drives was a little loose after all the moving of my SATA controller around.

Here are the results of the command you requested:
Code:
# for d in /sys/kernel/iommu_groups/*/devices/*; do n=${d#*/iommu_groups/*}; n=${n%%/*}; printf 'IOMMU group %s ' "$n"; lspci -nns "${d##*/}"; done
IOMMU group 0 00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 10 00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 11 00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B] [1022:1484]
IOMMU group 12 00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b] (rev 61)
IOMMU group 12 00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e] (rev 51)
IOMMU group 13 00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 0 [1022:1440]
IOMMU group 13 00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 1 [1022:1441]
IOMMU group 13 00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 2 [1022:1442]
IOMMU group 13 00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 3 [1022:1443]
IOMMU group 13 00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 4 [1022:1444]
IOMMU group 13 00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 5 [1022:1445]
IOMMU group 13 00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 6 [1022:1446]
IOMMU group 13 00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 7 [1022:1447]
IOMMU group 14 01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Cedar [Radeon HD 5000/6000/7350/8350 Series] [1002:68f9]
IOMMU group 14 01:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Cedar HDMI Audio [Radeon HD 5400/6300/7300 Series] [1002:aa68]
IOMMU group 15 02:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset USB 3.1 XHCI Controller [1022:43d5] (rev 01)
IOMMU group 15 02:00.1 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset SATA Controller [1022:43c8] (rev 01)
IOMMU group 15 02:00.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Bridge [1022:43c6] (rev 01)
IOMMU group 15 03:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev 01)
IOMMU group 15 03:01.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev 01)
IOMMU group 15 03:04.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev 01)
IOMMU group 15 03:05.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev 01)
IOMMU group 15 03:06.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev 01)
IOMMU group 15 03:07.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev 01)
IOMMU group 15 08:00.0 SATA controller [0106]: ASMedia Technology Inc. ASM1062 Serial ATA Controller [1b21:0612] (rev 02)
IOMMU group 15 09:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 15)
IOMMU group 16 0a:00.0 SATA controller [0106]: ASMedia Technology Inc. ASM1062 Serial ATA Controller [1b21:0612] (rev 02)
IOMMU group 17 0b:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Function [1022:148a]
IOMMU group 18 0c:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP [1022:1485]
IOMMU group 19 0c:00.1 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Cryptographic Coprocessor PSPCPP [1022:1486]
IOMMU group 1 00:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
IOMMU group 20 0c:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller [1022:149c]
IOMMU group 21 0c:00.4 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse HD Audio Controller [1022:1487]
IOMMU group 2 00:01.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
IOMMU group 3 00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 4 00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 5 00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
IOMMU group 6 00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 7 00:05.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 8 00:07.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 9 00:07.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B] [1022:1484]
This is with the PCIe SATA controller now positioned in PCIE2, the x16 slot. I put it there in hopes that that slot would be entirely isolated. Also, storage devices can apparently benefit from PCIe x16 speeds, though I'm not sure how well that applies with the GPU in PCIE4.

EDIT: So now with the SATA controller in PCIe2, these are the results obtained by lspci -nnk | grep ASMedia.
Code:
# lspci -nnk | grep "ASMedia"
    Subsystem: ASMedia Technology Inc. 400 Series Chipset USB 3.1 XHCI Controller [1b21:1142]
    Subsystem: ASMedia Technology Inc. 400 Series Chipset SATA Controller [1b21:1062]
08:00.0 SATA controller [0106]: ASMedia Technology Inc. ASM1062 Serial ATA Controller [1b21:0612] (rev 02)
0a:00.0 SATA controller [0106]: ASMedia Technology Inc. ASM1062 Serial ATA Controller [1b21:0612] (rev 02)
    Subsystem: ASMedia Technology Inc. ASM1062 Serial ATA Controller [1b21:1060]

After closer examination of a more complete lspci list, it appears that there are some ASMedia subsystems on some of the AMD devices, so you're probably right about there being onboard ASMedia devices. However, there are two main ASMedia devices with the same ID 1b21:0612. It's also strange that one of those main ASMedia devices has an ASRock subsystem and one has ASMedia listed as subsystem.

If they are separate devices and the ASRock one is onboard and still somehow has the same ID as the PCIe card, would I then use the subsystem ID to pass the device through?
 
I think that those two devices at 01:00.0 and 08:00.0 are not from the same thing at all.
According to the datasheet of your motherboard, there is an ASMedia SATA controller onboard: "2 x SATA3 6.0 Gb/s Connectors by ASMedia ASM1061, support NCQ, AHCI and Hot Plug". So avw was right about that.

If they were two different devices, would they still have the same device ID (1b21:0612)?
Yes, the device ID would be the same if it's the same chip. And because it is the same chip you can't blacklist the driver (or at least your host wouldn't be able to use 2 of the 6 onboard SATA ports). You should check which four of the onboard SATA ports are connected to the chipset and which two to the onboard ASM1061. If any of your 4 SSDs is connected to the onboard ASM1061, it's no wonder that your ZFS pool throws errors when you try to pass through both ASM1061 controllers.
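(One possible workaround, sketched here as an assumption rather than something tested in this thread: since the IDs are identical, you could bind vfio-pci by PCI address instead of by ID, e.g. from a script run before the VM starts, leaving the onboard ASM1061 on ahci. The addresses below are taken from the lspci output above and will change if the card moves slots; check /dev/disk/by-path to confirm which one is the add-in card:)
Code:
# 0a:00.0 = add-in card (bind to vfio-pci), 08:00.0 = onboard ASM1061 (stays on ahci)
echo vfio-pci > /sys/bus/pci/devices/0000:0a:00.0/driver_override
echo 0000:0a:00.0 > /sys/bus/pci/devices/0000:0a:00.0/driver/unbind
echo 0000:0a:00.0 > /sys/bus/pci/drivers_probe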
 
Well, I seem to be having some issues getting any OS to boot properly with this PCIe card installed. I wiped out my PVE install and put Debian on the machine just to see how a normal OS would react to this card, instead of trying to deal with it through a hypervisor. With the two WD Red drives plugged into onboard SATA everything worked like a dream: the OS picked up the RAID and I was able to mount it and access it. With the PCIe SATA controller installed, the OS encountered some weirdness booting, and then, while it detected both WD Red drives in lsblk, it failed to see that they were in a RAID. I had to manually install mdadm (which had installed automatically when using onboard SATA), and then it detected the RAID but would not mount it, and it kept spitting out messages that made using the command line difficult. I can't remember exactly what it was spewing, but now it doesn't seem to want to reboot properly. I hard booted it because it was basically frozen, and it is no longer seeing the RAID, and obviously I can't install mdadm as it is already installed (EDIT: it looks like it's actually only seeing one of the 6TB drives now). The BIOS is not detecting any drives connected to the PCIe card. GParted seems inconsistent.

I'm beginning to wonder if this PCIe SATA card I have is just a complete piece of shit. I've already ordered a slightly more expensive PCIe SATA controller with a completely different chipset on it. I really hope it solves my problems. This is really frustrating.

EDIT: The messages it keeps spouting, which make the command line difficult to use, say something like:
Code:
ata12.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata12.00: failed command: READ DMA EXT
ata12.00: cmd 25/00:48:b8:f3:a0/00:00:ba:02:00/00 tag XX dma 4096 in
res (more hex here) Emask 0x4 (timeout)
ata12.00: status {DRDY}

I don't know what any of that means, but I think this PCIe card has a problem with RAIDs or something. I'm not even dealing with passthrough now and Debian is having issues with it.
 
If you know which drives are which (by using ls -l /dev/disk/by-id), you can check which SATA controller they connect to using ls -l /dev/disk/by-path/. This way you can identify each SATA port and each controller in your system.
Your current problems sound like a hardware failure, and it could still be a bad memory connection or a cable issue. But otherwise the PCIe card is suspect, either of hardware issues or of not playing nicely with the IOMMU of your system (but that would give additional errors from the IOMMU about DMA).
Please note that there are no guarantees that passthrough of any PCIe device will work perfectly, because it is not a commonly tested use-case. The PCIe devices should be standards compliant and therefore should work, but it tends to be trial and error in practice.
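(If you want to rule out the drives themselves, checking their SMART data while they are on the onboard ports is cheap. A sketch, assuming smartmontools is installed:)
Code:
smartctl -a /dev/disk/by-id/ata-WDC_WD60EFRX-...   # replace with the full by-id name of each WD Red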
 
Is it possible to pass through individual hard drives without passing through the SATA controller? I'm only using this SATA controller because my motherboard doesn't have enough SATA ports on it for all four SSDs, my storage RAID and my optical drive. If I can pass through just the HDDs, I can plug in the optical drive to the PCIe card instead.

My RAM is all firmly seated and hasn't been touched since I installed it. I've checked the SATA cables multiple times and these weird problems only happen when the PCIe card is installed, so I doubt it's an issue with memory or cables. I will check the disk ID later when I have more time this afternoon.
 
Is it possible to pass through individual hard drives without passing through the SATA controller?
No, not if you want real PCI passthrough, where the VM accesses the drive directly without virtualization in between. If virtualization and its overhead are fine, you can "pass through" single drives using "qm set".
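For example, something like this (VM ID 100 and the scsi indexes are placeholders; the by-id names are the two WD Reds from the listing later in this thread):
Code:
qm set 100 -scsi1 /dev/disk/by-id/ata-WDC_WD60EFRX-68L0BN1_WD-WX11D665142P
qm set 100 -scsi2 /dev/disk/by-id/ata-WDC_WD60EFRX-68L0BN1_WD-WX11D6651VTZ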
 
I'd rather not have to deal with the overhead. I just want this VM to be the only thing able to directly access this RAID, and I plan on sharing it on my LAN for access from the rest of my machines. Since I've already ordered another PCIe SATA controller with a different, non-ASMedia chipset, maybe I should just wait until that shows up. I have a feeling I might be able to pass it through much more easily, and that the fact that this one uses the same ASMedia chip as the one on my motherboard is causing the bulk of my problems.

EDIT: Would the passthrough process be any different or more likely to succeed if I were using a container instead of a VM? I've never used containers.

EDIT AGAIN: After closer examination and some more research, it appears that there are in fact several different errors when trying to boot any OS with this PCIe card installed. Also, it was previously showing one drive and only one was having errors, but now they're both having errors. They appear to be sdh and sdf, but they do not appear when I list block devices. A lot of the errors seem to indicate a dead or dying drive, but the drives work just fine when connected to onboard SATA. One of the errors appears to be fixable according to another thread, but the command it suggests doesn't work (I get APM_Level = unsupported), and it's only one problem out of a few. I hope the new card fixes all this. I don't know if this card can still be returned. It does seem to be faulty.

Code:
# ls -l /dev/disk/by-id
total 0
lrwxrwxrwx 1 root root  9 May 29 22:43 ata-WDC_WD60EFRX-68L0BN1_WD-WX11D665142P -> ../../sdh
lrwxrwxrwx 1 root root 10 May 29 22:43 ata-WDC_WD60EFRX-68L0BN1_WD-WX11D665142P-part1 -> ../../sdh1
lrwxrwxrwx 1 root root  9 May 29 22:44 ata-WDC_WD60EFRX-68L0BN1_WD-WX11D6651VTZ -> ../../sdf
lrwxrwxrwx 1 root root 10 May 29 22:44 ata-WDC_WD60EFRX-68L0BN1_WD-WX11D6651VTZ-part1 -> ../../sdf1
lrwxrwxrwx 1 root root  9 May 29 22:44 ata-WDC_WDS250G1B0A-00H9H0_171302A01736 -> ../../sda
lrwxrwxrwx 1 root root 10 May 29 22:44 ata-WDC_WDS250G1B0A-00H9H0_171302A01736-part1 -> ../../sda1
lrwxrwxrwx 1 root root  9 May 29 22:44 ata-WDC_WDS250G1B0A-00H9H0_171303A007F3 -> ../../sde
lrwxrwxrwx 1 root root 10 May 29 22:44 ata-WDC_WDS250G1B0A-00H9H0_171303A007F3-part1 -> ../../sde1
lrwxrwxrwx 1 root root  9 May 29 22:44 ata-WDC_WDS250G1B0A-00H9H0_171305A01553 -> ../../sdb
lrwxrwxrwx 1 root root 10 May 29 22:44 ata-WDC_WDS250G1B0A-00H9H0_171305A01553-part1 -> ../../sdb1
lrwxrwxrwx 1 root root  9 May 29 22:44 ata-WDC_WDS250G1B0A-00H9H0_171691424183 -> ../../sdc
lrwxrwxrwx 1 root root 10 May 29 22:44 ata-WDC_WDS250G1B0A-00H9H0_171691424183-part1 -> ../../sdc1
lrwxrwxrwx 1 root root  9 May 29 22:44 usb-SanDisk_Cruzer_Switch_4C530007960508110213-0:0 -> ../../sdg
lrwxrwxrwx 1 root root 10 May 29 22:44 usb-SanDisk_Cruzer_Switch_4C530007960508110213-0:0-part1 -> ../../sdg1
lrwxrwxrwx 1 root root 10 May 29 22:44 usb-SanDisk_Cruzer_Switch_4C530007960508110213-0:0-part2 -> ../../sdg2
lrwxrwxrwx 1 root root  9 May 29 22:44 usb-USB_DISK_USB_DISK_05F3-0:0 -> ../../sdd
lrwxrwxrwx 1 root root 10 May 29 22:44 usb-USB_DISK_USB_DISK_05F3-0:0-part1 -> ../../sdd1
lrwxrwxrwx 1 root root 10 May 29 22:44 usb-USB_DISK_USB_DISK_05F3-0:0-part2 -> ../../sdd2
lrwxrwxrwx 1 root root 10 May 29 22:44 usb-USB_DISK_USB_DISK_05F3-0:0-part3 -> ../../sdd3
lrwxrwxrwx 1 root root 10 May 29 22:44 usb-USB_DISK_USB_DISK_05F3-0:0-part4 -> ../../sdd4
lrwxrwxrwx 1 root root  9 May 29 22:43 wwn-0x50014ee20dce53bd -> ../../sdh
lrwxrwxrwx 1 root root 10 May 29 22:43 wwn-0x50014ee20dce53bd-part1 -> ../../sdh1
lrwxrwxrwx 1 root root  9 May 29 22:44 wwn-0x50014ee2632398f5 -> ../../sdf
lrwxrwxrwx 1 root root 10 May 29 22:44 wwn-0x50014ee2632398f5-part1 -> ../../sdf1
lrwxrwxrwx 1 root root  9 May 29 22:44 wwn-0x5001b444a6f2f49f -> ../../sdc
lrwxrwxrwx 1 root root 10 May 29 22:44 wwn-0x5001b444a6f2f49f-part1 -> ../../sdc1
lrwxrwxrwx 1 root root  9 May 29 22:44 wwn-0x5001b448b4e0fad8 -> ../../sde
lrwxrwxrwx 1 root root 10 May 29 22:44 wwn-0x5001b448b4e0fad8-part1 -> ../../sde1
lrwxrwxrwx 1 root root  9 May 29 22:44 wwn-0x5001b448b4e20448 -> ../../sdb
lrwxrwxrwx 1 root root 10 May 29 22:44 wwn-0x5001b448b4e20448-part1 -> ../../sdb1
lrwxrwxrwx 1 root root  9 May 29 22:44 wwn-0x5001b448b4ed1d62 -> ../../sda
lrwxrwxrwx 1 root root 10 May 29 22:44 wwn-0x5001b448b4ed1d62-part1 -> ../../sda1
# ls -l /dev/disk/by-path
total 0
lrwxrwxrwx 1 root root  9 May 29 22:44 pci-0000:02:00.0-usb-0:8:1.0-scsi-0:0:0:0 -> ../../sdd
lrwxrwxrwx 1 root root 10 May 29 22:44 pci-0000:02:00.0-usb-0:8:1.0-scsi-0:0:0:0-part1 -> ../../sdd1
lrwxrwxrwx 1 root root 10 May 29 22:44 pci-0000:02:00.0-usb-0:8:1.0-scsi-0:0:0:0-part2 -> ../../sdd2
lrwxrwxrwx 1 root root 10 May 29 22:44 pci-0000:02:00.0-usb-0:8:1.0-scsi-0:0:0:0-part3 -> ../../sdd3
lrwxrwxrwx 1 root root 10 May 29 22:44 pci-0000:02:00.0-usb-0:8:1.0-scsi-0:0:0:0-part4 -> ../../sdd4
lrwxrwxrwx 1 root root  9 May 29 22:44 pci-0000:02:00.0-usb-0:9:1.0-scsi-0:0:0:0 -> ../../sdg
lrwxrwxrwx 1 root root 10 May 29 22:44 pci-0000:02:00.0-usb-0:9:1.0-scsi-0:0:0:0-part1 -> ../../sdg1
lrwxrwxrwx 1 root root 10 May 29 22:44 pci-0000:02:00.0-usb-0:9:1.0-scsi-0:0:0:0-part2 -> ../../sdg2
lrwxrwxrwx 1 root root  9 May 29 22:44 pci-0000:02:00.1-ata-5 -> ../../sda
lrwxrwxrwx 1 root root 10 May 29 22:44 pci-0000:02:00.1-ata-5-part1 -> ../../sda1
lrwxrwxrwx 1 root root  9 May 29 22:44 pci-0000:02:00.1-ata-6 -> ../../sdb
lrwxrwxrwx 1 root root 10 May 29 22:44 pci-0000:02:00.1-ata-6-part1 -> ../../sdb1
lrwxrwxrwx 1 root root  9 May 29 22:44 pci-0000:08:00.0-ata-1 -> ../../sdc
lrwxrwxrwx 1 root root 10 May 29 22:44 pci-0000:08:00.0-ata-1-part1 -> ../../sdc1
lrwxrwxrwx 1 root root  9 May 29 22:44 pci-0000:08:00.0-ata-2 -> ../../sde
lrwxrwxrwx 1 root root 10 May 29 22:44 pci-0000:08:00.0-ata-2-part1 -> ../../sde1
lrwxrwxrwx 1 root root  9 May 29 22:44 pci-0000:0a:00.0-ata-1 -> ../../sdf
lrwxrwxrwx 1 root root 10 May 29 22:44 pci-0000:0a:00.0-ata-1-part1 -> ../../sdf1
lrwxrwxrwx 1 root root  9 May 29 22:43 pci-0000:0a:00.0-ata-2 -> ../../sdh
lrwxrwxrwx 1 root root 10 May 29 22:43 pci-0000:0a:00.0-ata-2-part1 -> ../../sdh1
EDIT AGAIN: The RAID now shows up correctly in Debian with the PCIe card installed, but the main server terminal (the monitor connected directly to the server) keeps spitting out these exception Emask errors and "failed command: READ DMA EXT", and when I tried to mount the RAID it hung the terminal instance. The drives do not have these problems when connected to onboard SATA.
Code:
# lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda           8:0    0 232.9G  0 disk
└─sda1        8:1    0 232.9G  0 part  /
sdb           8:16   0 232.9G  0 disk
└─sdb1        8:17   0 232.9G  0 part
sdc           8:32   0 232.9G  0 disk
└─sdc1        8:33   0 232.9G  0 part
sdd           8:48   1   981M  0 disk
├─sdd1        8:49   1   244K  0 part
├─sdd2        8:50   1   2.8M  0 part
├─sdd3        8:51   1 809.4M  0 part
└─sdd4        8:52   1   300K  0 part
sde           8:64   0 232.9G  0 disk
└─sde1        8:65   0 232.9G  0 part
sdf           8:80   0   5.5T  0 disk
└─sdf1        8:81   0   5.5T  0 part
  └─md0       9:0    0   5.5T  0 raid1
    └─md0p1 259:0    0   5.5T  0 part
sdg           8:96   1   7.5G  0 disk
├─sdg1        8:97   1   1.3G  0 part
└─sdg2        8:98   1   2.9M  0 part
sdh           8:112  0   5.5T  0 disk
└─sdh1        8:113  0   5.5T  0 part
  └─md0       9:0    0   5.5T  0 raid1
    └─md0p1 259:0    0   5.5T  0 part
# mount /dev/md0p1 /home/<user>/mountstuff
Here are the results of dmesg | grep ata11 and dmesg | grep ata12. I will only post the results for ata11, but the results for ata12 are the same.
Code:
# dmesg | grep ata11
[    1.723503] ata11: SATA max UDMA/133 abar m512@0xfce10000 port 0xfce10100 irq 44
[    2.198913] ata11: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    2.199944] ata11.00: ATA-9: WDC WD60EFRX-68L0BN1, 82.00A82, max UDMA/133
[    2.199945] ata11.00: 11721045168 sectors, multi 0: LBA48 NCQ (depth 32), AA
[    2.201019] ata11.00: configured for UDMA/133
[   34.511537] ata11.00: exception Emask 0x0 SAct 0x10 SErr 0x0 action 0x6 frozen
[   34.511608] ata11.00: failed command: READ FPDMA QUEUED
[   34.511674] ata11.00: cmd 60/d0:20:00:f0:a0/00:00:ba:02:00/40 tag 4 ncq dma 106496 in
[   34.511771] ata11.00: status: { DRDY }
[   34.511830] ata11: hard resetting link
[   34.983195] ata11: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[   34.985388] ata11.00: configured for UDMA/133
[   34.985392] ata11.00: device reported invalid CHS sector 0
[   34.985547] ata11: EH complete
[   65.231220] ata11.00: exception Emask 0x0 SAct 0x1c0 SErr 0x0 action 0x6 frozen
[   65.231297] ata11.00: failed command: READ FPDMA QUEUED
[   65.231363] ata11.00: cmd 60/b8:30:68:f2:a0/00:00:ba:02:00/40 tag 6 ncq dma 94208 in
[   65.231472] ata11.00: status: { DRDY }
[   65.231535] ata11.00: failed command: READ FPDMA QUEUED
[   65.231605] ata11.00: cmd 60/80:38:28:f3:a0/00:00:ba:02:00/40 tag 7 ncq dma 65536 in
[   65.231716] ata11.00: status: { DRDY }
[   65.231780] ata11.00: failed command: READ FPDMA QUEUED
[   65.231849] ata11.00: cmd 60/48:40:b8:f3:a0/00:00:ba:02:00/40 tag 8 ncq dma 36864 in
[   65.231961] ata11.00: status: { DRDY }
[   65.232025] ata11: hard resetting link
[   65.703190] ata11: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[   65.705572] ata11.00: configured for UDMA/133
[   65.705577] ata11.00: device reported invalid CHS sector 0
[   65.705579] ata11.00: device reported invalid CHS sector 0
[   65.705580] ata11.00: device reported invalid CHS sector 0
[   65.705970] ata11: EH complete
[   95.951237] ata11.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
[   95.951317] ata11.00: failed command: READ FPDMA QUEUED
[   95.951387] ata11.00: cmd 60/30:00:48:08:04/00:00:00:00:00/40 tag 0 ncq dma 24576 in
[   95.951499] ata11.00: status: { DRDY }
[   95.951563] ata11: hard resetting link
[   96.423218] ata11: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[   96.426165] ata11.00: configured for UDMA/133
[   96.426169] ata11.00: device reported invalid CHS sector 0
[   96.426173] ata11: EH complete

It keeps hard resetting the link. The card cannot maintain communication between the drives and the motherboard, I think? It's just a shitty SATA controller I guess?
 
Maybe it's the combination of enabling the IOMMU (which needs to be strict about DMA) for passthrough and this particular SATA controller.
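(One experiment that might narrow it down, offered as a guess rather than something tested here: the failing commands are NCQ reads (READ FPDMA QUEUED), and some controllers misbehave with NCQ. Disabling it on the kernel command line and watching whether the errors change is cheap:)
Code:
# append to the kernel command line (GRUB_CMDLINE_LINUX_DEFAULT or equivalent), then reboot
libata.force=noncq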

You can pass the drives /dev/sdf and /dev/sdh (or whichever they are) to a VM by adding something like
Code:
scsi10: /dev/sdh
scsi11: /dev/sdf
to the VM configuration file. It is more reliable to use something like /dev/disk/by-id/..... because those names don't change, but the drives need to be usable (without so many errors) to show up there. If you use VirtIO SCSI for the VM, the overhead will be low. The VM will have access to the whole drive and all partitions, but they will appear as QEMU hard disks and SMART won't work inside the VM; this is probably all you need for your RAID. This also works for optical drives, but only for data, not for DVD/Blu-ray video or music.

Yes, for any Linux software or services that don't need a GUI, you can make a container (preferably unprivileged) to run them. You can give a container access to devices and disk partitions (but those require some user mappings or a privileged container) or just to the file systems on them. This might not be the best for running untrusted software that provides services to external/unknown users, but otherwise it is much easier and has less overhead. Since the RAID works for you in Debian, maybe it will show up (or can be made to show up) in Proxmox (which is also Debian-based), and you can just pass the file system directories to the container.
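(A sketch of that last idea, with placeholder paths and container ID 101: mount the md array on the host, then bind-mount a directory from it into the container:)
Code:
mkdir -p /mnt/raid
mount /dev/md0p1 /mnt/raid
pct set 101 -mp0 /mnt/raid,mp=/mnt/raid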
 
Well, it looks like the passthrough issue is resolved with the new PCIe SATA controller I ordered. It showed up today and passthrough just worked. It takes an eternity for the VM to start now, but the passthrough works.
 
Good to hear! I don't know why it would start slow, except that all VM memory must be cleared and locked into actual RAM because of passthrough. Anything in journalctl about starting the VM?
Can you please share which PCIe SATA controller you bought (maybe a link to a website and/or a lspci -nnkv), so people know which one works with passthrough?
 
The PCIe SATA controller I got shows up as this in lspci -nnkv:
Code:
SATA controller [0106]: JMicron Technology Corp. Device [197b:0585] (prog-if 01 [AHCI 1.0])
    Subsystem: JMicron Technology Corp. Device [197b:0000]
    Flags: bus master, fast devsel, latency 0, IRQ 45
    I/O ports at e200 [size=128]
    I/O ports at e180 [size=128]
    I/O ports at e100 [size=128]
    I/O ports at e080 [size=128]
    I/O ports at e000 [size=128]
    Memory at fce10000 (32-bit, non-prefetchable) [size=8K]
    Expansion ROM at fce00000 [disabled] [size=64K]
    Capabilities: [80] Power Management version 3
    Capabilities: [90] MSI: Enable+ Count=1/8 Maskable- 64bit+
    Capabilities: [c0] Express Legacy Endpoint, MSI 00
    Capabilities: [100] Advanced Error Reporting
    Capabilities: [150] Device Serial Number 00-00-00-00-00-00-00-00
    Capabilities: [160] Power Budgeting <?>
    Capabilities: [1b8] Latency Tolerance Reporting
    Capabilities: [300] #19
    Capabilities: [900] L1 PM Substates
    Kernel driver in use: vfio-pci
    Kernel modules: ahci
I got it from Amazon here.

As for the VM taking forever to start, it hangs at the first screen, where it shows the SeaBIOS version and the machine UUID, for something like a full minute before it flashes the Proxmox BIOS splash screen that says "Press ESC for boot menu". After that it loads normally. When the VM starts, the main Proxmox console (on the monitor connected to the server) prints something about vfio-pci and the IOMMU group the card is in, and about vfio-ecap-init hiding ecap or something. I don't see anything obviously about starting the VM in journalctl, but admittedly I don't know what I'm looking for. The last entries in journalctl are about adding all the PCI devices to their respective IOMMU groups.
 
Thanks a lot for sharing.

Maybe it is the initialization of the BIOS (option ROM) of the SATA controller. Some systems show messages before Linux boots, like network cards for booting from the network or hardware RAID cards that allow entering a setup screen. PCI devices can have a ROM/firmware, just like a motherboard has a BIOS. Does the SATA controller show something like that during physical boot of your system? Maybe it does it again (and waits a minute for a key press) when starting the VM, as the device is reset via VFIO and reinitialized by SeaBIOS. Does it also do it when using OVMF?
It could also be because of locking all memory into actual RAM, which is necessary because of PCI passthrough, but I guess not because the SATA controller is not the only passthrough for this VM. Maybe someone else has more experience with this?
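(If the option-ROM theory is right, one thing that might be worth testing, as an assumption on my part rather than something from this thread: tell Proxmox not to expose the card's ROM to the guest at all via rombar=0 on the hostpci line. The PCI address below is a placeholder; use the one lspci reports for the new card:)
Code:
hostpci0: 01:00.0,pcie=1,rombar=0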
 
No, there are no messages regarding the SATA controller initializing BIOS during physical boot of the server that I can see. At least, not ones that take any extra time or pause for a keypress.

I can never get my VMs to boot in OVMF mode. Even when installed that way they fail to boot and I have to switch it to SeaBIOS. I installed Proxmox in UEFI mode too, if that makes any difference. Still can't boot my VMs in OVMF mode. No idea why.

Also, this PCIe SATA card is the only thing I've gone out of my way to pass through to any VM, so I'm not sure what you mean by it not being the only passthrough, unless PVE passes other things through by default, like the 8GB of RAM I assigned to it? I think you're probably right about it being the locking of memory into RAM due to the PCI passthrough, because it only started happening once the passthrough was successful. It's annoying that it takes so long, though. I mean, this is a Ryzen 7 3700X with 64GB of RAM running Proxmox on a four-disk RAIDZ2 of SSDs. You'd think it would have enough balls to do it significantly faster.
 
