System crash (mpt2sas) when starting a VM with hardware passthrough

easyronny

New Member
Jul 30, 2021
Netherlands
Dear Proxmox forum members,

I hope someone on this forum can help me, because I cannot find what I configured wrong or what exactly is going wrong in this case.
I am fairly new to Proxmox and this is my third post on this forum (the first two were deleted for reasons unknown to me), but I have been working on this issue for several weeks now. I have read a lot of forum posts and I keep running into the same error message.

I get this error on both Proxmox release 7.01 and release 6.4 (so I assume it is a mistake on my side).
After installing Proxmox I configured the first virtual machine (ID 100) with a passed-through hardware PCI device, running either Windows or Linux as the guest OS.
Whether I use the OVMF (UEFI) BIOS or SeaBIOS makes no difference; the result is the same in all cases.

The following text appears on the console screen the moment I start a virtual machine that includes hardware passthrough.
In this case the device is a SAS-to-SATA controller; I also tried a Radeon RX480 and later an NVIDIA GT 710.

After I press Start in the web console to boot the virtual machine, the entire console stops responding and I can only hard-reset the physical machine.

The error on the console screen (see also the attachment):

mpt2sas_cm0 sending message unit reset !!
mpt2sas_cm0 sending message reset : SUCCESS
xhci_hcd 0000:01:00.0: remove, state 4
usb usb2: USB disconnect, device number 1
usb 1-6: USB disconnect, device number 2
usb 1-6.4: USB disconnect, device number 4
usb 1-6.4.3: USB disconnect, device number 6
usb 1-7: USB disconnect, device number 3
usb 1-10: USB disconnect, device number 5
usb 1-10.4: USB disconnect, device number 7
xhci_hcd 0000:01:00.0: USB bus 1 deregistered
r8169 0000:0a:00.0 eno1: Link is Down
vmbr0: port 1(eno1) entered disabled state
device eno1 left promiscuous mode
vmbr0: port 1(eno1) entered disabled state
ata3.00: disabled
sd 3:0:0:0: [sda] Synchronizing SCSI cache
sd 3:0:0:0: [sda] Synchronizing cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 3:0:0:0: [sda] Stopping disk
sd 3:0:0:0: [sda] Start/Stop Unit failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
ata4.00: disabled
sd 4:0:0:0: [sda] Synchronizing SCSI cache
sd 4:0:0:0: [sda] Synchronizing cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 4:0:0:0: [sda] Stopping disk
sd 4:0:0:0: [sda] Start/Stop Unit failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
ata5.00: disabled
sd 5:0:0:0: [sda] Synchronizing SCSI cache
sd 5:0:0:0: [sda] Synchronizing cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 5:0:0:0: [sda] Stopping disk
sd 5:0:0:0: [sda] Start/Stop Unit failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
ata6.00: disabled
zio pool=rpool vdev=/dev/disk/by-id/ata-CT1000BX500SSD1_2105E4F27C14-part3 error=5 type=1 offset=8607170560 size=4096 flags=180880
zio pool=rpool vdev=/dev/disk/by-id/ata-CT1000BX500SSD1_2105E4F27C14-part3 error=5 type=1 offset=270336 size=8192 flags=bc08c1
zio pool=rpool vdev=/dev/disk/by-id/ata-CT1000BX500SSD1_2105E4F27C14-part3 error=5 type=1 offset=999666229248 size=8192 flags=b08c1
zio pool=rpool vdev=/dev/disk/by-id/ata-CT1000BX500SSD1_2105E4F27C14-part3 error=5 type=1 offset=999666491392 size=8192 flags=b08c1
WARNING: Pool 'rpool' has encountered an uncorrectable I/O failure and has been suspended.
zio pool=rpool vdev=/dev/disk/by-id/ata-CT1000BX500SSD1_2105E4F27EF1-part3 error=5 type=2 offset=1811103239168 size=4096 flags=184880
zio pool=rpool vdev=/dev/disk/by-id/ata-CT1000BX500SSD1_2105E4F27EF1-part3 error=5 type=2 offset=163228299264 size=4096 flags=184880
zio pool=rpool vdev=/dev/disk/by-id/ata-CT1000BX500SSD1_2105E4F27EF1-part3 error=5 type=2 offset=188982915072 size=4096 flags=184880
zio pool=rpool vdev=/dev/disk/by-id/ata-CT1000BX500SSD1_2105E4F27EF1-part3 error=5 type=2 offset=197587263488 size=8192 flags=40080c80
zio pool=rpool vdev=/dev/disk/by-id/ata-CT1000BX500SSD1_2105E4F27EF1-part3 error=5 type=2 offset=206163980288 size=8192 flags=40080c80

(More zio pool error messages of the same form follow; only the offset=, size= and flags= values differ.)

The last two lines are:
WARNING: Pool 'rpool' has encountered an uncorrectable I/O failure and has been suspended.
WARNING: Pool 'rpool' has encountered an uncorrectable I/O failure and has been suspended.


My Proxmox hardware config:
AMD Ryzen 3700X
Gigabyte B550 Aorus Pro V2 (1st PCIe slot is 16x-16x, 2nd PCIe slot 16x-8x, 3rd PCIe slot 16x-8x)
64GB DDR4 Crucial Memory (4x16GB)
2x 1TB Crucial BX500 SSD
Realtek 1 Gbit quad-port NIC (last PCIe 1x slot)
Asus Strix RX480
Dell Perc H200 (LSI SAS9211-8I) (Flashed in IT Mode) for a future virtual Xpenology config.

My Proxmox software config
ZFS RAID 0, with LZ4 compression (2x Crucial BX500 1TB SSD)
A Linux network bond of the 4 Realtek NICs in 802.3ad mode.
2 CIFS connections one towards my NAS and the other to a domain controller.

I tested the hardware configurations below; each one either works or gives me the error message above:

Config 1 (first preference):
1st PCI 16x-16x Asus Strix RX480
2nd PCI 16x-8x Dell Perc H200
3rd PCI 16x-8x (empty)
SATA0 Crucial BX500 SSD
SATA1 Crucial BX500 SSD
Passthrough: only 2nd PCI 16x-8x Dell Perc H200 (03:00.0)
Result: above error message

Config 2:
1st PCI 16x-16x Dell Perc H200
2nd PCI 16x-8x Asus Strix RX480
3rd PCI 16x-8x (empty)
SATA0 Crucial BX500 SSD
SATA1 Crucial BX500 SSD
Passthrough: only 1st PCI 16x-16x Dell Perc H200 (0b:00.0)
Result: working (but no VGA passthrough)

Config 3:
1st PCI 16x-16x Dell Perc H200
2nd PCI 16x-8x Asus Strix RX480
3rd PCI 16x-8x (empty)
SATA0 Crucial BX500 SSD
SATA1 Crucial BX500 SSD
Passthrough: 1st PCI 16x-16x Dell Perc H200 (0b:00.0)
Passthrough: 2nd PCI 16x-8x Asus Strix RX480 (03:00.0)
Result: above error message

Config 4:
1st PCI 16x-16x Dell Perc H200
2nd PCI 16x-8x Asus Strix RX480
3rd PCI 16x-8x (empty)
SATA2 Crucial BX500 SSD
SATA3 Crucial BX500 SSD
Passthrough: 1st PCI 16x-16x Dell Perc H200 (0b:00.0)
Passthrough: 2nd PCI 16x-8x Asus Strix RX480 (03:00.0)
Result: above error message

Config 5:
1st PCI 16x-16x Dell Perc H200
2nd PCI 16x-8x (empty)
3rd PCI 16x-8x Asus Strix RX480
SATA0 Crucial BX500 SSD
SATA1 Crucial BX500 SSD
Passthrough: 1st PCI 16x-16x Dell Perc H200 (0b:00.0)
Result: working concept (but no VGA passthrough)

Config 6 (second preference):
1st PCI 16x-16x Asus Strix RX480
2nd PCI 16x-8x (empty)
3rd PCI 16x-8x Dell Perc H200
SATA0 Crucial BX500 SSD
SATA1 Crucial BX500 SSD
Result: Dell Perc H200 is not detected!


BIOS settings changed from default:
Enabled: IOMMU
Disabled: CSM Support (the same errors occur when it is enabled)
SATA Mode: AHCI

Changes to Proxmox config files:
Grub (Changes)
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on"

Modules (added)
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd

IOMMU interrupt remapping (I do not know whether this is needed)
echo "options vfio_iommu_type1 allow_unsafe_interrupts=1" > /etc/modprobe.d/iommu_unsafe_interrupts.conf
echo "options kvm ignore_msrs=1" > /etc/modprobe.d/kvm.conf

Steps such as blacklisting the (VGA) drivers and adding the GPU to VFIO have also been tried, but gave the same result.
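
One way to confirm whether a device is really claimed by vfio-pci, rather than still by a host driver, is to look at the `driver` link in sysfs. The following is a minimal sketch and not part of the original setup: the function name `pci_driver` and its optional second argument (an alternative sysfs root, used only for testing) are made up for illustration, and the address 0000:03:00.0 is the one from Config 1.

```shell
#!/bin/sh
# Print the name of the kernel driver bound to a PCI device, or "none".
# $1: full PCI address, e.g. 0000:03:00.0
# $2: optional sysfs root (hypothetical test hook), defaults to the real one
pci_driver() {
    addr="$1"
    root="${2:-/sys/bus/pci/devices}"
    link="$root/$addr/driver"
    if [ -L "$link" ]; then
        # The link target ends in the driver name, e.g. .../drivers/vfio-pci
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}

# Example on the host (address taken from Config 1):
#   pci_driver 0000:03:00.0
```

`lspci -nnk -s 03:00.0` reports the same information on its "Kernel driver in use" line.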

Conclusion for now: passthrough only works via the first PCIe 16x-16x slot. I hope it can also work with the second 16x-8x slot, which will hold the Dell PERC H200 card (LSI SAS9211-8I). According to the Gigabyte manual, the third PCI 16x-8x slot is shared with other onboard devices (SATA ports 5 and 6 and the M.2 connectors).

If someone knows how to get this working, with the Dell Perc H200 and the AMD RX480 each passed through to a separate virtual machine, I would be very grateful, because I am at my wits' end. Sorry for the long post, but it reflects everything I have tried to fix this by myself.

Many thanks for your time and help,
Ronny V
 

Attachments

  • 2021_08_01_23_39_23_Photos.jpg (585.1 KB)
Could you try to get a full log (netconsole or serial console)?
 
@fabian and others

A serial console is not possible for me because my mainboard has no COM port.
How does netconsole work, and what do I need to configure? This is new to me, sorry.
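
For reference, netconsole sends kernel messages as UDP packets to another machine and is configured through a single module parameter of the form `src-port@src-ip/dev,tgt-port@tgt-ip/tgt-mac`. The sketch below only assembles that string. The helper name `netconsole_param` and every address and interface name in the usage comment are hypothetical placeholders, not values from this system.

```shell
#!/bin/sh
# Build the value for the netconsole module parameter.
# Arguments: src_port src_ip device tgt_port tgt_ip tgt_mac
netconsole_param() {
    printf '%s@%s/%s,%s@%s/%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

# Example with placeholder values (replace all of them):
#   modprobe netconsole \
#     "netconsole=$(netconsole_param 6665 192.168.1.10 eth0 6666 192.168.1.20 aa:bb:cc:dd:ee:ff)"
```

On the receiving machine a UDP listener such as `nc -u -l 6666` can capture the messages; the exact options differ between netcat variants. Since the console output above shows the host NIC going down during the crash, the stream may well stop at that point.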

There is only one thing that I think could be the cause.
How can I check whether the ACS (override) patch is applied to the Proxmox kernel?
I added the line below to grub, but that did not change anything.
pcie_acs_override=downstream,multifunction
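
Whether a parameter actually reached the running kernel can be checked in /proc/cmdline. A minimal sketch follows; `cmdline_has` is an illustrative helper name, and its optional file argument exists only so the function can be tried against a sample file. It shows that the parameter was passed at boot, not that the kernel contains the ACS override patch.

```shell
#!/bin/sh
# Return success if a parameter is present on the kernel command line.
# $1: parameter name, with or without a value (e.g. pcie_acs_override)
# $2: optional file to read instead of /proc/cmdline (hypothetical test hook)
cmdline_has() {
    param="$1"
    file="${2:-/proc/cmdline}"
    for word in $(cat "$file"); do
        case "$word" in
            "$param"|"$param"=*) return 0 ;;
        esac
    done
    return 1
}

# Example on the host:
#   cmdline_has pcie_acs_override && echo "parameter is active"
```

If the parameter is missing after a reboot, the edit may not have reached the boot loader that is in use. As far as I know, Proxmox installs with a ZFS root booted via UEFI use systemd-boot, which takes its command line from /etc/kernel/cmdline rather than from the GRUB config; that is an assumption about this system, not something the post confirms.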

And how can I put all my devices into separate groups?
Because, as far as I can see now, the devices in iommu_groups/13 are conflicting with each other.

Code:
root@RVProxmox:~# find /sys/kernel/iommu_groups/ -type l
/sys/kernel/iommu_groups/17/devices/0000:0d:00.1
/sys/kernel/iommu_groups/7/devices/0000:00:07.0
/sys/kernel/iommu_groups/15/devices/0000:0c:00.0
/sys/kernel/iommu_groups/5/devices/0000:00:04.0
/sys/kernel/iommu_groups/13/devices/0000:03:00.0
/sys/kernel/iommu_groups/13/devices/0000:09:00.0
/sys/kernel/iommu_groups/13/devices/0000:02:00.0
/sys/kernel/iommu_groups/13/devices/0000:05:05.0
/sys/kernel/iommu_groups/13/devices/0000:08:00.0
/sys/kernel/iommu_groups/13/devices/0000:01:00.2
/sys/kernel/iommu_groups/13/devices/0000:01:00.0
/sys/kernel/iommu_groups/13/devices/0000:0a:00.0
/sys/kernel/iommu_groups/13/devices/0000:02:06.0
/sys/kernel/iommu_groups/13/devices/0000:07:00.0
/sys/kernel/iommu_groups/13/devices/0000:05:01.0
/sys/kernel/iommu_groups/13/devices/0000:06:00.0
/sys/kernel/iommu_groups/13/devices/0000:05:07.0
/sys/kernel/iommu_groups/13/devices/0000:02:08.0
/sys/kernel/iommu_groups/13/devices/0000:01:00.1
/sys/kernel/iommu_groups/13/devices/0000:05:03.0
/sys/kernel/iommu_groups/13/devices/0000:04:00.0
/sys/kernel/iommu_groups/3/devices/0000:00:03.0
/sys/kernel/iommu_groups/11/devices/0000:00:14.3
/sys/kernel/iommu_groups/11/devices/0000:00:14.0
/sys/kernel/iommu_groups/1/devices/0000:00:01.2
/sys/kernel/iommu_groups/18/devices/0000:0d:00.3
/sys/kernel/iommu_groups/8/devices/0000:00:07.1
/sys/kernel/iommu_groups/16/devices/0000:0d:00.0
/sys/kernel/iommu_groups/6/devices/0000:00:05.0
/sys/kernel/iommu_groups/14/devices/0000:0b:00.0
/sys/kernel/iommu_groups/14/devices/0000:0b:00.1
/sys/kernel/iommu_groups/4/devices/0000:00:03.1
/sys/kernel/iommu_groups/12/devices/0000:00:18.3
/sys/kernel/iommu_groups/12/devices/0000:00:18.1
/sys/kernel/iommu_groups/12/devices/0000:00:18.6
/sys/kernel/iommu_groups/12/devices/0000:00:18.4
/sys/kernel/iommu_groups/12/devices/0000:00:18.2
/sys/kernel/iommu_groups/12/devices/0000:00:18.0
/sys/kernel/iommu_groups/12/devices/0000:00:18.7
/sys/kernel/iommu_groups/12/devices/0000:00:18.5
/sys/kernel/iommu_groups/2/devices/0000:00:02.0
/sys/kernel/iommu_groups/10/devices/0000:00:08.1
/sys/kernel/iommu_groups/0/devices/0000:00:01.0
/sys/kernel/iommu_groups/19/devices/0000:0d:00.4
/sys/kernel/iommu_groups/9/devices/0000:00:08.0
root@RVProxmox:~#

Code:
root@RVProxmox:~# lspci
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Root Complex
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Starship/Matisse IOMMU
00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
00:01.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge
00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
00:03.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
00:03.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge
00:04.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
00:05.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
00:07.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
00:07.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B]
00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B]
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 61)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 0
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 1
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 2
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 3
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 4
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 5
00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 6
00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 7
01:00.0 USB controller: Advanced Micro Devices, Inc. [AMD] Device 43ee
01:00.1 SATA controller: Advanced Micro Devices, Inc. [AMD] Device 43eb
01:00.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43e9
02:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43ea
02:06.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43ea
02:08.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43ea
03:00.0 Serial Attached SCSI controller: Broadcom / LSI SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03)
04:00.0 PCI bridge: ASMedia Technology Inc. ASM1184e PCIe Switch Port
05:01.0 PCI bridge: ASMedia Technology Inc. ASM1184e PCIe Switch Port
05:03.0 PCI bridge: ASMedia Technology Inc. ASM1184e PCIe Switch Port
05:05.0 PCI bridge: ASMedia Technology Inc. ASM1184e PCIe Switch Port
05:07.0 PCI bridge: ASMedia Technology Inc. ASM1184e PCIe Switch Port
06:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 07)
07:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 07)
08:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 07)
09:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 07)
0a:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller (rev 05)
0b:00.0 VGA compatible controller: NVIDIA Corporation GK208B [GeForce GT 710] (rev a1)
0b:00.1 Audio device: NVIDIA Corporation GK208 HDMI/DP Audio Controller (rev a1)
0c:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Function
0d:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP
0d:00.1 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Cryptographic Coprocessor PSPCPP
0d:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller
0d:00.4 Audio device: Advanced Micro Devices, Inc. [AMD] Starship/Matisse HD Audio Controller
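
Read together, the two listings above put 03:00.0 (the SAS controller) in IOMMU group 13 alongside 01:00.0 (USB controller), 01:00.1 (SATA controller) and 0a:00.0 (the RTL8125 NIC). That appears consistent with the crash log, where the USB bus, eno1 and the ata disks all drop off the host when the VM starts. The sketch below prints the groups in numeric order and lists the peers of one device. The function names and the optional root argument are illustrative assumptions, not an existing tool.

```shell
#!/bin/sh
# Print "group N: address" for every device, sorted by group number.
# $1: optional sysfs root (hypothetical test hook), defaults to the real one
list_iommu_groups() {
    root="${1:-/sys/kernel/iommu_groups}"
    [ -d "$root" ] || return 1
    for group in $(ls "$root" | sort -n); do
        for dev in "$root/$group"/devices/*; do
            [ -e "$dev" ] || continue
            printf 'group %s: %s\n' "$group" "${dev##*/}"
        done
    done
}

# Print every device sharing an IOMMU group with the given address,
# including the device itself.
# $1: full PCI address, $2: optional sysfs root as above
iommu_peers() {
    addr="$1"
    root="${2:-/sys/kernel/iommu_groups}"
    for dev in "$root"/*/devices/"$addr"; do
        [ -e "$dev" ] || continue
        for peer in "${dev%/*}"/*; do
            printf '%s\n' "${peer##*/}"
        done
    done
}

# Example on the host:
#   iommu_peers 0000:03:00.0
```

Any address printed by `iommu_peers` can be looked up with `lspci -nns <address>`. A device that shares its group with host-critical hardware generally cannot be passed through on its own.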
 
Hi @fabian and others,

It looks like passthrough is working now, only I am not sure what the solution was.
I carried out the following steps.

1. I did a clean install on a btrfs volume (instead of ZFS).
2. I updated the kernel to version 5.12.2-acso (link).
A. Tested, and passthrough did not work.

3. Changed the grub and modules config files as described in the link.
4. Installed the new drivers provided by @fabian in the link.

B. Tested again, and now I am able to pass through:
1. the Dell PERC H200, in a 16x-8x PCIe slot, to a virtual Xpenology system.
2. the Asus RX480, in a 16x-16x PCIe slot, to a virtual Windows 10 system.

Many, many thanks @fabian. I think either the kernel update or the new drivers you provided solved the issue for me.

For now I will start 3 virtual machines (Windows 10, Windows Server 2019 and Xpenology DSM) and check that everything is working (no memory leaks).

I will post an update later this week.


Ronny
 
