Proxmox 7 crashes (mpt2sas reset) when starting a VM that includes hardware passthrough

easyronny

New Member
Jul 30, 2021
Netherlands
Dear Proxmox forum members,

I hope someone on this forum can help me, because I cannot find what I configured wrong or what exactly is going wrong.
I am fairly new to Proxmox and this is my first post on this forum, but I have been working on this issue for several weeks now.
I have read a lot of forum posts and Reddit articles, and I keep ending up with the same error message.

I get this on Proxmox release 7.01 and also on release 6.4; both give the same error (so I think it must be a mistake on my side).
After installing Proxmox I configure the first virtual machine (ID 100) with a hardware PCI device passed through. The guest OS, Windows or Linux, does not matter.
It also does not matter whether I use OVMF (UEFI) or SeaBIOS; in all situations I get the same result.

The following text is shown on the console screen at the moment I start a virtual machine that includes hardware passthrough.
In this case it is a SAS-to-SATA controller; I also tried it with a Radeon RX480 and an NVIDIA GT 710.

After I press Start in the web console to start the virtual machine, the entire console stops responding and I can only hard-reset the physical machine.


Console ERROR screen (see also attachment):

mpt2sas_cm0 sending message unit reset !!
mpt2sas_cm0 sending message reset : SUCCESS
xhci_hcd 0000:01:00.0: Remove state 4
usb_usb2: USB disconnect, device number 1
usb 1-6 USB Disconnect device number 2
usb 1-6.4 USB Disconnect device number 4
usb 1-6.4.3 USB Disconnect device number 6
usb 1-7 USB Disconnect device number 3
usb 1-10 USB Disconnect device number 5
usb 1-10.4 USB Disconnect device number 7
xhci_hcd 0000:01:00.0 USB bus 1 deregistered
r8169 0000:0a:00.0 eno1: Link is Down
vmbr0: port 1(eno1) entered disabled state
device eno1 left promiscuous mode
vmbr0: port 1(eno1) entered disabled state
ata3.00: disabled
sd 3:0:0:0: [sda] Synchronizing SCSI cache
sd 3:0:0:0: [sda] Synchronizing cache(10) failed: Result : hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 3:0:0:0: [sda] Stopping disk
sd 3:0:0:0: [sda] Start/Stop Unit failed: Result : hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
ata4.00: disabled
sd 4:0:0:0: [sda] Synchronizing SCSI cache
sd 4:0:0:0: [sda] Synchronizing cache(10) failed: Result : hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 4:0:0:0: [sda] Stopping disk
sd 4:0:0:0: [sda] Start/Stop Unit failed: Result : hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
ata5.00: disabled
sd 5:0:0:0: [sda] Synchronizing SCSI cache
sd 5:0:0:0: [sda] Synchronizing cache(10) failed: Result : hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 5:0:0:0: [sda] Stopping disk
sd 5:0:0:0: [sda] Start/Stop Unit failed: Result : hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
ata6.00: disabled
zio pool=rpool vdev=/dev/disk/by-id/ata-CT1000BX500SSD1_2105E4F27C14-part3 error=5 type=1 offset=8607170560 size=4096 flags=180880
zio pool=rpool vdev=/dev/disk/by-id/ata-CT1000BX500SSD1_2105E4F27C14-part3 error=5 type=1 offset=270336 size=8192 flags=bc08c1
zio pool=rpool vdev=/dev/disk/by-id/ata-CT1000BX500SSD1_2105E4F27C14-part3 error=5 type=1 offset=999666229248 size=8192 flags=b08c1
zio pool=rpool vdev=/dev/disk/by-id/ata-CT1000BX500SSD1_2105E4F27C14-part3 error=5 type=1 offset=999666491392 size=8192 flags=b08c1
WARNING: Pool 'rpool' has encountered an uncorrectable I/O failure and has been suspended.
zio pool=rpool vdev=/dev/disk/by-id/ata-CT1000BX500SSD1_2105E4F27EF1-part3 error=5 type=2 offset=1811103239168 size=4096 flags=184880
zio pool=rpool vdev=/dev/disk/by-id/ata-CT1000BX500SSD1_2105E4F27EF1-part3 error=5 type=2 offset=163228299264 size=4096 flags=184880
zio pool=rpool vdev=/dev/disk/by-id/ata-CT1000BX500SSD1_2105E4F27EF1-part3 error=5 type=2 offset=188982915072 size=4096 flags=184880
zio pool=rpool vdev=/dev/disk/by-id/ata-CT1000BX500SSD1_2105E4F27EF1-part3 error=5 type=2 offset=197587263488 size=8192 flags=40080c80
zio pool=rpool vdev=/dev/disk/by-id/ata-CT1000BX500SSD1_2105E4F27EF1-part3 error=5 type=2 offset=206163980288 size=8192 flags=40080c80

----- 25 more zio pool error messages like the ones above; only the offset=, size= and flags= values differ -----

The last two lines are:
WARNING: Pool 'rpool' has encountered an uncorrectable I/O failure and has been suspended.
WARNING: Pool 'rpool' has encountered an uncorrectable I/O failure and has been suspended.


My configuration:
AMD Ryzen 3700X
Gigabyte B550 Aorus Pro V2 (1st PCI slot is 16x-16x, 2nd PCI 16x-8x, 3rd PCI 16x-8x)
64GB DDR4 Crucial Memory (4x16GB)
2x 1TB Crucial BX500 SSD
Realtek 1 Gbit quad NIC (last PCI 1x slot)
Asus Strix RX480
Dell Perc H200 (Flashed in IT Mode) for a future virtual Xpenology config.

My Proxmox configuration:
ZFS RAID 0 with LZ4 compression across both Crucial BX500 1TB SSDs
A Linux network bond of the 4 Realtek NICs in an 802.3ad setup
2 CIFS connections to my NAS and domain controller

I tested the configurations below; each one either works or gives me the error message above:

Config 1:
1st PCI 16x-16x Asus Strix RX480
2nd PCI 16x-8x Dell Perc H200
3rd PCI 16x-8x (empty)
SATA0 Crucial BX500 SSD
SATA1 Crucial BX500 SSD
Passthrough: only PCI 16x-8x Dell Perc H200 (03:00.0)
Result: Above error message

Config 2:
1st PCI 16x-16x Dell Perc H200
2nd PCI 16x-8x Asus Strix RX480
3rd PCI 16x-8x (empty)
SATA0 Crucial BX500 SSD
SATA1 Crucial BX500 SSD
Passthrough: only PCI 16x-8x Dell Perc H200 (0b:00.0)
Result: Working (but no VGA passthrough)

Config 3:
1st PCI 16x-16x Dell Perc H200
2nd PCI 16x-8x Asus Strix RX480
3rd PCI 16x-8x (empty)
SATA0 Crucial BX500 SSD
SATA1 Crucial BX500 SSD
Passthrough: 1st PCI 16x-8x Dell Perc H200 (0b:00.0)
Passthrough: 2nd PCI 16x-8x Asus Strix RX480 (03:00.0)
Result: Above error message

Config 4:
1st PCI 16x-16x Dell Perc H200
2nd PCI 16x-8x Asus Strix RX480
3rd PCI 16x-8x (empty)
SATA2 Crucial BX500 SSD
SATA3 Crucial BX500 SSD
Passthrough: 1st PCI 16x-8x Dell Perc H200 (0b:00.0)
Passthrough: 2nd PCI 16x-8x Asus Strix RX480 (03:00.0)
Result: Above error message

Config 5:
1st PCI 16x-16x Dell Perc H200
2nd PCI 16x-8x (empty)
3rd PCI 16x-8x Asus Strix RX480
SATA0 Crucial BX500 SSD
SATA1 Crucial BX500 SSD
Passthrough: 1st PCI 16x-8x Dell Perc H200 (0b:00.0)
Result: Working concept (but no VGA passthrough)

Config 6:
1st PCI 16x-16x Asus Strix RX480
2nd PCI 16x-8x (empty)
3rd PCI 16x-8x Dell Perc H200
SATA0 Crucial BX500 SSD
SATA1 Crucial BX500 SSD
Result: Dell Perc H200 is not detected!



My conclusion for now is that passthrough only works via the first PCIe 16x-16x slot. I hope that I can also use the second 16x-8x slot for the Dell PERC H200 card.
According to the Gigabyte manual, the third PCI 16x-8x slot is shared with other onboard devices (SATA ports 5 and 6 and the M.2 connectors).
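
Since the result depends on the slot, one thing that may be worth checking is which devices share an IOMMU group with the card in each slot. Devices in the same group generally cannot be split between host and guest, so if the group of the passed-through card also contains the USB, SATA or network controllers, that would be consistent with the console output above. This is an assumption to verify, not a confirmed diagnosis. The sketch below only reads sysfs; the function name `list_iommu_groups` is made up for illustration.

```shell
#!/bin/sh
# list_iommu_groups [ROOT]
# Prints "group <n>: <pci-address>" for every device found under ROOT.
# ROOT defaults to /sys/kernel/iommu_groups; the optional argument only
# exists so the function can be tried against a sample directory tree.
list_iommu_groups() {
    root=${1:-/sys/kernel/iommu_groups}
    for dev in "$root"/*/devices/*; do
        [ -e "$dev" ] || continue     # glob matched nothing (IOMMU off?)
        group=${dev#"$root"/}         # strip the leading ROOT/
        group=${group%%/*}            # keep only the group number
        echo "group $group: ${dev##*/}"
    done
}

# Empty output here usually means the IOMMU is not active on this host.
list_iommu_groups
```

If 03:00.0 shows up in the same group as 01:00.0 (USB) or 0a:00.0 (NIC), the second slot is not isolated from those devices.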


BIOS settings changed from default:
Enabled: IOMMU
Disabled: CSM Support (I get the same errors when CSM Support is enabled)
SATA Mode: AHCI

Changes to Proxmox config files:
Grub (Changes)
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on"
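
A quick way to confirm that this change actually reached the running kernel is to look at /proc/cmdline after a reboot. The sketch below does that; the helper name `has_kernel_param` is hypothetical. As far as I know, Proxmox hosts that boot through systemd-boot (typically ZFS-on-root installs in UEFI mode) take their command line from /etc/kernel/cmdline instead of /etc/default/grub, so a GRUB edit may have no effect there.

```shell
#!/bin/sh
# has_kernel_param PARAM [CMDLINE_FILE]
# Succeeds if PARAM appears as a whole word in the kernel command line.
# CMDLINE_FILE defaults to /proc/cmdline; the optional argument only
# exists so the function can be tried against a sample file.
has_kernel_param() {
    [ -r "${2:-/proc/cmdline}" ] || return 1
    # One parameter per line, then an exact fixed-string line match.
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}

if has_kernel_param amd_iommu=on; then
    echo "amd_iommu=on is on the running kernel's command line"
else
    echo "amd_iommu=on is NOT on the running kernel's command line"
fi
```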

Modules (added)
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
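
To check that the modules listed above were really loaded after the reboot, something like the following could be used. It is a minimal sketch that reads /proc/modules; the function name `missing_vfio_modules` is made up, and a module compiled into the kernel would be reported as missing even though it is available.

```shell
#!/bin/sh
# missing_vfio_modules [MODULES_FILE]
# Prints every module from the list above that is not present in
# MODULES_FILE (default /proc/modules). Prints nothing if all are loaded.
missing_vfio_modules() {
    file=${1:-/proc/modules}
    for mod in vfio vfio_iommu_type1 vfio_pci vfio_virqfd; do
        # /proc/modules has one module per line, with the name first.
        grep -q "^$mod " "$file" 2>/dev/null || echo "$mod"
    done
}

missing_vfio_modules
```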

IOMMU interrupt remapping (I do not know whether this is needed)
echo "options vfio_iommu_type1 allow_unsafe_interrupts=1" > /etc/modprobe.d/iommu_unsafe_interrupts.conf
echo "options kvm ignore_msrs=1" > /etc/modprobe.d/kvm.conf

Steps like blacklisting the (VGA) drivers and adding the GPU to VFIO have also been tried, but gave the same result.
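
Whether the blacklist and the VFIO binding took effect can be checked per device by looking at which kernel driver is bound to it. The sketch below reads the `driver` symlink in sysfs; the function name `pci_driver` is hypothetical, and the `0000:` PCI domain prefix in front of the addresses from this post is an assumption.

```shell
#!/bin/sh
# pci_driver ADDRESS [SYSFS_ROOT]
# Prints the name of the kernel driver bound to the PCI device, or
# "(none)" if no driver is bound. SYSFS_ROOT defaults to
# /sys/bus/pci/devices; the optional argument is only for trying the
# function against a sample directory tree.
pci_driver() {
    link="${2:-/sys/bus/pci/devices}/$1/driver"
    if [ -L "$link" ]; then
        basename "$(readlink "$link")"
    else
        echo "(none)"
    fi
}

# Addresses taken from the configs above.
echo "03:00.0 -> $(pci_driver 0000:03:00.0)"
echo "0b:00.0 -> $(pci_driver 0000:0b:00.0)"
```

A device prepared for passthrough would normally show vfio-pci here rather than a host driver such as mpt3sas or amdgpu.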

If someone has a tip to fix this I would be very grateful; I am at my wits' end.

Sorry for the long post, but it reflects everything I have tried to fix this by myself.


Many thanks for your time and help,
Ronny V
 

Attachments

  • 2021_08_01_23_39_23_Photos.jpg
Did you find a solution to this problem?
I'm trying to get GPU passthrough working (I got it to work and have been using it on another box). The moment the VM is started it boots just fine, BUT the SAS controller, which is not involved in this at all, all of a sudden decides to drop all disks (luckily the OS is on a plain old SATA drive, so it is still OK).

So the VM works fine with GPU passthrough, but kills the SAS controller in the process of booting.
 
