Hardware Raid lost after Guest reboot

maltej

Member
Feb 21, 2022
Hi,

After several years of running OpenMediaVault (OMV) on a standalone server, I want to "upgrade".
I have an older Xeon motherboard (C236A/MSI) with an Adaptec 6805T RAID controller.

Years ago I used the board without the hardware RAID controller, running an Arch system with several VMs (e.g. OMV and Windows). That worked well, but it was a pain for ->me<- to maintain, and I was always afraid of killing the whole setup.

So I have now decided to use the board and RAID controller with Proxmox. I pass the RAID controller (4 disks, RAID 10) through to the OMV VM. The Adaptec software that manages the controller (web interface with SMART info, mail notifications, ...) also runs inside the VM.

The setup works well until I reboot the OMV VM. It seems the RAID controller is not recognized on the second boot of the VM. It works again if I reboot the whole Proxmox server, but then I always have to shut down all the other VMs.


The error message after reboot of the VM is:
aacraid: aac_fib_send: first asynchronous command timed out. Usually a result of a PCI interrupt routing problem; update mother board BIOS or consider utilizing one of the SAFE mode kernel options (acpi, apic etc)

During boot of the VM the RAID controller detects the disks, but the RAID array is no longer visible inside OMV. The BIOS has been updated to the latest version.

The thread below shows the same error message. It was solved there by a BIOS setting ("Turned off IRQ Emulation in BIOS for support of old PCI"), but I don't have any similar BIOS setting:
https://forum.proxmox.com/threads/proxmox-adaptec-raid-5805q.52970/

- Any hints on how to get rid of this error?
- Is the way I did it a "good" idea?
- How can I identify whether the problem is the OMV VM or Proxmox?
- Is there a workaround? I am thinking of not passing the controller through and instead attaching the file system to only one VM. Would this be OK, or is there a chance that Proxmox accesses the RAID at the same time as the VM and "corrupts" data?
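
For reference, the workaround in the last bullet would hand the finished array to the VM as a block device instead of passing the controller through. The following is only a sketch of that idea: it prints the `qm set` command rather than running it, the helper name `print_disk_attach_cmd` is made up, and the VM ID, slot and disk ID are placeholders, not values from this setup.

```shell
#!/usr/bin/env bash
# Sketch: print (not run) the qm command that attaches a host block device to a VM.
# VM ID, disk slot and disk ID are placeholders, not values from this thread.
print_disk_attach_cmd() {
    local vmid="$1"      # numeric Proxmox VM ID
    local slot="$2"      # free disk slot on the VM, e.g. scsi1
    local disk="$3"      # stable path under /dev/disk/by-id/

    if [[ ! "$vmid" =~ ^[0-9]+$ ]]; then
        echo "VM ID must be numeric" >&2
        return 1
    fi
    if [[ "$disk" != /dev/disk/by-id/* ]]; then
        echo "use a /dev/disk/by-id/ path so the name survives reboots" >&2
        return 1
    fi
    # Only print the command; review it before running it on the host.
    echo "qm set $vmid -$slot $disk"
}

# Example (placeholder values):
#   print_disk_attach_cmd 100 scsi1 /dev/disk/by-id/EXAMPLE-ARRAY
```

With this approach the host driver owns the controller and the VM only sees a virtual disk, so the Adaptec management software inside the VM would presumably no longer reach the controller.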

If it does not work with this Adaptec controller, I would sell it again.
-> Does anyone have a suggestion for a hardware RAID 10 controller (max. 200 dollars, used), ideally with a cache/SSD?

Thanks, and sorry for this long post.
 
The setup works well until I reboot the OMV VM. It seems the RAID controller is not recognized on the second boot of the VM. It works again if I reboot the whole Proxmox server, but then I always have to shut down all the other VMs.

- Any hints on how to get rid of this error?
Sounds like the common "reset bug" to me, where a PCIe device can only be initialized once.
 
from: https://wiki.archlinux.org/title/PC...ough_a_device_that_does_not_support_resetting
Since Libvirt and Qemu both expect all host PCI devices to be ready to reattach to the host before completely stopping the virtual machine, when encountering a device that will not reset, they will hang in a "Shutting down" state where they will not be able to be restarted until the host system has been rebooted. It is therefore recommended to only pass through PCI devices which the kernel is able to reset, as evidenced by the presence of a reset file in the PCI device sysfs node, such as /sys/bus/pci/devices/0000:00:1a.0/reset.
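
The check described in the quote can be scripted. This is a minimal sketch and not part of the wiki text: the function name `pci_reset_info` and its optional second argument (an alternate sysfs root, only useful for dry runs) are made up for illustration. It reports whether the kernel exposes a `reset` file for one device and, where the kernel provides it, prints the contents of `reset_method`.

```shell
#!/usr/bin/env bash
# Sketch: report whether the kernel exposes a reset file for one PCI device.
# pci_reset_info and the sysfs-root argument are illustrative names only.
pci_reset_info() {
    local bdf="$1"               # full address, e.g. 0000:02:00.0
    local sysfs="${2:-/sys}"     # override only for dry runs
    local dev="$sysfs/bus/pci/devices/$bdf"

    if [[ ! -d "$dev" ]]; then
        echo "$bdf: no such device"
        return 2
    fi
    if [[ ! -e "$dev/reset" ]]; then
        echo "$bdf: no reset file"
        return 1
    fi
    echo "$bdf: reset file present"
    # reset_method only exists on newer kernels; print it when available.
    if [[ -r "$dev/reset_method" ]]; then
        echo "$bdf: reset methods: $(cat "$dev/reset_method")"
    fi
    return 0
}

# Example, run on the Proxmox host:
#   pci_reset_info 0000:02:00.0
```

A present reset file only says that the kernel knows a way to reset the function. It does not guarantee that the device firmware comes back in a usable state afterwards.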

My IOMMU groups:
IOMMU Group 0:
	00:00.0 Host bridge [0600]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Host Bridge/DRAM Registers [8086:1918] (rev 07)
IOMMU Group 1:
	00:01.0 PCI bridge [0604]: Intel Corporation 6th-10th Gen Core Processor PCIe Controller (x16) [8086:1901] (rev 07)
	00:01.1 PCI bridge [0604]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x8) [8086:1905] (rev 07)
	02:00.0 RAID bus controller [0104]: Adaptec Series 6 - 6G SAS/PCIe 2 [9005:028b] (rev 01)
IOMMU Group 2:
	00:02.0 VGA compatible controller [0300]: Intel Corporation HD Graphics P530 [8086:191d] (rev 06)
IOMMU Group 3:
	00:08.0 System peripheral [0880]: Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th/8th Gen Core Processor Gaussian Mixture Model [8086:1911]
IOMMU Group 4:
	00:14.0 USB controller [0c03]: Intel Corporation 100 Series/C230 Series Chipset Family USB 3.0 xHCI Controller [8086:a12f] (rev 31)
	00:14.2 Signal processing controller [1180]: Intel Corporation 100 Series/C230 Series Chipset Family Thermal Subsystem [8086:a131] (rev 31)
IOMMU Group 5:
	00:16.0 Communication controller [0780]: Intel Corporation 100 Series/C230 Series Chipset Family MEI Controller #1 [8086:a13a] (rev 31)
IOMMU Group 6:
	00:17.0 SATA controller [0106]: Intel Corporation Q170/Q150/B150/H170/H110/Z170/CM236 Chipset SATA Controller [AHCI Mode] [8086:a102] (rev 31)
IOMMU Group 7:
	00:1b.0 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #17 [8086:a167] (rev f1)
IOMMU Group 8:
	00:1b.2 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #19 [8086:a169] (rev f1)
IOMMU Group 9:
	00:1c.0 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #1 [8086:a110] (rev f1)
IOMMU Group 10:
	00:1d.0 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #9 [8086:a118] (rev f1)
IOMMU Group 11:
	00:1f.0 ISA bridge [0601]: Intel Corporation C236 Chipset LPC/eSPI Controller [8086:a149] (rev 31)
	00:1f.2 Memory controller [0580]: Intel Corporation 100 Series/C230 Series Chipset Family Power Management Controller [8086:a121] (rev 31)
	00:1f.3 Audio device [0403]: Intel Corporation 100 Series/C230 Series Chipset Family HD Audio Controller [8086:a170] (rev 31)
	00:1f.4 SMBus [0c05]: Intel Corporation 100 Series/C230 Series Chipset Family SMBus [8086:a123] (rev 31)
IOMMU Group 12:
	00:1f.6 Ethernet controller [0200]: Intel Corporation Ethernet Connection (2) I219-V [8086:15b8] (rev 31)
IOMMU Group 13:
	04:00.0 Ethernet controller [0200]: Mellanox Technologies MT27500 Family [ConnectX-3] [15b3:1003]
IOMMU Group 14:
	05:00.0 USB controller [0c03]: ASMedia Technology Inc. ASM1142 USB 3.1 Host Controller [1b21:1242]
IOMMU Group 15:
	06:00.0 Non-Volatile memory controller [0108]: Sandisk Corp WD Blue SN550 NVMe SSD [15b7:5009] (rev 01)
The RAID controller sits at 02:00.0.
If I look for "reset" in its sysfs directory, I find:
root@pve:~# sudo cat /sys/bus/pci/devices/0000\:02\:00.0/reset
reset         reset_method

But when I run the script from the link above, the RAID controller shows up in the output, which I take to mean that it does not support reset. Yet the reset file exists in the file system, which indicates that it should:
root@pve:~# for iommu_group in $(find /sys/kernel/iommu_groups/ -maxdepth 1 -mindepth 1 -type d); do
    echo "IOMMU group $(basename "$iommu_group")"
    for device in $(\ls -1 "$iommu_group"/devices/); do
        if [[ -e "$iommu_group"/devices/"$device"/reset ]]; then echo -n "[RESET]"; fi
        echo -n $'\t'; lspci -nns "$device"
    done
done
IOMMU group 7
[RESET]	00:1b.0 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #17 [8086:a167] (rev f1)
IOMMU group 15
[RESET]	06:00.0 Non-Volatile memory controller [0108]: Sandisk Corp WD Blue SN550 NVMe SSD [15b7:5009] (rev 01)
IOMMU group 5
	00:16.0 Communication controller [0780]: Intel Corporation 100 Series/C230 Series Chipset Family MEI Controller #1 [8086:a13a] (rev 31)
IOMMU group 13
[RESET]	04:00.0 Ethernet controller [0200]: Mellanox Technologies MT27500 Family [ConnectX-3] [15b3:1003]
IOMMU group 3
[RESET]	00:08.0 System peripheral [0880]: Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th/8th Gen Core Processor Gaussian Mixture Model [8086:1911]
IOMMU group 11
	00:1f.0 ISA bridge [0601]: Intel Corporation C236 Chipset LPC/eSPI Controller [8086:a149] (rev 31)
	00:1f.2 Memory controller [0580]: Intel Corporation 100 Series/C230 Series Chipset Family Power Management Controller [8086:a121] (rev 31)
	00:1f.3 Audio device [0403]: Intel Corporation 100 Series/C230 Series Chipset Family HD Audio Controller [8086:a170] (rev 31)
	00:1f.4 SMBus [0c05]: Intel Corporation 100 Series/C230 Series Chipset Family SMBus [8086:a123] (rev 31)
IOMMU group 1
	00:01.0 PCI bridge [0604]: Intel Corporation 6th-10th Gen Core Processor PCIe Controller (x16) [8086:1901] (rev 07)
	00:01.1 PCI bridge [0604]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x8) [8086:1905] (rev 07)
[RESET]	02:00.0 RAID bus controller [0104]: Adaptec Series 6 - 6G SAS/PCIe 2 [9005:028b] (rev 01)
IOMMU group 8
[RESET]	00:1b.2 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #19 [8086:a169] (rev f1)
IOMMU group 6
	00:17.0 SATA controller [0106]: Intel Corporation Q170/Q150/B150/H170/H110/Z170/CM236 Chipset SATA Controller [AHCI Mode] [8086:a102] (rev 31)
IOMMU group 14
[RESET]	05:00.0 USB controller [0c03]: ASMedia Technology Inc. ASM1142 USB 3.1 Host Controller [1b21:1242]
IOMMU group 4
	00:14.0 USB controller [0c03]: Intel Corporation 100 Series/C230 Series Chipset Family USB 3.0 xHCI Controller [8086:a12f] (rev 31)
	00:14.2 Signal processing controller [1180]: Intel Corporation 100 Series/C230 Series Chipset Family Thermal Subsystem [8086:a131] (rev 31)
IOMMU group 12
[RESET]	00:1f.6 Ethernet controller [0200]: Intel Corporation Ethernet Connection (2) I219-V [8086:15b8] (rev 31)
IOMMU group 2
[RESET]	00:02.0 VGA compatible controller [0300]: Intel Corporation HD Graphics P530 [8086:191d] (rev 06)
IOMMU group 10
[RESET]	00:1d.0 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #9 [8086:a118] (rev f1)
IOMMU group 0
	00:00.0 Host bridge [0600]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Host Bridge/DRAM Registers [8086:1918] (rev 07)
IOMMU group 9
[RESET]	00:1c.0 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #1 [8086:a110] (rev f1)
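
Note that the script prints the [RESET] marker exactly when the reset file exists, so the marker next to 02:00.0 agrees with what sysfs shows. One experiment that follows from this is to ask the kernel for a reset by hand while the OMV VM is stopped, and then start the VM again. The sketch below assumes that writing 1 to the reset file is accepted for this device; the helper name `pci_try_reset` and the optional sysfs-root argument are made up, and whether this clears the aacraid timeout on this controller is untested.

```shell
#!/usr/bin/env bash
# Sketch: ask the kernel to reset one PCI function through sysfs.
# Run as root on the host, and only while the VM that uses the device is stopped.
pci_try_reset() {
    local bdf="$1"               # full address, e.g. 0000:02:00.0
    local sysfs="${2:-/sys}"     # override only for dry runs
    local reset_file="$sysfs/bus/pci/devices/$bdf/reset"

    if [[ ! -w "$reset_file" ]]; then
        echo "$bdf: no writable reset file" >&2
        return 1
    fi
    # Writing 1 asks the kernel to reset the function.
    if echo 1 > "$reset_file"; then
        echo "$bdf: reset requested"
    else
        echo "$bdf: reset failed" >&2
        return 1
    fi
}

# Example, with the VM stopped:
#   pci_try_reset 0000:02:00.0
```

If the write fails or the controller still times out in the guest afterwards, that would point towards the device not recovering from the reset the kernel performs, rather than towards OMV.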
 
