MegaRaid controller issues Kernel 5.13.19-4-pve

retrojp

After upgrading the kernel, I started having issues with my hardware RAID controller, an LSI MegaRAID SAS 2008 [Falcon] (rev 03). I hadn't had any problems with it in the several months I've been using it.

I normally pass this controller through to a VM, and I noticed this error message:


Code:
Error: Cannot bind 0000:01:00.0 to vfio
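In case it helps anyone, a quick way to see which driver has actually claimed the controller is lspci, using the 0000:01:00.0 address from the error above (output will obviously vary per system):

Code:
lspci -nnk -s 0000:01:00.0
# "Kernel driver in use:" should read vfio-pci when the passthrough binding works;
# if the host's megaraid_sas driver has grabbed it instead, the vfio bind fails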

I ran some checks: lsmod showed the module wasn't loaded, which I believe is the right thing, since I want the controller passed through. Stranger, though, was that one of my drives was missing from fdisk -l and blkid, yet I could still find it with lsblk and smartctl --scan (a rough sketch of those checks is below). The problem wasn't localized to any one drive; after each boot a different drive would be missing. There are no warning lights on the server, and the drives sound healthy. I was also having difficulty powering down: sometimes a black screen, and when I plugged in a keyboard it would remain stuck on the message about the keyboard being plugged in.
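For reference, the checks were roughly along these lines (from memory, so the exact flags may have differed):

Code:
lsblk -o NAME,SIZE,MODEL,SERIAL             # block devices the kernel knows about
blkid                                       # devices with readable filesystem signatures
smartctl --scan                             # devices visible on the SCSI/SATA side
fdisk -l 2>/dev/null | grep '^Disk /dev'    # devices with a readable partition table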

Anyway, I reverted to 5.13.19-3-pve and everything seems OK again.

At the time I never checked dmesg. I have looked through the syslog and nothing jumps out at me (I'm no log specialist); I can include the logs if anyone wants them. There might have been one thing, but that could just have been me hot-swapping a drive while checking the drives. I didn't get these messages 24 hours ago when it actually happened, so I'm guessing they're from the hot swap? Messages below.

These messages never appeared with the 5.13.19-3-pve kernel, but then again, they didn't appear 24 hours ago either.

Code:
Feb  8 23:08:46 jp kernel: [80782.942506] blk_update_request: I/O error, dev sdb, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Feb  8 23:08:46 jp kernel: [80782.942642] Buffer I/O error on dev sdb, logical block 0, async page read
Feb  8 23:08:46 jp kernel: [80782.942781] sd 0:0:14:0: [sdb] tag#22 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
Feb  8 23:08:46 jp kernel: [80782.942918] sd 0:0:14:0: [sdb] tag#22 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00
 
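If it's useful for comparison, kernel messages from the boot where a drive went missing can be pulled with something like this (assuming the systemd journal is kept across reboots, which it may not be by default):

Code:
journalctl -k -b -1 | grep -iE 'megaraid|blk_update_request|I/O error'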
Hi, can you ensure that you upgrade to pve-kernel-5.13.19-4-pve in version 5.13.19-9? That is the same kernel ABI as the broken one (which was version 5.13.19-8), but with a regression reverted that sounds like your issue here.
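Something like this should pull it in (just a sketch; exact repository setup and versions may differ on your side):

Code:
apt update
apt install pve-kernel-5.13.19-4-pve
# then reboot so the fixed build is actually the running kernel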
 
Thanks for getting back to me about this. Well, I can confirm I'm on the broken one :)

Code:
root@jp:~# dpkg --list | egrep -i "5.13.19-|Architecture Description"
||/ Name                                 Version                        Architecture Description
ii  pve-kernel-5.13.19-2-pve             5.13.19-4                      amd64        The Proxmox PVE Kernel Image
ii  pve-kernel-5.13.19-3-pve             5.13.19-7                      amd64        The Proxmox PVE Kernel Image
ri  pve-kernel-5.13.19-4-pve             5.13.19-8                      amd64        The Proxmox PVE Kernel Image
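For completeness, once the fixed build is installed the same check should list pve-kernel-5.13.19-4-pve at version 5.13.19-9, and after a reboot the running kernel can be double-checked (rough sketch):

Code:
dpkg -l pve-kernel-5.13.19-4-pve | tail -n1   # Version column should read 5.13.19-9
uname -r                                      # should report 5.13.19-4-pve once booted into it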