On a new Dell T160 I installed the latest version of Proxmox (updated yesterday).
Yesterday I also migrated the first VM (Windows Server 2019), and I noticed that during some disk-intensive operations it got to the point of hanging; in 2 cases the VM even crashed.
I found the cause in the host logs: resets of the controller/disks. Here are some examples:
kernel: sd 0:0:2:0: [sdb] tag#388 BRCM Debug mfi stat 0x2d, data len requested/completed 0x800/0x0
kernel: sd 0:0:3:0: [sdc] tag#327 BRCM Debug mfi stat 0x2d, data len requested/completed 0x30000/0x0
kernel: sd 0:0:3:0: Power-on or device reset occurred
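To make entries like these easier to compare across devices, here is a small sketch that pulls out the device, tag, mfi status and the requested/completed lengths, decoding the hex values. It is only an illustration: the regex (and the helper name `parse_mfi_line`) is my own and was written against the three lines quoted above, so other megaraid_sas messages may not match. The unit of the lengths is not stated in the log, so they are left as plain decoded numbers.

```python
import re

# Hypothetical pattern written against the "BRCM Debug mfi stat" lines quoted
# above; other megaraid_sas log formats may need a different regex.
MFI_RE = re.compile(
    r"\[(?P<dev>sd[a-z]+)\] tag#(?P<tag>\d+) BRCM Debug mfi stat (?P<stat>0x[0-9a-fA-F]+), "
    r"data len requested/completed (?P<req>0x[0-9a-fA-F]+)/(?P<done>0x[0-9a-fA-F]+)"
)


def parse_mfi_line(line):
    """Return the decoded fields of one matching log line, or None if it does not match."""
    m = MFI_RE.search(line)
    if m is None:
        return None
    return {
        "device": m.group("dev"),
        "tag": int(m.group("tag")),
        "mfi_stat": int(m.group("stat"), 16),
        # Hex lengths decoded to integers; the log does not state the unit.
        "requested_len": int(m.group("req"), 16),
        "completed_len": int(m.group("done"), 16),
    }


log = """\
kernel: sd 0:0:2:0: [sdb] tag#388 BRCM Debug mfi stat 0x2d, data len requested/completed 0x800/0x0
kernel: sd 0:0:3:0: [sdc] tag#327 BRCM Debug mfi stat 0x2d, data len requested/completed 0x30000/0x0
kernel: sd 0:0:3:0: Power-on or device reset occurred
"""

for line in log.splitlines():
    entry = parse_mfi_line(line)
    if entry is not None:
        print(entry)
```

Decoded this way, both failed requests show a completed length of 0, i.e. nothing was transferred before the reset.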
I also had issues with other MegaRAID controllers on older servers with kernel 6.8; I solved those by adding the parameters "intel_iommu=on iommu=pt" or by using kernel 6.5.
Here I first tried those parameters, and also disabling PCIe power management with "pcie_aspm=off" (based on another search), but that did not solve it.
I also tried installing and booting kernel 6.5 (more precisely 6.5.13-6-pve), but the issue persists and I haven't found anything else to try.
The controller firmware is already updated and the disks don't seem to have any issues. They are Samsung SSD 870 EVO 1TB drives, set as JBOD on the controller and used in software RAID1.
Has anyone had a similar problem and can tell me how to fix it or what to try?