mdadm crash or kernel dump

NickH

Aug 13, 2020
I have a 3 x 18TB RAID 5 array in Proxmox that is configured as an NFS share and used by several VMs as a data disk. Earlier today I got a kernel stack trace, but the two key VMs kept running. However, their data folders became unresponsive. After a long wait I managed to shut down one VM normally. The other one would not shut down; when I tried to stop it I lost control of it and had to do a hard reset. Everything has come back up OK and I am running a RAID check, but it will take about 1.5 days to complete.
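Rather than waiting blindly for the check to finish, you can read the progress line that md writes to /proc/mdstat. Below is a rough Python sketch. It assumes the usual `check = NN.N% (done/total) finish=...min speed=...K/sec` layout. The function names are made up for illustration. It also does the back-of-envelope arithmetic. During a RAID 5 check the members are read in parallel, so one 18 TB member read at an assumed ~140 MB/s takes about 18e12 / 140e6 ≈ 128,600 s. That is roughly 1.5 days, which is consistent with the estimate above.

```python
import os
import re
from dataclasses import dataclass
from typing import Optional

# Progress line shown in /proc/mdstat while an md array is checked/resynced.
# The counts in parentheses are per-member 1 KiB blocks.
PROGRESS_RE = re.compile(
    r"(?P<action>check|resync|recovery|reshape)\s*=\s*(?P<pct>[\d.]+)%\s*"
    r"\((?P<done>\d+)/(?P<total>\d+)\)\s*"
    r"finish=(?P<finish>[\d.]+)min\s*speed=(?P<speed>\d+)K/sec"
)


@dataclass
class Progress:
    action: str
    percent: float
    done_kib: int
    total_kib: int
    finish_min: float
    speed_kib_s: int

    def remaining_hours(self) -> float:
        # Blocks left divided by the current speed; infinite if speed is 0.
        if self.speed_kib_s == 0:
            return float("inf")
        return (self.total_kib - self.done_kib) / self.speed_kib_s / 3600


def parse_progress(line: str) -> Optional[Progress]:
    """Return the parsed progress line, or None if the line is not one."""
    m = PROGRESS_RE.search(line)
    if m is None:
        return None
    return Progress(
        action=m["action"],
        percent=float(m["pct"]),
        done_kib=int(m["done"]),
        total_kib=int(m["total"]),
        finish_min=float(m["finish"]),
        speed_kib_s=int(m["speed"]),
    )


def estimate_check_days(member_bytes: float, read_bytes_per_s: float) -> float:
    # Rough estimate: members are read in parallel, so the duration is about
    # one member's size / sustained per-disk speed (the speed is an assumption).
    return member_bytes / read_bytes_per_s / 86400


if __name__ == "__main__" and os.path.exists("/proc/mdstat"):
    with open("/proc/mdstat") as f:
        for line in f:
            p = parse_progress(line)
            if p:
                print(f"{p.action}: {p.percent:.1f}% done, ~{p.remaining_hours():.1f} h left")
```

The same state can also be read directly with `cat /proc/mdstat` or from `/sys/block/md127/md/sync_action`.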

I am running Proxmox 7.4 - pve-manager/7.4-17/513c62be (running kernel: 5.15.143-1-pve) on an HPE ML110 Gen 9 with 72GB RAM. The OS is on a separate HDD and the VMs are on an NVMe drive.

The stack dump I got is attached.

Also from the syslog:
INFO: task md127_raid5:534 blocked for more than 120 seconds.
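That syslog line comes from the kernel's hung-task detector. It reports a task that has stayed in uninterruptible (D) state for longer than `kernel.hung_task_timeout_secs`, which defaults to 120 seconds. The same thing may also have happened late last year. To check, you could scan older logs for every such report and see whether only `md127_raid5` was involved or other tasks too. Below is a minimal Python sketch that assumes the standard message wording. The script name `hung_tasks.py` is hypothetical.

```python
import re
import sys
from collections import Counter
from typing import Iterable, List, NamedTuple

# Kernel hung-task message, e.g.
#   INFO: task md127_raid5:534 blocked for more than 120 seconds.
# \S+ is greedy and backtracks, so task names that contain ':' (such as
# "kworker/u16:2") still split correctly at the last ":<pid>".
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


class HungTask(NamedTuple):
    line_no: int
    comm: str
    pid: int
    seconds: int


def find_hung_tasks(lines: Iterable[str]) -> List[HungTask]:
    """Collect every hung-task report, keeping its 1-based line number."""
    events: List[HungTask] = []
    for n, line in enumerate(lines, start=1):
        m = HUNG_RE.search(line)
        if m:
            events.append(HungTask(n, m["comm"], int(m["pid"]), int(m["secs"])))
    return events


def summarize(events: Iterable[HungTask]) -> Counter:
    # Number of reports per task name.
    return Counter(e.comm for e in events)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 hung_tasks.py /var/log/syslog
    with open(sys.argv[1], errors="replace") as f:
        found = find_hung_tasks(f)
    for e in found:
        print(f"line {e.line_no}: {e.comm} (pid {e.pid}) blocked > {e.seconds}s")
    for comm, count in summarize(found).most_common():
        print(f"{count:4d}  {comm}")
```

Rotated plain-text logs such as `syslog.1` can be passed the same way. Gzipped ones would need to be decompressed first.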

Can anyone give any insight into the issue or help me fix it? I think I saw the same thing sometime late last year.