Hi.
We recently installed Proxmox 8.3.2 on a new Dell server and imported some VMs from one of our ESXi nodes, which is going to be scrapped.
Everything went smoothly during the import and for the first few hours of operation, but we have noticed that two of the eight VMs become stuck after 20-24 hours of operation. It looks like they are losing access to their drives. We restarted both of them, and one got stuck again a few hours later.
Below is the log from one of our VMs:
Code:
Jan 20 17:36:04 app-log kernel: scsi target2:0:6: No MSG IN phase after reselection
Jan 20 17:40:24 app-log kernel: sd 2:0:0:0: [sda] tag#335 ABORT operation started
Jan 20 17:40:24 app-log kernel: sd 2:0:0:0: ABORT operation timed-out.
Jan 20 17:40:24 app-log kernel: sd 2:0:0:0: [sda] tag#334 ABORT operation started
Jan 20 17:40:24 app-log kernel: sd 2:0:0:0: ABORT operation timed-out.
Jan 20 17:40:24 app-log kernel: sd 2:0:0:0: [sda] tag#333 ABORT operation started
Jan 20 17:40:24 app-log kernel: sd 2:0:0:0: ABORT operation timed-out.
Jan 20 17:40:24 app-log kernel: sd 2:0:0:0: [sda] tag#332 ABORT operation started
Jan 20 17:40:24 app-log kernel: sd 2:0:0:0: ABORT operation timed-out.
(...)
Jan 20 17:40:24 app-log kernel: sd 2:0:0:0: [sda] tag#335 DEVICE RESET operation started
Jan 20 17:40:24 app-log kernel: sd 2:0:0:0: DEVICE RESET operation timed-out.
Jan 20 17:40:24 app-log kernel: sd 2:0:1:0: [sdb] tag#339 DEVICE RESET operation started
Jan 20 17:40:24 app-log kernel: sd 2:0:1:0: DEVICE RESET operation timed-out.
(...)
Jan 20 17:40:24 app-log kernel: sd 2:0:6:0: BUS RESET operation complete.
Jan 20 17:40:24 app-log kernel: sd 2:0:6:0: Power-on or device reset occurred
Jan 20 17:40:24 app-log kernel: sd 2:0:0:0: Power-on or device reset occurred
This time the OS was able to handle the issue and the VM kept working, but the same errors recurred over the next 4 hours, which eventually resulted in a complete system freeze.
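To get a rough picture of how often each device is affected, the error-handler events in the log above can be tallied per SCSI address. This is only a minimal sketch in Python, not something taken from our setup: the function name `tally_scsi_events`, the regular expression, and the embedded sample lines (copied from the log above) are assumptions made for illustration.

```python
import re
from collections import Counter

# Matches SCSI error-handler lines such as:
#   "... kernel: sd 2:0:0:0: [sda] tag#335 ABORT operation started"
#   "... kernel: sd 2:0:0:0: ABORT operation timed-out."
# The pattern is an assumption based on the log excerpt in this post.
EVENT_RE = re.compile(
    r"kernel: sd (?P<addr>\d+:\d+:\d+:\d+): "
    r"(?:\[\w+\] tag#\d+ )?"
    r"(?P<op>ABORT|DEVICE RESET|BUS RESET) operation "
    r"(?P<state>started|timed-out|complete)"
)


def tally_scsi_events(lines):
    """Count (address, operation, state) triples found in kernel log lines."""
    counts = Counter()
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            counts[(match["addr"], match["op"], match["state"])] += 1
    return counts


# A few lines copied from the log above, used as sample input.
SAMPLE = """\
Jan 20 17:40:24 app-log kernel: sd 2:0:0:0: [sda] tag#335 ABORT operation started
Jan 20 17:40:24 app-log kernel: sd 2:0:0:0: ABORT operation timed-out.
Jan 20 17:40:24 app-log kernel: sd 2:0:0:0: [sda] tag#335 DEVICE RESET operation started
Jan 20 17:40:24 app-log kernel: sd 2:0:6:0: BUS RESET operation complete.
""".splitlines()

if __name__ == "__main__":
    for (addr, op, state), n in sorted(tally_scsi_events(SAMPLE).items()):
        print(f"{addr}  {op:<12} {state:<10} {n}")
```

In practice the input would be the guest's kernel log read line by line instead of `SAMPLE`. Lines that do not match the pattern, such as "Power-on or device reset occurred", are simply skipped.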
Virtual SCSI controller: LSI 53C895A
The QEMU guest agent is installed in the VM and enabled in Proxmox.
No swap on the host.
The host logs do not indicate any problems.
The underlying storage is a Dell PERC hardware RAID controller with SSDs in RAID 5.