Hi, we are experiencing regular cuts and guest system hangs during backups over an NFS share. Sometimes the root disk even ends up completely corrupted (apparently MBR corruption, since we are able to see the partitions after recovery attempts).
Oddly, the backup itself completes and turns out to be OK when we try to recover the corrupted system.
Unfortunately, even when the backup completes and the machine is not corrupted, the system hangs to the point that we need to restart it to recover.
When it happens, it seems that disk access gets stuck, and processes turn into zombies if we try to kill them (when we are able to reach them through an SSH terminal that was already open).
This happens on a regular basis but not every time. Sometimes it works and the guest survives the backup.
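In case it helps with debugging, here is a minimal sketch of how the stuck processes could be listed from that open SSH session inside the guest. It is not something we have validated on the affected machines. It assumes a Linux guest with /proc mounted, and it assumes the hung processes show up in state D (uninterruptible sleep, usually waiting on I/O) or Z (zombie). The function names `parse_stat_line` and `stuck_processes` are made up for this example.

```python
import os


def parse_stat_line(text):
    """Return (pid, command name, state) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split on the first "(" and the last ")".
    start = text.index("(")
    end = text.rindex(")")
    pid = int(text[:start])
    comm = text[start + 1:end]
    state = text[end + 1:].split()[0]
    return pid, comm, state


def stuck_processes(proc_root="/proc", states=("D", "Z")):
    """List (pid, command name, state) for processes in the given states."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat_line(fh.read())
        except (OSError, ValueError):
            continue  # process exited or the stat file was unreadable
        if state in states:
            found.append((pid, comm, state))
    return found


if __name__ == "__main__":
    for pid, comm, state in stuck_processes():
        print(pid, state, comm)
```

Running it once before the backup starts and again while the guest is hanging should show which processes are blocked, assuming the session still responds at that point.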
Some info:
We are on 5.0-23.
We have 4 guests running Debian 8.
The disk subsystem is RAID 5 on SSDs (Samsung) and the storage is LVM-thin.
The guests themselves are also configured with their root filesystem on LVM.
The network device is e1000 in all the guests.
We have 40 cores and 64 GB RAM for 4 machines (Zimbra mail system), so we have plenty of power to spare.
The guest disk controller is a mix of SATA (for root) and SCSI (VirtIO SCSI).
Any clue or advice on how to debug this problem?
Thanks