Unresponsive VM during backups

Jan 24, 2017
Dear all,

We have a recurring issue where our VMs become either very slow or unresponsive during backups. We have a cluster of 6 hypervisors (3 with SAS disks, 2 with SSDs, 1 with SATA disks; each server has 3 identical disks in RAID5 on an LSI MegaRAID controller) backing up to a shared NFS mount (1 Gbps link). All hypervisors are affected, and have been since we created the cluster several months ago.

The backup is configured as follows:
- Compression: LZO
- Mode: Snapshot

I agree a small freeze is required to snapshot, but this is not what we see: the slowness/unresponsiveness lasts for as long as the backup runs.

We were able to get some logs from a VM during the issue. The following message is displayed (the CPU ID and duration vary): NMI watchdog: BUG: soft lockup - CPU#14 stuck for 22s

Any idea how to improve it?

Thanks!
 
Hi,
> I agree a small freeze is required to snapshot,
I assume you are talking about KVM/Qemu VMs. There is no snapshot on the file layer; the snapshot happens at the Qemu level.

Your problem is that your NFS is too slow and/or your VM writes too much.
To solve this problem, limit the backup bandwidth.

See man vzdump
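
As a rough illustration of the bandwidth limit, here is a minimal Python sketch that only builds a vzdump command line with a cap and does not run it. It assumes that --bwlimit is given in KB per second, as the man page describes it. The VM ID 100, the storage name "backup-nfs" and the helper name vzdump_command are hypothetical placeholders, not values from this thread.

```python
import shlex


def vzdump_command(vmid, limit_mb_per_s, storage="backup-nfs"):
    """Build (but do not run) a vzdump command line with a bandwidth cap.

    vmid and storage are hypothetical placeholders for this sketch.
    """
    # Assumption: --bwlimit is expressed in KB per second, so convert from MB/s.
    bwlimit_kb = int(limit_mb_per_s * 1024)
    return [
        "vzdump", str(vmid),
        "--mode", "snapshot",
        "--compress", "lzo",
        "--storage", storage,
        "--bwlimit", str(bwlimit_kb),
    ]


cmd = vzdump_command(100, 10)
print(shlex.join(cmd))
# prints: vzdump 100 --mode snapshot --compress lzo --storage backup-nfs --bwlimit 10240
```

The same limit can also be set as a default in /etc/vzdump.conf with a bwlimit entry, so scheduled backups pick it up as well.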
 
Dear Wolfgang,

Thanks for your reply. I am indeed talking about KVM/Qemu VMs.

I limited the bandwidth to 10 MB/s on the SSD hypervisor, but as soon as the backup starts the load still peaks on the VM (though not on the hypervisor).

Best regards.