Backups break virtual machines

dev-null

Member
Nov 10, 2015
Dear all,

We have a problem with virtual machines breaking down during backups, and we have been trying to figure it out for a while.

First, here are the versions:

proxmox-ve: 4.1-28 (running kernel: 4.2.6-1-pve)
pve-manager: 4.1-2 (running version: 4.1-2/78c5f4a2)
pve-kernel-4.2.6-1-pve: 4.2.6-28
pve-kernel-4.2.2-1-pve: 4.2.2-16
pve-kernel-4.2.3-2-pve: 4.2.3-22
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 0.17.2-1
pve-cluster: 4.0-29
qemu-server: 4.0-42
pve-firmware: 1.1-7
libpve-common-perl: 4.0-42
libpve-access-control: 4.0-10
libpve-storage-perl: 4.0-38
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.4-18
pve-container: 1.0-35
pve-firewall: 2.0-14
pve-ha-manager: 1.0-16
ksm-control-daemon: 1.2-1
glusterfs-client: 3.7.6-1
lxc-pve: 1.1.5-5
lxcfs: 0.13-pve2
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: 0.6.5-pve6~jessie

We know these are not the latest versions, but I checked all the changelogs and nothing in them points to this problem. We will upgrade to the latest patches soon.

The environment is as follows

4 Proxmox servers with the versions above
2 servers for NFS in a mirrored setup for the VMs
1 server with NFS for backups, running opendedup for deduplication

- the storage traffic runs within a dedicated storage network
- the backup machine with the opendedup has more than enough memory and is running fine
- the NFS servers for the virtual machines are running fine and show no signs of problems

The breakdowns occur sporadically, and we cannot predict which machines will be affected. The only correlation we can see (not confirmed, only a guess so far) is a higher load on the guest, e.g. through a Puppet run or something similar.
The guests break down in the middle of the backup, not at the beginning or the end.
We already had this problem when using ZFS with deduplication as the backup target. With opendedup it got much better, but it is still not gone.

We can also see that when this problem happens, the guests react very slowly and sometimes (I could not confirm that it happens every time) have a very high load. We are having a hard time figuring out where the problem is, especially because the backup is done from a snapshot, which should not have any impact on the running guest.

Does anybody have an idea where to look?

We would be very happy if we could solve this problem, because with opendedup in particular we are seeing very good deduplication ratios. To give you an idea, see the following figures:

----

Volume Capacity : 35 TB
Volume Current Logical Size : 12.33 TB
Volume Max Percentage Full : 95.0%
Volume Duplicate Data Written : 11.89 TB
Unique Blocks Stored: 2.93 TB
Unique Blocks Stored after Compression : 1.63 TB
Cluster Block Copies : 2
Volume Virtual Dedup Rate (Unique Blocks Stored/Current Size) : -7.36%
Volume Actual Storage Savings (Compressed Unique Blocks Stored/Current Size) : 86.76%
Compression Rate: 44.18%

---
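
As a rough cross-check of the figures above, the savings can be recomputed from the reported sizes. This is only a minimal sketch: the formulas are assumptions inferred from the labels in the output, not taken from opendedup itself, and the results differ slightly from the reported percentages because the displayed sizes are rounded. The virtual dedup rate is not recomputed here, since its formula is not clear from the output.

```python
# Sizes in TB, copied from the volume statistics above.
logical_size = 12.33        # Volume Current Logical Size
unique_stored = 2.93        # Unique Blocks Stored
unique_compressed = 1.63    # Unique Blocks Stored after Compression


def savings_percent(stored, reference):
    """Percentage of `reference` saved when only `stored` is kept.

    Assumed formula: (1 - stored / reference) * 100.
    """
    return (1.0 - stored / reference) * 100.0


# Assumption: "actual savings" compares compressed unique blocks to logical size.
actual_savings = savings_percent(unique_compressed, logical_size)
# Assumption: "compression rate" compares compressed to uncompressed unique blocks.
compression_rate = savings_percent(unique_compressed, unique_stored)

print(f"Actual storage savings: {actual_savings:.2f}%")
print(f"Compression rate: {compression_rate:.2f}%")
```

Both values land close to the 86.76% and 44.18% reported above.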

So, we are stumped. Any hints in the right direction?
Any additional data that might help?

Regards
Soeren
 
Hi,

I read this post and the answers. However, the problem here is not the poor performance but the fact that virtual machines are crashing.

The storage network and the backup share the same network connection, but I have 4 x 1 Gbit on each server (compute, storage and backup), and the utilization is below 50% even when looking at each physical link separately.
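
For reference, a per-link utilization figure like this can be derived from two readings of an interface byte counter. The sketch below assumes the counters come from something like `/sys/class/net/<iface>/statistics/tx_bytes`; the function name and the sample numbers are hypothetical. Note that an average over a long interval can hide short bursts in which a link is saturated.

```python
def link_utilization_percent(bytes_start, bytes_end, seconds,
                             link_bits_per_sec=1_000_000_000):
    """Average utilization of one link over an interval, in percent.

    bytes_start / bytes_end are two readings of the same byte counter,
    taken `seconds` apart. The default link speed is 1 Gbit/s.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    bits_transferred = (bytes_end - bytes_start) * 8
    return bits_transferred / (seconds * link_bits_per_sec) * 100.0


# Hypothetical sample: 3 GB sent in 60 s over a single 1 Gbit link.
sample = link_utilization_percent(0, 3_000_000_000, 60)
print(f"{sample:.1f}% average utilization")
```

Sampling with a short interval (a few seconds) during the backup window gives a better picture than a daily average.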

I also understand that backups might take a long time, but how can the backup have such a bad impact on the running virtual machine? That is the real problem. I looked at the timing again, and it could be that the problem occurs while the snapshot is being taken, but this should not have such an impact.

Regards
Soeren
 
So when you do backups, it goes like this?

VM-Storage-Server(s) <-> NFS <-> Proxmox-Node <-> Vzdump <-> NFS <-> Opendedup <-> Backup-Server ?

You say you use 4x1G. What config are you using? If bonded, which type?
Have you checked what the storage subsystem of your VM storage servers is doing IO-wise?
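
If the links are bonded with the Linux bonding driver, the mode and the state of each slave can be read from `/proc/net/bonding/<bond>`. The following is a small sketch that parses that text. The bond name `bond0`, the function name, and the sample text are assumptions for illustration, not taken from the poster's systems. Depending on the mode, a single NFS connection may be limited to one 1 Gbit link, which is why the bond type matters here.

```python
# Illustrative sample in the format the bonding driver prints (not real data).
SAMPLE = """\
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
MII Status: up

Slave Interface: eth0
MII Status: up
Speed: 1000 Mbps

Slave Interface: eth1
MII Status: up
Speed: 1000 Mbps
"""


def parse_bonding_status(text):
    """Return (bonding_mode, slaves) from /proc/net/bonding/<bond> text.

    Each slave is a dict with "name" plus any "MII Status" / "Speed"
    lines that follow its "Slave Interface" line.
    """
    mode = None
    slaves = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or line without a key/value pair
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            mode = value
        elif key == "Slave Interface":
            slaves.append({"name": value})
        elif slaves and key in ("MII Status", "Speed"):
            slaves[-1][key] = value
    return mode, slaves


mode, slaves = parse_bonding_status(SAMPLE)
print("mode:", mode)
for slave in slaves:
    print(slave["name"], slave.get("MII Status"), slave.get("Speed"))

# On a real host (assuming the bond is called bond0):
#   with open("/proc/net/bonding/bond0") as f:
#       mode, slaves = parse_bonding_status(f.read())
```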
 
