Dear all,
we have a problem with virtual machines breaking down during backup, and we have been trying to figure it out for a while.
First, here are the versions:
proxmox-ve: 4.1-28 (running kernel: 4.2.6-1-pve)
pve-manager: 4.1-2 (running version: 4.1-2/78c5f4a2)
pve-kernel-4.2.6-1-pve: 4.2.6-28
pve-kernel-4.2.2-1-pve: 4.2.2-16
pve-kernel-4.2.3-2-pve: 4.2.3-22
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 0.17.2-1
pve-cluster: 4.0-29
qemu-server: 4.0-42
pve-firmware: 1.1-7
libpve-common-perl: 4.0-42
libpve-access-control: 4.0-10
libpve-storage-perl: 4.0-38
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.4-18
pve-container: 1.0-35
pve-firewall: 2.0-14
pve-ha-manager: 1.0-16
ksm-control-daemon: 1.2-1
glusterfs-client: 3.7.6-1
lxc-pve: 1.1.5-5
lxcfs: 0.13-pve2
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: 0.6.5-pve6~jessie
We know these are not the latest versions, but I checked all the changelogs and nothing in them points to this problem; we will upgrade to the latest patches soon.
The environment is as follows (storage definitions are sketched below):
- 4 Proxmox servers with the versions above
- 2 servers for NFS in a mirrored setup for the VMs
- 1 server with NFS for backup, running opendedup for deduplication
- the storage traffic runs within a dedicated storage network
- the backup machine running opendedup has more than enough memory and is running fine
- the NFS servers for the virtual machines are running fine and show no signs of problems
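To make the layout concrete, the storage entries in /etc/pve/storage.cfg look roughly like the sketch below (server addresses, export paths, and storage names are illustrative placeholders, not our actual configuration):

----
nfs: vm-store
    server 10.0.0.10
    export /export/vms
    path /mnt/pve/vm-store
    content images
    options vers=3

nfs: backup-dedup
    server 10.0.0.20
    export /opendedup/backup
    path /mnt/pve/backup-dedup
    content backup
    maxfiles 2
----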
The breakdowns occur sporadically and are not tied to specific machines; the only correlation we can see (not confirmed, only a guess so far) is higher load on the guest, e.g. through a Puppet run or something similar.
The guests are breaking down in the middle of the backup, not at the end or the beginning.
We already had this problem when using ZFS with deduplication as the backup target; with opendedup it got much better, but it is still not gone.
We can also see that when this happens, the guests respond very slowly and sometimes (sometimes, because I was not able to confirm it happens every time) show very high load. We are having a hard time pinpointing the cause, especially because the backup is taken from a snapshot, which should not have any impact on the running guest.
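For context, the jobs run in vzdump snapshot mode against the opendedup store, roughly like this (the VM ID, storage name, and limits are placeholders, not our exact job definition):

----
vzdump 101 --mode snapshot --storage backup-dedup --compress lzo --bwlimit 51200 --ionice 7
----

We have been wondering whether throttling via bwlimit/ionice would even help here, or whether the stalls come from somewhere else entirely.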
Does anybody have an idea where to look?
We would be very happy to solve this problem, because with opendedup in particular we are seeing very good deduplication ratios; to give you an idea, here are the figures:
----
Volume Capacity : 35 TB
Volume Current Logical Size : 12.33 TB
Volume Max Percentage Full : 95.0%
Volume Duplicate Data Written : 11.89 TB
Unique Blocks Stored: 2.93 TB
Unique Blocks Stored after Compression : 1.63 TB
Cluster Block Copies : 2
Volume Virtual Dedup Rate (Unique Blocks Stored/Current Size) : -7.36%
Volume Actual Storage Savings (Compressed Unique Blocks Stored/Current Size) : 86.76%
Compression Rate: 44.18%
----
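To read those numbers: the savings figure is simply compressed unique data over the logical volume size; a quick back-of-the-envelope check (rounded, so it matches the reported values only approximately):

----
1 - 1.63 TB / 12.33 TB ≈ 0.868  ->  ~86.8% actual storage savings
1 - 1.63 TB / 2.93 TB  ≈ 0.444  ->  ~44.4% compression on unique blocks
----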
So, we are stumped. Any hints in the right direction?
Is there any additional data we can provide that might help?
Regards
Soeren