KVM machines very slow /unreachable during vmtar backup

I assume your snapshot is full (once 'Allocated to snapshot' reaches 100%, the snapshot becomes invalid). Check with 'lvdisplay'.
 
See this example, showing that the snapshot is only 0.02% used ('Allocated to snapshot 0.02%'):

Code:
lvdisplay /dev/pve/vzsnap*
  
--- Logical volume ---
  LV Name                /dev/pve/vzsnap-ns227086-0
  VG Name                pve
  LV UUID                aDHUMn-mkWJ-P0ct-a2es-OxUF-tH9D-za708x
  LV Write Access        read/write
  LV snapshot status     active destination for /dev/pve/data
  LV Status              available
  # open                 1
  LV Size                500.00 GiB
  Current LE             128000
  COW-table size         1.00 GiB
  COW-table LE           256
  Allocated to snapshot  0.02%
  Snapshot chunk size    4.00 KiB
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:2
 
Yes, the snapshot will be removed after the backup process.
 
Changed the I/O scheduler to noop and then deadline (tried both); exact same issue. Performance on all VMs is at a near-standstill right now, with many processes soft-locking. Yikes.
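(For anyone following along, I'm switching schedulers via sysfs - sda here is just an example device name, substitute your actual disks:)

Code:
# show the available schedulers; the bracketed entry is the active one
cat /sys/block/sda/queue/scheduler
# switch to deadline (takes effect immediately, but does not persist across reboots)
echo deadline > /sys/block/sda/queue/scheduler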

The virtual server has been unreachable via SSH for the last few minutes, though it still responds to ping.

Interestingly, this does not happen when we gzip the backups (because, I assume, the job bottlenecks on CPU before it can hammer I/O).
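(By "gzip backups" I mean running vzdump with compression enabled, along these lines - dump directory taken from the logs below:)

Code:
# back up VM 107 in snapshot mode with gzip compression
vzdump --snapshot --compress --dumpdir /offsite 107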

Any ideas?

Hardware:
i7-870, 16GB DDR3
Areca 1280 RAID w/ 8x 1TB SAS (raw LVM volumes for VMs)
Backup target: 3TB SATA drive, approx. 90MB/s write speed (measured with the dd test below)
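For reference, the ~90MB/s figure comes from a simple sequential write test against the backup mount (the path is just where our dumps land; conv=fdatasync makes dd flush to disk before reporting a rate):

Code:
# rough sequential write benchmark of the backup drive
dd if=/dev/zero of=/offsite/ddtest.bin bs=1M count=4096 conv=fdatasync
rm /offsite/ddtest.bin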
 
More info - the issue started right around midnight (00:00) tonight, in the middle of the backup of VM 107. Here are the backup logs for VMs 107 and 108:

vmserver:/var/log/vzdump# cat qemu-107.log
Nov 29 23:50:20 INFO: Starting Backup of VM 107 (qemu)
Nov 29 23:50:20 INFO: running
Nov 29 23:50:20 INFO: status = running
Nov 29 23:50:21 INFO: backup mode: snapshot
Nov 29 23:50:21 INFO: bandwidth limit: 65536 KB/s
Nov 29 23:50:21 INFO: ionice priority: 7
Nov 29 23:50:21 INFO: Logical volume "vzsnap-vmserver-0" created
Nov 29 23:50:21 INFO: creating archive '/offsite/vzdump-qemu-107-2011_11_29-23_50_20.tar'
Nov 29 23:50:21 INFO: adding '/offsite/vzdumptmp147138/qemu-server.conf' to archive ('qemu-server.conf')
Nov 29 23:50:21 INFO: adding '/dev/array/vzsnap-vmserver-0' to archive ('vm-disk-virtio0.raw')
Nov 30 00:20:26 INFO: Total bytes written: 86851864576 (45.89 MiB/s)
Nov 30 00:20:26 INFO: archive file size: 80.89GB
Nov 30 00:20:26 INFO: delete old backup '/offsite/vzdump-qemu-107-2011_11_28-23_46_34.tar'
Nov 30 00:20:45 INFO: Logical volume "vzsnap-vmserver-0" successfully removed
Nov 30 00:20:45 INFO: Finished Backup of VM 107 (00:30:25)
vmserver:/var/log/vzdump# cat qemu-108.log
Nov 30 00:20:45 INFO: Starting Backup of VM 108 (qemu)
Nov 30 00:20:45 INFO: running
Nov 30 00:20:45 INFO: status = running
Nov 30 00:20:46 INFO: backup mode: snapshot
Nov 30 00:20:46 INFO: bandwidth limit: 65536 KB/s
Nov 30 00:20:46 INFO: ionice priority: 7
Nov 30 00:20:46 INFO: Logical volume "vzsnap-vmserver-0" created
Nov 30 00:20:46 INFO: creating archive '/offsite/vzdump-qemu-108-2011_11_30-00_20_45.tar'
 
These are all KVM VMs. You can see that I have the vzdump 'bandwidth limit' set to 65536 KB/s in vzdump.conf (example below), but I'm not sure what it actually does, as I/O rates on both the source and backup drives go _far_ above that during backup.
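For reference, this is the line I have in /etc/vzdump.conf (value in KB/s, per 'man vzdump'):

Code:
# /etc/vzdump.conf - global bandwidth limit for backups, in KB/s
bwlimit: 65536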
 
Your limit seems quite high (too high); try 25000.
 
pve-manager: 1.9-26 (pve-manager/1.9/6567)
running kernel: 2.6.32-6-pve
proxmox-ve-2.6.32: 1.9-53
pve-kernel-2.6.24-9-pve: 2.6.24-18
pve-kernel-2.6.24-8-pve: 2.6.24-16
pve-kernel-2.6.32-6-pve: 2.6.32-53
qemu-server: 1.1-32
pve-firmware: 1.0-14
libpve-storage-perl: 1.0-19
vncterm: 0.9-2
vzctl: 3.0.29-3pve1
vzdump: 1.2-16
vzprocps: 2.0.11-2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.15.0-2
ksm-control-daemon: 1.0-6
 
What does the 'limit' represent? I assumed it was KB/s. I have set it to 25000 and will report back after tonight's backup attempt.
 
Yes, KB/s; for details see 'man vzdump'.
 
I see that. However, I'm not sure it's actually affecting anything for these KVM/LVM snapshot images; I had it set to 65536 and it didn't seem to limit I/O on any device... Regardless, I have changed it to 25000 and will report back after tonight's backup.
 
Your backup speed is 45.89 MiB/s (from the backup logs)
Your limit is 64MiB/s (65536KB/s) - so why would that trigger?
 
Interesting. Watching with iostat -xk 1 (example below), I see I/O on the devices go much higher than that throughout the process... At any rate, I'll give a bwlimit of 25000 a shot and post back here.
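For the record, this is how I'm watching the per-device rates (rkB/s and wkB/s are the columns of interest):

Code:
# extended per-device statistics in KB, refreshed every second
iostat -xk 1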
 
OK, last night (w/ the 25000 bwlimit) was much better - machines were generally reachable during the entire backup. However, system load was still extremely high at points during the backup (with correspondingly high load on the guest VMs - the backup starts at 23:00):

[Attached image: load.PNG - host load graph showing spikes during the backup window]

VZdump logs: https://gist.github.com/1417896

*EDIT* After looking at the logs and the load graph above, it seems the start of each
Code:
adding '/dev/array/vzsnap-vmserver-0' to archive ('vm-disk-virtio0.raw')
step is what causes the high load.

I'm wondering why the backup affects system performance so much. Is LVM working hard to maintain the snapshot? There's not much I/O going on in the guests during the backup... Would a faster backup target drive improve things?
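If it is the snapshot, my understanding of LVM copy-on-write is that the first guest write to any chunk of the origin LV makes LVM copy the old chunk into the COW table before the write completes, so guest writes during the backup turn into extra read/write pairs on the same array. One way to watch the COW table fill during a backup (VG name 'array' taken from the logs above):

Code:
# show how full the snapshot's COW table is, refreshed every 5 seconds
watch -n 5 "lvs -o lv_name,origin,snap_percent array"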
 

How fast is your disk subsystem? Please run:

# pveperf

(when there is no load on the host)
 
pveperf on PVE root (on SATA SSD):
Code:
CPU BOGOMIPS:      44756.82
REGEX/SECOND:      772976
HD SIZE:           7.14 GB (/dev/mapper/pve-root)
BUFFERED READS:    141.48 MB/sec
AVERAGE SEEK TIME: 0.25 ms
FSYNCS/SECOND:     149.38
DNS EXT:           53.63 ms
DNS INT:           42.97 ms (praece.com)

pveperf of backup storage location (on SATA HDD):
Code:
CPU BOGOMIPS:      44756.82
REGEX/SECOND:      788643
HD SIZE:           2750.67 GB (/dev/sdd1)
BUFFERED READS:    87.43 MB/sec
AVERAGE SEEK TIME: 14.88 ms
FSYNCS/SECOND:     21.73
DNS EXT:           51.28 ms
DNS INT:           52.18 ms (praece.com)
 
