Serious problem with QEMU/KVM backups in snapshot mode

abkrim

Scenario

Host machine running only QEMU/KVM VPS (3 VPS)
8 GB RAM / 2 SATA disks / 1 Xeon
Backup mode: snapshot
Disks (tried both raw and qcow2) on the IDE or SCSI bus

Code:
vzdump --compress --dumpdir=/backup --mailto fenzen@gmail.com 516
INFO: starting new backup job: vzdump --compress --dumpdir=/backup --mailto fenzen@gmail.com 516
INFO: Starting Backup of VM 516 (qemu)
INFO: running
INFO: status = running
INFO: backup mode: snapshot
INFO: bandwidth limit: 10240 KB/s
INFO:   Logical volume "vzsnap-vz003.islaserver.com-0" created
INFO: creating archive '/backup/vzdump-qemu-516-2010_05_02-21_16_10.tgz'
INFO: adding '/backup/vzdump-qemu-516-2010_05_02-21_16_10.tmp/qemu-server.conf' to archive ('qemu-server.conf')
INFO: adding '/mnt/vzsnap0/images/516/vm-516-disk-1.qcow2' to archive ('vm-disk-ide0.qcow2')
At exactly this moment the VPS stops working (Apache, SMTP, ...) and no longer answers ping. Dead. Stopped.

Load on the host: 1.49 to 2

Frustrated.
 
With OpenVZ it works perfectly.

pveperf
CPU BOGOMIPS: 22537.14
REGEX/SECOND: 970755
HD SIZE: 9.92 GB (/dev/sda1)
BUFFERED READS: 74.77 MB/sec
AVERAGE SEEK TIME: 7.43 ms
FSYNCS/SECOND: 982.78
DNS EXT: 32.39 ms
DNS INT: 1.37 ms (ovh.net)
 
Looks quite normal. But other users already reported such problems:

http://forum.proxmox.com/threads/434-I-O-scheduler

Maybe you should try to set the IO Scheduler to 'deadline'. Edit /boot/grub/menu.lst and set

Code:
# kopt=root=/dev/XYZ ro elevator=deadline

Then run

# update-grub

and reboot. Does that help?
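
One way to confirm which scheduler is really in effect, before and after the change, is to read it back from sysfs. A minimal sketch, assuming the usual format of /sys/block/<dev>/queue/scheduler, where the active scheduler is the one in square brackets. The function names are made up for illustration.

```shell
# Print the active I/O scheduler from a line in the format used by
# /sys/block/<dev>/queue/scheduler, e.g. "noop anticipatory deadline [cfq]".
# Prints nothing if the line has no bracketed entry.
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Show the active scheduler for every block device that exposes one.
show_schedulers() {
    for f in /sys/block/*/queue/scheduler; do
        [ -r "$f" ] || continue
        dev=${f#/sys/block/}
        dev=${dev%%/*}
        printf '%s: %s\n' "$dev" "$(active_scheduler "$(cat "$f")")"
    done
}
```

Calling show_schedulers prints one line per device, so it is easy to see whether every disk picked up the new setting after the reboot.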
 
Well.

To change the scheduler I don't modify GRUB; there is no need. I just change it on the running system.

Of course, we tried the default scheduler and also tried with
echo deadline > /sys/block/sda/queue/scheduler

http://wiki.openvz.org/I/O_priorities_for_containers

Desperate. We would like to move from OpenVZ to KVM, but I/O performance is poor and there are a lot of problems with I/O overload.
 
After several tests:

1.- With the cfq scheduler, the KVM VPS doesn't work. Very heavy load on the server.
2.- With the deadline scheduler, the backup grows, grows, grows.

Example:
VPS 999
43810819 7,2G -rw-r--r-- 1 root root 7,1G may 10 20:22 vm-999-disk-1.qcow2
43810820 5,1G -rw-r--r-- 1 root root 5,1G may 9 22:26 vm-999-disk-2.qcow2

Backup after 10 hours
34 -rw-r--r-- 1 root root 32G may 10 20:21 vzdump-qemu-999-2010_05_10-15_48_35.dat

Frustrated. Desperate.
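
The numbers above can be checked mechanically: the two images add up to roughly 12 GB, while the unfinished archive is already 32 GB. A rough sketch of that comparison follows. It rests on an assumption, not on documented vzdump behaviour: that an archive of a VM should not end up larger than the image files it contains. The function names are hypothetical.

```shell
# Sum the apparent sizes (in bytes) of the files given as arguments.
total_bytes() {
    total=0
    for f in "$@"; do
        size=$(wc -c < "$f") || return 1
        total=$((total + size))
    done
    echo "$total"
}

# Succeed (exit status 0) if ARCHIVE is larger than all IMAGES together.
# Usage: archive_exceeds_sources ARCHIVE IMAGE...
archive_exceeds_sources() {
    archive=$1
    shift
    [ "$(total_bytes "$archive")" -gt "$(total_bytes "$@")" ]
}
```

Called with the .dat file as the first argument and the two qcow2 images after it, the function succeeds as soon as the archive has outgrown its sources, which could be used to stop a runaway backup early.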
 
Nothing in the log... because I interrupted the backup; I don't like watching the backup size keep growing.



may 11 06:02:56 INFO: Starting Backup of VM 516 (qemu)
may 11 06:02:56 INFO: running
may 11 06:02:56 INFO: status = running
may 11 06:02:56 INFO: backup mode: snapshot
may 11 06:02:56 INFO: bandwidth limit: 10240 KB/s
may 11 06:02:56 INFO: Logical volume "vzsnap-vz003.islaserver.com-0" created
may 11 06:02:57 INFO: creating archive '/backup/vzdump-qemu-516-2010_05_11-06_02_56.tar'
may 11 06:02:57 INFO: adding '/backup/vzdump-qemu-516-2010_05_11-06_02_56.tmp/qemu-server.conf' to archive ('qemu-server.conf')
may 11 06:02:57 INFO: adding '/mnt/vzsnap0/images/516/vm-516-disk-1.raw' to archive ('vm-disk-ide0.raw')
may 11 06:03:53 INFO: Logical volume "vzsnap-vz003.islaserver.com-0" successfully removed
may 11 06:03:53 ERROR: Backup of VM 516 failed - interrupted by signal
 
pveversion -v
pve-manager: 1.5-8 (pve-manager/1.5/4674)
running kernel: 2.6.24-10-pve
proxmox-ve-2.6.24: 1.5-21
pve-kernel-2.6.24-10-pve: 2.6.24-21
qemu-server: 1.1-11
pve-firmware: 1.0-3
libpve-storage-perl: 1.0-10
vncterm: 0.9-2
vzctl: 3.0.23-1pve8
vzdump: 1.2-5
vzprocps: 2.0.11-1dso2
vzquota: 3.0.11-1dso1
pve-qemu-kvm: 0.11.1-2
 
I am in a panic. The machine is in production.

There are no backups for the sites.

First I would like to create clones of the VPS on another machine.

After that I will try changing the kernel.
 
We've got the same problem with snapshots of large KVM images, or of VMs with more than one virtual disk (raw):

"endlessly growing" backup.dat files

node1:~# pveversion -v
pve-manager: 1.5-5 (pve-manager/1.5/4627)
running kernel: 2.6.24-8-pve
proxmox-ve-2.6.18: 1.5-4
pve-kernel-2.6.24-7-pve: 2.6.24-11
pve-kernel-2.6.24-8-pve: 2.6.24-16
pve-kernel-2.6.18-1-pve: 2.6.18-4
qemu-server: 1.1-11
pve-firmware: 1.0-3
libpve-storage-perl: 1.0-8
vncterm: 0.9-2
vzctl: 3.0.23-1pve8
vzdump: 1.2-5
vzprocps: 2.0.11-1dso2
vzquota: 3.0.11-1
pve-qemu-kvm-2.6.18: 0.9.1-5

node1:~# pveperf
CPU BOGOMIPS: 18468.54
REGEX/SECOND: 455959
HD SIZE: 94.49 GB (/dev/pve/root)
BUFFERED READS: 96.25 MB/sec
AVERAGE SEEK TIME: 10.66 ms
FSYNCS/SECOND: 1085.60
DNS EXT: 44.68 ms
DNS INT: 13.88 ms (landesstelle.kjh.de)

System: HP DL185 G5, hardware RAID (Smart Array P400 with BBWC), 7.2k SATA disks in RAID 1
 
Uhmm...

Easy.

1.- Ask for the keys to the machine.
2.- The sysadmin gives you the key.
3.- Log in to the system.
4.- Run the backup command for the VMID XX that has problems.

Any other way... I don't know.
 
We had the problem with Proxmox 1.4 and now with 1.5. The VMs are Windows 2008 R2 on different hardware (now an HP DL185 G5 with 8 GB RAM).

Reproducing the bug seems to be difficult (still no idea how). I'm "happy" to read that the same problem exists on other hardware.

Build a VM with more than 100 GB of used space and, for example, two virtual disks. Maybe you will get the problem too. Does anybody else in the forum have such problems?

Is our I/O performance too low???
 
You are running a quite old 2.6.24 kernel and you are using the KVM module for the 2.6.18 kernel. This leads to a lot of issues.

So make sure you have the right packages. For an upgrade howto see: http://pve.proxmox.com/wiki/Downloads
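
This kind of mismatch can be spotted directly in the pveversion -v output. A minimal sketch, assuming the KVM package carries the kernel series in its name (pve-qemu-kvm-<series>), as in the output quoted in this thread. A plain pve-qemu-kvm line carries no series and is reported as unknown. The function name is made up for illustration.

```shell
# Read `pveversion -v` output on stdin and report whether the KVM package
# was built for a different kernel series than the one that is running.
# Assumption: the package line looks like "pve-qemu-kvm-<series>: <version>".
check_kvm_kernel() {
    awk '
        /^running kernel:/ { split($3, v, "-"); running = v[1] }
        /^pve-qemu-kvm-/ {
            built = $1
            sub(/^pve-qemu-kvm-/, "", built)   # strip the package prefix
            sub(/:$/, "", built)               # strip the trailing colon
        }
        END {
            if (running == "" || built == "") {
                print "unknown"
            } else if (running == built) {
                print "match " running
            } else {
                print "mismatch running=" running " kvm=" built
            }
        }'
}
```

Run as pveversion -v | check_kvm_kernel. For the node1 output posted above it prints "mismatch running=2.6.24 kvm=2.6.18".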
 
...

Build a VM with more than 100 GB of used space and, for example, two virtual disks. Maybe you will get the problem too. Does anybody else in the forum have such problems?

Is our I/O performance too low???
Hi,
I back up one VM with two raw disks (32 + 145 GB) without trouble... but on a fast RAID:
Code:
cat vzdump-qemu-125-2010_05_29-07_12_51.log
May 29 07:12:51 INFO: Starting Backup of VM 125 (qemu)
May 29 07:12:51 INFO: running
May 29 07:12:51 INFO: status = running
May 29 07:12:52 INFO: backup mode: snapshot
May 29 07:12:52 INFO: bandwidth limit: 10240 KB/s
May 29 07:12:52 INFO:   Logical volume "vzsnap-proxmox2-0" created
May 29 07:12:52 INFO: creating archive '/bckup/vzdump-qemu-125-2010_05_29-07_12_51.tgz'
May 29 07:12:52 INFO: adding '/bckup/vzdump-qemu-125-2010_05_29-07_12_51.tmp/qemu-server.conf' to archive ('qemu-server.conf')
May 29 07:12:52 INFO: adding '/mnt/vzsnap0/images/125/vm-125-disk-1.raw' to archive ('vm-disk-virtio0.raw')
May 29 07:35:18 INFO: adding '/mnt/vzsnap0/images/125/vm-125-disk-2.raw' to archive ('vm-disk-virtio1.raw')
May 29 09:02:01 INFO: Total bytes written: 157522399744 (22.94 MiB/s)
May 29 09:02:01 INFO: archive file size: 56.69GB
May 29 09:02:01 INFO: delete old backup '/bckup/vzdump-qemu-125-2010_05_22-07_14_13.tgz'
May 29 09:02:03 INFO:   Logical volume "vzsnap-proxmox2-0" successfully removed
May 29 09:02:03 INFO: Finished Backup of VM 125 (01:49:12)
What is the output of
Code:
vgdisplay pve | grep Free
especially during the backup process? Only an idea...
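
The reason free space matters: the LVM snapshot is carved out of the volume group's free extents when it is created, and it holds the original copy of every block that changes while the backup runs. If it fills up, it becomes invalid. A small sketch for sampling this while vzdump runs, assuming the volume group is named pve as in the command above. The function names and the interval are arbitrary.

```shell
# Keep only the "Free  PE / Size" line from vgdisplay output read on stdin.
free_line() {
    grep 'Free'
}

# Print a timestamped sample of free space and logical volumes every
# INTERVAL seconds until interrupted. Needs root and the LVM tools.
watch_vg() {
    vg=${1:-pve}          # assumed volume group name
    interval=${2:-30}     # arbitrary sampling interval in seconds
    while true; do
        date '+%H:%M:%S'
        vgdisplay "$vg" | free_line
        lvs "$vg"         # lists the vzsnap snapshot while it exists
        sleep "$interval"
    done
}
```

Start watch_vg pve 30 in one shell with its output redirected to a file, run the backup in another, and stop the watcher with Ctrl-C once the backup ends or stalls.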

Udo