Backup vzdump hanging at vmtar

bms-ck

New Member
Nov 4, 2010
Hello,

We have some Windows servers on a Proxmox cluster and do the backups with vzdump. Sometimes the backup hangs during vmtar.

vzdump log file:
Nov 04 00:32:47 INFO: Starting Backup of VM 102 (qemu)
Nov 04 00:32:47 INFO: running
Nov 04 00:32:47 INFO: status = running
Nov 04 00:32:48 INFO: backup mode: snapshot
Nov 04 00:32:48 INFO: ionice priority: 7
Nov 04 00:32:48 INFO: Logical volume "vzsnap-ebiovns0008-0" created
Nov 04 00:32:48 INFO: creating archive '/mnt/pve/backup_05/vzdump-qemu-102-2010_11_04-00_32_46.tgz'

On the backup device there is a .dat file with the expected size of the backup, but the .tgz file has not been created.

I checked the syslog and other log files and found the following messages in the kernel log:
Nov 4 03:33:25 ebiovns0008 kernel: Buffer I/O error on device dm-10, logical block 39509926
Nov 4 03:33:25 ebiovns0008 kernel: Buffer I/O error on device dm-10, logical block 39509927
Nov 4 03:33:25 ebiovns0008 kernel: Buffer I/O error on device dm-10, logical block 39509928
Nov 4 03:33:25 ebiovns0008 kernel: Buffer I/O error on device dm-10, logical block 39509929
Nov 4 03:33:25 ebiovns0008 kernel: Buffer I/O error on device dm-10, logical block 39509930
Nov 4 03:33:25 ebiovns0008 kernel: Buffer I/O error on device dm-10, logical block 39509931
Nov 4 03:33:25 ebiovns0008 kernel: Buffer I/O error on device dm-10, logical block 39509932
Nov 4 03:33:25 ebiovns0008 kernel: Buffer I/O error on device dm-10, logical block 39509933
Nov 4 03:33:25 ebiovns0008 kernel: Buffer I/O error on device dm-10, logical block 39509934
Nov 4 03:33:30 ebiovns0008 kernel: __ratelimit: 4463687 callbacks suppressed
Nov 4 03:33:30 ebiovns0008 kernel: Buffer I/O error on device dm-10, logical block 39509925
Nov 4 03:33:30 ebiovns0008 kernel: Buffer I/O error on device dm-10, logical block 39509925
Nov 4 03:33:30 ebiovns0008 kernel: Buffer I/O error on device dm-10, logical block 39509925
Nov 4 03:33:30 ebiovns0008 kernel: Buffer I/O error on device dm-10, logical block 39509925
.......
Nov 4 03:33:45 ebiovns0008 kernel: Buffer I/O error on device dm-10, logical block 39509925
Nov 4 03:33:45 ebiovns0008 kernel: Buffer I/O error on device dm-10, logical block 39509925
Nov 4 03:33:45 ebiovns0008 kernel: Buffer I/O error on device dm-10, logical block 39509925
Nov 4 03:33:45 ebiovns0008 kernel: Buffer I/O error on device dm-10, logical block 39509925
Nov 4 06:25:01 ebiovns0008 kernel: Kernel logging (proc) stopped.
Nov 4 06:25:02 ebiovns0008 kernel: imklog 3.18.6, log source = /proc/kmsg started.
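(Side note, not part of the original logs: dm-10 can be mapped back to a logical volume name to check whether it is the vzsnap snapshot; the lvdisplay output further down in this thread shows the snapshot as block device 254:10, i.e. dm-10.)
Code:
dmsetup ls        # lists each device-mapper name with its (major, minor) numbers
ls -l /dev/dm-10  # the major/minor shown here should match one of the entries above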


Output of pveversion -v:
pve-manager: 1.6-2 (pve-manager/1.6/5087)
running kernel: 2.6.32-4-pve
proxmox-ve-2.6.32: 1.6-19
pve-kernel-2.6.32-4-pve: 2.6.32-19
pve-kernel-2.6.24-8-pve: 2.6.24-16
qemu-server: 1.1-18
pve-firmware: 1.0-8
libpve-storage-perl: 1.0-14
vncterm: 0.9-2
vzctl: 3.0.24-1pve4
vzdump: 1.2-7
vzprocps: 2.0.11-1dso2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.12.5-1
ksm-control-daemon: 1.0-4

ps -ef | grep back:
root 25397 25395 0 00:15 ? 00:00:00 /usr/bin/perl -w /usr/sbin/vzdump --quiet --node 1 --snapshot --compress --storage backup_05 --mailto eBioscience.ViennaIT@ebioscience.com 101 102 104 105 110
root 25910 25397 0 00:32 ? 00:00:00 sh -c /usr/lib/qemu-server/vmtar '/mnt/pve/backup_05/vzdump-qemu-102-2010_11_04-00_32_46.tmp/qemu-server.conf' 'qemu-server.conf' '/dev/drbdvg1/vzsnap-ebiovns0008-0' 'vm-disk-virtio0.raw' |gzip >/mnt/pve/backup_05/vzdump-qemu-102-2010_11_04-00_32_46.dat
root 25911 25910 68 00:32 ? 06:13:17 /usr/lib/qemu-server/vmtar /mnt/pve/backup_05/vzdump-qemu-102-2010_11_04-00_32_46.tmp/qemu-server.conf qemu-server.conf /dev/drbdvg1/vzsnap-ebiovns0008-0 vm-disk-virtio0.raw


vmtar is still running:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
25911 root 20 0 15468 11m 408 R 100 0.0 378:10.63 vmtar

There is enough space on the backup device (> 450 GB available). This problem does not occur every day.

In the past I killed these processes, but I would like to find the root cause and fix it.

Any ideas - thanks.
 
Is the LVM snapshot space running full?

Do you have any settings in /etc/vzdump.conf?

When the hang occurs, what do you get here:
Code:
lvdisplay /dev/pve/vzsnap-*

Take a look at the 'Allocated to snapshot' value (e.g. 0.42%) - when this reaches 100% the backup never gets finished.
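For example (a sketch, not from the original post; adjust the VG name, in this setup it is drbdvg1):
Code:
# watch the snapshot fill level while the backup runs; once 'Allocated to snapshot'
# hits 100% the snapshot becomes invalid and vmtar will never finish
watch -n 60 "lvdisplay /dev/drbdvg1/vzsnap-* | grep 'Allocated to snapshot'"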
 
Hi Tom,

I already cancelled the process so that the backup on the production system could continue, and the snapshot has been removed. But I ran lvdisplay beforehand and got the following output:


--- Logical volume ---
LV Name /dev/drbdvg1/vzsnap-ebiovns0008-0
VG Name drbdvg1
LV UUID cEfzqI-OhW2-agmm-ppPO-UWd9-dUgw-hDqRu7
LV Write Access read/write
LV snapshot status INACTIVE destination for /dev/drbdvg1/vm-102-disk-1
LV Status available
# open 1
LV Size 160.00 GB
Current LE 40960
COW-table size 1.00 GB
COW-table LE 256
Snapshot chunk size 4.00 KB
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 254:10
 
It's only 1 GB; increase it to 2 GB (see below).

As you did not answer my question regarding vzdump.conf, I assume you do not have any settings there.

Just create the file and enter:
Code:
nano /etc/vzdump.conf

Code:
size: 2048
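As a quick sanity check (a sketch, assuming 'size' is the LVM snapshot size in MB):
Code:
cat /etc/vzdump.conf                                      # should now contain: size: 2048
lvdisplay /dev/drbdvg1/vzsnap-* | grep 'COW-table size'   # during the next snapshot backup this should show 2.00 GB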
 
Hi Tom,

The file zvdump.conf was empty. I updated it. Let's see if the error occurs again.

Thank you
Christian
 
Yes, this file does not exist by default - you need to create it. And make sure you name it vzdump.conf (not zvdump.conf).
 
hi tom,
what is the 'size: 2048' in vzdump.conf good for? Actually I have the same problem. I
- set up the backup of my fileserver (about 700 GB of data) to start at 00:00 this morning
- the fileserver is a .raw disk on the main HDD
- the backup destination is a separate HDD / NFS4 volume mounted at /var/lib/bak (with 1 TiB of free space)
- at about 8 o'clock the hard disk activity stopped
- for the last 4 hours there has been no significant activity, although 'vmtar' uses 100% CPU

result:
- two VMs were backed up correctly (as shown in /var/log/vzdump/qemu10?.log)
- the log of the fileserver ends with 'Mar 20 00:10:18 INFO: creating archive /var/lib/bak/vzdump...tgz'
- the syslog shows NO errors or abnormal activity (everything seems to be fine)
- for the other VMs there is a .tgz file in /var/lib/bak
- for the fileserver there is a .dat file in /var/lib/bak (created 4 hours ago) whose size is not changing (463 GB)
- vmtar is still running

Is vmtar still doing its job, compressing the .dat into a .tgz? Will it ever finish? There is no significant HDD activity...
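One way to check whether vmtar is actually making progress (a sketch, not from this thread; it assumes a single vmtar process and that strace is installed):
Code:
ls -l /var/lib/bak/vzdump-*.dat                  # is the .dat file still growing between runs of this command?
strace -p "$(pgrep vmtar)" -e trace=read,write   # is vmtar doing real I/O, or spinning without any syscalls?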

[Finally, an excerpt of lvdisplay /dev/pve/vzsnap-xx-0:
LV Write Access read/write
LV snapshot status INACTIVE destination for /dev/pve/data
LV Status available
# open 1
LV Size 827.02 GB
Current LE 211716
COW-table size 1.00 GB
COW-table LE 256
Snapshot chunk size 4.00 KB
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 251:3
]

I have no vzdump.conf.
 
I had the same problem. I used size: 8192 for my Windows VMs because it defines the LVM snapshot size (in MB). So if the backup takes a long time, you need to raise the snapshot size.
 
- what is the LVM snapshot size good for?
- which property of the original VM should it be compared with?
- what does the LVM snapshot size depend on? (my fileserver has a 600 GB HDD, 4 GB RAM, ...) -> approximately how high should that value be?
- how high can I set the LVM snapshot size? What is that value limited by?

Edit: if I understand it correctly, the LVM snapshot just records the writes that happen during the backup, right? That means for my fileserver (backups normally run at night) a worst-case size of 20480 (20 GB) should be OK (single user). Could the LVM snapshot size be too high?
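A rough sizing sketch (my own estimate, not from the documentation):
Code:
# needed snapshot space ~ data written to the guest disk while the backup runs,
# e.g. an average of 0.5 MB/s of writes over an 8 h backup window:
#   0.5 MB/s * 8 * 3600 s = 14400 MB  ->  size: 16384 leaves some headroom
# and verify against reality while a backup is running:
lvs | grep vzsnap   # the Snap% column shows how full the snapshot actually gets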
 