backup fails

Erwin123

Hi,

Every night I back up using vzdump to a USB disk attached to the node.
Sometimes the backup fails.
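For reference, the nightly job is started from a cron entry along these lines (the exact schedule is from memory):

# root's crontab (the 02:00 start time is approximate)
0 2 * * * /usr/sbin/vzdump --quiet --node 11 --snapshot --dumpdir /mnt/usbdrive --mailto support@myemail.com 107 112 114 116 117 119 134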
These are the last lines of my log this morning:

117: Feb 23 06:18:53 INFO: Logical volume "vzsnap" successfully removed
117: Feb 23 06:18:53 INFO: Finished Backup of VM 117 (00:18:50)

119: Feb 23 06:18:53 INFO: Starting Backup of VM 119 (openvz)
119: Feb 23 06:18:53 INFO: status = CTID 119 exist mounted running
119: Feb 23 06:18:53 INFO: creating lvm snapshot of /dev/mapper/pve-data ('/dev/pve/vzsnap')
119: Feb 23 06:18:53 INFO: Logical volume "vzsnap" created
119: Feb 23 06:18:53 INFO: mounting lvm snapshot
119: Feb 23 06:18:53 INFO: creating archive '/mnt/usbdrive/vzdump-119.dat' (/mnt/vzsnap/private/119)
119: Feb 23 07:21:44 INFO: Total bytes written: 5825638400 (5.5GiB, 1.5MiB/s)
119: Feb 23 07:21:44 INFO: file size 5.43GB
119: Feb 23 07:21:53 INFO: umount: /mnt/vzsnap: device is busy
119: Feb 23 07:21:53 INFO: umount: /mnt/vzsnap: device is busy
119: Feb 23 07:21:53 INFO: command 'umount /mnt/vzsnap' failed with exit code 1
119: Feb 23 07:21:56 INFO: Can't remove open logical volume "vzsnap"
119: Feb 23 07:21:56 INFO: command '/sbin/lvremove -f /dev/pve/vzsnap' failed with exit code 5
119: Feb 23 07:21:56 INFO: Finished Backup of VM 119 (01:03:03)

134: Feb 23 07:21:57 INFO: Starting Backup of VM 134 (openvz)
134: Feb 23 07:21:57 INFO: status = CTID 134 exist mounted running
134: Feb 23 07:21:58 INFO: trying to remove stale snapshot '/dev/pve/vzsnap'
134: Feb 23 07:21:58 INFO: umount: /mnt/vzsnap: device is busy
134: Feb 23 07:21:58 INFO: umount: /mnt/vzsnap: device is busy
134: Feb 23 07:21:58 INFO: command 'umount /mnt/vzsnap' failed with exit code 1
134: Feb 23 07:21:58 INFO: Can't remove open logical volume "vzsnap"
134: Feb 23 07:21:58 INFO: command '/sbin/lvremove -f /dev/pve/vzsnap' failed with exit code 5
134: Feb 23 07:21:58 INFO: creating lvm snapshot of /dev/mapper/pve-data ('/dev/pve/vzsnap')
134: Feb 23 07:21:59 INFO: Logical volume "vzsnap" already exists in volume group "pve"
134: Feb 23 07:21:59 ERROR: Backup of VM 134 failed - command '/sbin/lvcreate --size 3072M --snapshot --name vzsnap /dev/pve/data' failed with exit code 5

When I look at the USB disk now, all seems well.
This is the current situation:

node5:/# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/pve/root 64G 1.4G 60G 3% /
tmpfs 5.9G 0 5.9G 0% /lib/init/rw
udev 10M 76K 10M 1% /dev
tmpfs 5.9G 0 5.9G 0% /dev/shm
/dev/mapper/pve-data 178G 154G 24G 87% /var/lib/vz
/dev/sda1 504M 33M 446M 7% /boot
/dev/sdc1 587G 136G 422G 25% /mnt/usbdrive
/dev/mapper/pve-vzsnap
178G 154G 24G 87% /mnt/vzsnap

Does anyone have any idea what happened, and how to prevent this?

Thanks in advance!
 
Hi Dietmar,

I did this yesterday, but today the backup of this container is extremely slow.
It's been running since 6:30 this morning and usually only takes an hour.
By now it's only at a quarter, and I guess it will take hours to complete, during which the whole node is slow (load 10).

I see this:
root 26295 0.0 0.0 8836 1124 ? S 06:24 0:00 sh -c (cd /mnt/vzsnap/private/134; find . '(' -type 's' -prune ')' -o -print0|tar cpf - --totals --sparse --numeric-owner --no-recursion --ignore-failed-read --null -T -|cstream -t 10485760 >/mnt/usbdrive/vzdump-134.dat)
root 26296 0.0 0.0 8836 608 ? S 06:24 0:00 sh -c (cd /mnt/vzsnap/private/134; find . '(' -type 's' -prune ')' -o -print0|tar cpf - --totals --sparse --numeric-owner --no-recursion --ignore-failed-read --null -T -|cstream -t 10485760 >/mnt/usbdrive/vzdump-134.dat)
root 27540 0.0 0.0 35556 7008 ? Ss 02:00 0:01 /usr/bin/perl -w /usr/sbin/vzdump --quiet --node 11 --snapshot --dumpdir /mnt/usbdrive --mailto support@myemail.com 107 112 114 116 117 119 134

Is it running twice?
Can I just kill the processes?
 
I killed the vzdump and tar processes and tried to umount /mnt/vzsnap:

node5:/mnt/usbdrive# umount /mnt/vzsnap
umount: /mnt/vzsnap: device is busy
umount: /mnt/vzsnap: device is busy

Is there a way to see what it is busy with?
I'm sure that if I wait an hour or so I can umount it (I could yesterday), but what is holding it?
 
Hi Dietmar,

This morning, the same problem.
The backup did finish this time, although it took over three hours to back up a 30GB container (normally an hour).
It failed to umount again after the last backup.
I just ran lsof:

node5:/mnt/usbdrive# lsof /mnt/vzsnap
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
updatedb. 30172 root cwd DIR 254,3 4096 9710481 /mnt/vzsnap/private/116/var/www/vhosts/domainname.nl/httpdocs/site/administrator/components/com_joomlapack/views/nag/tmpl
updatedb. 30172 root 8r DIR 254,3 4096 2 /mnt/vzsnap
updatedb. 30172 root 9r DIR 254,3 4096 16385 /mnt/vzsnap/private
updatedb. 30172 root 10r DIR 254,3 4096 7864334 /mnt/vzsnap/private/116
updatedb. 30172 root 11r DIR 254,3 4096 7864354 /mnt/vzsnap/private/116/var
updatedb. 30172 root 12r DIR 254,3 4096 8667689 /mnt/vzsnap/private/116/var/www
updatedb. 30172 root 13r DIR 254,3 4096 8702398 /mnt/vzsnap/private/116/var/www/vhosts
updatedb. 30172 root 14r DIR 254,3 4096 8708147 /mnt/vzsnap/private/116/var/www/vhosts/domainname.nl
updatedb. 30172 root 15r DIR 254,3 4096 9675613 /mnt/vzsnap/private/116/var/www/vhosts/domainname.nl/httpdocs
updatedb. 30172 root 16r DIR 254,3 4096 9709987 /mnt/vzsnap/private/116/var/www/vhosts/domainname.nl/httpdocs/site
updatedb. 30172 root 17r DIR 254,3 4096 9709991 /mnt/vzsnap/private/116/var/www/vhosts/domainname.nl/httpdocs/site/administrator
updatedb. 30172 root 18r DIR 254,3 4096 9709995 /mnt/vzsnap/private/116/var/www/vhosts/domainname.nl/httpdocs/site/administrator/components
updatedb. 30172 root 19r DIR 254,3 4096 9709989 /mnt/vzsnap/private/116/var/www/vhosts/domainname.nl/httpdocs/site/administrator/components/com_joomlapack
updatedb. 30172 root 20r DIR 254,3 4096 9710473 /mnt/vzsnap/private/116/var/www/vhosts/domainname.nl/httpdocs/site/administrator/components/com_joomlapack/views
updatedb. 30172 root 21r DIR 254,3 4096 9710480 /mnt/vzsnap/private/116/var/www/vhosts/domainname.nl/httpdocs/site/administrator/components/com_joomlapack/views/nag
updatedb. 30172 root 22r DIR 254,3 4096 9710481 /mnt/vzsnap/private/116/var/www/vhosts/domainname.nl/httpdocs/site/administrator/components/com_joomlapack/views/nag/tmpl

I'm sure this makes sense to someone :)

Edit:
After typing this message I ran lsof again and got:

node5:/mnt/usbdrive# lsof /mnt/vzsnap
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
updatedb. 30172 root cwd DIR 254,3 4096 313555 /mnt/vzsnap/private/134/home/httpd/vhosts/.nl/httpdocs/WEB/templates/ja_purity/styles/header
updatedb. 30172 root 8r DIR 254,3 4096 2 /mnt/vzsnap
updatedb. 30172 root 9r DIR 254,3 4096 16385 /mnt/vzsnap/private
updatedb. 30172 root 10r DIR 254,3 4096 204806 /mnt/vzsnap/private/134
updatedb. 30172 root 11r DIR 254,3 4096 204820 /mnt/vzsnap/private/134/home
updatedb. 30172 root 12r DIR 254,3 4096 205879 /mnt/vzsnap/private/134/home/httpd
updatedb. 30172 root 13r DIR 254,3 12288 205881 /mnt/vzsnap/private/134/home/httpd/vhosts
updatedb. 30172 root 14r DIR 254,3 4096 205906 /mnt/vzsnap/private/134/home/httpd/vhosts/.nl
updatedb. 30172 root 15r DIR 254,3 4096 303185 /mnt/vzsnap/private/134/home/httpd/vhosts/.nl/httpdocs
updatedb. 30172 root 16r DIR 254,3 4096 303199 /mnt/vzsnap/private/134/home/httpd/vhosts/.nl/httpdocs/WEB
updatedb. 30172 root 17r DIR 254,3 4096 305788 /mnt/vzsnap/private/134/home/httpd/vhosts/.nl/httpdocs/WEB/templates
updatedb. 30172 root 18r DIR 254,3 4096 313261 /mnt/vzsnap/private/134/home/httpd/vhosts/.nl/httpdocs/WEB/templates/ja_purity
updatedb. 30172 root 19r DIR 254,3 4096 313433 /mnt/vzsnap/private/134/home/httpd/vhosts/.nl/httpdocs/WEB/templates/ja_purity/styles

Another container.
I guess disabling updatedb is a good start?
 
Try to edit /etc/updatedb.conf, add '/mnt' to PRUNEPATHS:

PRUNEPATHS="/tmp /var/spool /media /mnt"
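If you want to do it from the shell, something like this should work (untested sketch; it just appends /mnt inside the existing PRUNEPATHS line):

# keep a backup of the config, then add /mnt to PRUNEPATHS
cp /etc/updatedb.conf /etc/updatedb.conf.bak
sed -i 's|^PRUNEPATHS="\([^"]*\)"|PRUNEPATHS="\1 /mnt"|' /etc/updatedb.conf
grep ^PRUNEPATHS /etc/updatedb.conf   # should now include /mnt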
 
Thanks,

Apparently the updatedb runs are done now.
I ran a backup of a small container to clear the mounted snapshot etc.:
Feb 25 10:06:02 INFO: Starting Backup of VM 104 (openvz)
Feb 25 10:06:02 INFO: status = CTID 104 exist mounted running
Feb 25 10:06:07 INFO: trying to remove stale snapshot '/dev/pve/vzsnap'
Feb 25 10:06:11 INFO: Logical volume "vzsnap" successfully removed
Feb 25 10:06:11 INFO: creating lvm snapshot of /dev/mapper/pve-data ('/dev/pve/vzsnap')
Feb 25 10:06:11 INFO: /dev/dm-4: stat failed: No such file or directory
Feb 25 10:06:11 INFO: Path /dev/dm-4 no longer valid for device(254,4)
Feb 25 10:06:11 INFO: /dev/disk/by-id/dm-name-pve-data-real: stat failed: No such file or directory
Feb 25 10:06:11 INFO: Path /dev/disk/by-id/dm-name-pve-data-real no longer valid for device(254,4)
Feb 25 10:06:11 INFO: /dev/disk/by-id/dm-uuid-LVM-2OvKh9lZQGCvEnQuYWwE7bALuJ2R2MVHYvlYsUK9SIWhnqbOeIdTl8lFDY9oawq7-real: stat failed: No such file or directory
Feb 25 10:06:11 INFO: Path /dev/disk/by-id/dm-uuid-LVM-2OvKh9lZQGCvEnQuYWwE7bALuJ2R2MVHYvlYsUK9SIWhnqbOeIdTl8lFDY9oawq7-real no longer valid for device(254,4)
Feb 25 10:06:13 INFO: Logical volume "vzsnap" created
Feb 25 10:06:13 INFO: mounting lvm snapshot
Feb 25 10:06:16 INFO: creating archive '/mnt/usbdrive/vzdump-104.dat' (/mnt/vzsnap/private/104)
Feb 25 10:09:49 INFO: Total bytes written: 338780160 (324MiB, 3.0MiB/s)
Feb 25 10:09:49 INFO: file size 323MB
Feb 25 10:10:01 INFO: Logical volume "vzsnap" successfully removed
Feb 25 10:10:01 INFO: Finished Backup of VM 104 (00:03:59)

The load, however, is still high, I guess because of the IO delays (between 10 and 50%).
In syslog I see:
Feb 25 10:21:05 node5 pvetunnel[5142]: trying to finish tunnel 12576
Feb 25 10:21:06 node5 pvetunnel[5142]: trying to finish tunnel 12576
Feb 25 10:21:07 node5 pvetunnel[5142]: trying to finish tunnel 12576
Feb 25 10:21:08 node5 pvetunnel[5142]: trying to finish tunnel 12576
Feb 25 10:21:09 node5 pvetunnel[5142]: trying to finish tunnel 12576
Feb 25 10:21:10 node5 pvetunnel[5142]: trying to finish tunnel 12576
Feb 25 10:21:11 node5 pvetunnel[5142]: trying to finish tunnel 12576
Feb 25 10:21:12 node5 pvetunnel[5142]: trying to finish tunnel 12576
Feb 25 10:21:13 node5 pvetunnel[5142]: trying to finish tunnel 12576
Feb 25 10:21:14 node5 pvetunnel[5142]: trying to finish tunnel 12576
Feb 25 10:21:15 node5 pvetunnel[5142]: trying to finish tunnel 12576
etc...

Could this be the cause of the weird load and IO problems?
Any other suggestions on where to look?
Top gives no clues.
 
Deleted: sorry, found it.
 