Container Backups Failing

deejross
My node has an uptime of 28 days now and I just noticed that all of my container backups have been failing for the last couple of days. I am running version 2.0-30 and this is what my backup log says:

Code:
INFO: starting new backup job: vzdump --quiet 1 --mode snapshot --compress gzip --maxfiles 2 --storage mac --node cloud --all 1
INFO: filesystem type on dumpdir is 'fuse.sshfs' -using /var/tmp/vzdumptmp740285 for temporary files
INFO: Starting Backup of VM 100 (openvz)
INFO: CTID 100 exist mounted running
INFO: status = running
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: trying to remove stale snapshot '/dev/storage-group/vzsnap-cloud-0'
INFO: umount: /mnt/vzsnap0: not mounted
ERROR: command 'umount /mnt/vzsnap0' failed: exit code 1
INFO:   Limit for the maximum number of semaphores reached. You can check and set the limits in /proc/sys/kernel/sem.
INFO:   Unable to deactivate storage--group-vzsnap--cloud--0 (253:4)
INFO:   Unable to deactivate logical volume "vzsnap-cloud-0"
ERROR: command 'lvremove -f /dev/storage-group/vzsnap-cloud-0' failed: exit code 5
INFO: creating lvm snapshot of /dev/mapper/storage--group-storage ('/dev/storage-group/vzsnap-cloud-0')
INFO:   Logical volume "vzsnap-cloud-0" already exists in volume group "storage-group"
INFO: lvremove failed - trying again in 8 seconds
INFO: lvremove failed - trying again in 16 seconds
INFO: lvremove failed - trying again in 32 seconds
ERROR: command 'lvremove -f /dev/storage-group/vzsnap-cloud-0' failed: exit code 5
ERROR: Backup of VM 100 failed - command 'lvcreate --size 1024M --snapshot --name vzsnap-cloud-0 /dev/storage-group/storage' failed: exit code 5
INFO: filesystem type on dumpdir is 'fuse.sshfs' -using /var/tmp/vzdumptmp740285 for temporary files
INFO: Starting Backup of VM 101 (openvz)
INFO: CTID 101 exist mounted running
INFO: status = running
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: trying to remove stale snapshot '/dev/storage-group/vzsnap-cloud-0'
INFO: umount: /mnt/vzsnap0: not mounted
ERROR: command 'umount /mnt/vzsnap0' failed: exit code 1
INFO:   Limit for the maximum number of semaphores reached. You can check and set the limits in /proc/sys/kernel/sem.
INFO:   Unable to deactivate storage--group-vzsnap--cloud--0 (253:4)
INFO:   Unable to deactivate logical volume "vzsnap-cloud-0"
ERROR: command 'lvremove -f /dev/storage-group/vzsnap-cloud-0' failed: exit code 5
INFO: creating lvm snapshot of /dev/mapper/storage--group-storage ('/dev/storage-group/vzsnap-cloud-0')
INFO:   Logical volume "vzsnap-cloud-0" already exists in volume group "storage-group"
INFO: lvremove failed - trying again in 8 seconds
INFO: lvremove failed - trying again in 16 seconds
INFO: lvremove failed - trying again in 32 seconds
ERROR: command 'lvremove -f /dev/storage-group/vzsnap-cloud-0' failed: exit code 5
ERROR: Backup of VM 101 failed - command 'lvcreate --size 1024M --snapshot --name vzsnap-cloud-0 /dev/storage-group/storage' failed: exit code 5
INFO: filesystem type on dumpdir is 'fuse.sshfs' -using /var/tmp/vzdumptmp740285 for temporary files
INFO: Starting Backup of VM 102 (openvz)
INFO: CTID 102 exist mounted running
INFO: status = running
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: trying to remove stale snapshot '/dev/storage-group/vzsnap-cloud-0'
INFO: umount: /mnt/vzsnap0: not mounted
ERROR: command 'umount /mnt/vzsnap0' failed: exit code 1
INFO:   Limit for the maximum number of semaphores reached. You can check and set the limits in /proc/sys/kernel/sem.
INFO:   Unable to deactivate storage--group-vzsnap--cloud--0 (253:4)
INFO:   Unable to deactivate logical volume "vzsnap-cloud-0"
ERROR: command 'lvremove -f /dev/storage-group/vzsnap-cloud-0' failed: exit code 5
INFO: creating lvm snapshot of /dev/mapper/storage--group-storage ('/dev/storage-group/vzsnap-cloud-0')
INFO:   Logical volume "vzsnap-cloud-0" already exists in volume group "storage-group"
INFO: lvremove failed - trying again in 8 seconds
INFO: lvremove failed - trying again in 16 seconds
INFO: lvremove failed - trying again in 32 seconds
ERROR: command 'lvremove -f /dev/storage-group/vzsnap-cloud-0' failed: exit code 5
ERROR: Backup of VM 102 failed - command 'lvcreate --size 1024M --snapshot --name vzsnap-cloud-0 /dev/storage-group/storage' failed: exit code 5
INFO: Backup job finished with errors
TASK ERROR: job errors

I Googled a few of those lines, and the "maximum number of semaphores reached" INFO line is what led me to a temporary solution. I increased the number of semaphores from 128 to 256, and was then able to manually run
Code:
lvremove -f /dev/storage-group/vzsnap-cloud-0
From my Google searching, it appears there may be an issue with lvcreate/lvremove not releasing the semaphore arrays it uses. I don't fully understand all of this; I'm just repeating what I found.

Does anyone know of a way to fix this issue? I had thought about running a cron script that kills semaphores that start with 0x0D4D, but I have no idea what effect that would have on the system. Any ideas? Thanks.
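Rather than killing semaphores blindly from cron, one safer option is to just watch how close the node is to the limit and alert before backups start failing. A rough sketch, assuming a Linux host where `ipcs -s` lists the current semaphore arrays and the fourth field of /proc/sys/kernel/sem is SEMMNI (the array limit); the `near_semmni` helper and the 80% threshold are my own invention, not anything from vzdump:

```shell
#!/bin/sh
# Sketch: warn before the semaphore-array limit (SEMMNI) is exhausted.

# Pure helper: prints "warn" when USED arrays reach 80% of LIMIT,
# "ok" otherwise. Integer arithmetic only, so plain sh suffices.
near_semmni() {
    limit=$1
    used=$2
    if [ $((used * 100)) -ge $((limit * 80)) ]; then
        echo warn
    else
        echo ok
    fi
}

# On a live node it would be wired up roughly like this (commented out
# here so the sketch stays side-effect free):
#   limit=$(awk '{print $4}' /proc/sys/kernel/sem)   # SEMMNI
#   used=$(ipcs -s | grep -c '^0x')                  # arrays currently allocated
#   [ "$(near_semmni "$limit" "$used")" = warn ] && logger "semaphore arrays running low"

near_semmni 128 120   # 120 of 128 arrays in use -> warn
near_semmni 256 10    # plenty of headroom      -> ok
```

Running something like this from cron would at least flag the leak before the nightly backup hits the wall, without the risk of removing a semaphore set a running process still needs.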
 
Do you run the latest version? Please run aptitude update && aptitude full-upgrade.
 
I ran it a couple of weeks ago. But I also just ran it before posting this message. The version number just changed to 2.0-37. Was this a known issue with the version I was running, or do you expect that I will run into the problem again once the 256 semaphore limit is reached again?
 
Updates are complete. I also just rebooted the node for good measure. We will know in about 25 days if this problem still exists.
 
I had the same or a quite similar error last night backing up to an NFS share (version 2.0.59 / 18400f07):
Code:
INFO: starting new backup job: vzdump 102 103 101 --quiet 1 --mailto serverwatch@finaware.ch --mode snapshot --compress lzo --storage NFSData
INFO: Starting Backup of VM 101 (qemu)
INFO: status = stopped
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: creating archive '/mnt/pve/NFSData/dump/vzdump-qemu-101-2012_04_18-22_00_01.tar.lzo'
INFO: adding '/mnt/pve/NFSData/dump/vzdump-qemu-101-2012_04_18-22_00_01.tmp/qemu-server.conf' to archive ('qemu-server.conf')
INFO: adding '/dev/lvm.iscsi.lun0/vm-101-disk-1' to archive ('vm-disk-ide0.raw')
INFO: Total bytes written: 26847742464 (32.21 MiB/s)
INFO: archive file size: 10.77GB
INFO: Finished Backup of VM 101 (00:13:56)
INFO: Starting Backup of VM 102 (qemu)
INFO: status = running
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO:   Logical volume "vzsnap-vmhost1-0" already exists in volume group "lvm.iscsi.lun0"
ERROR: Backup of VM 102 failed - command 'lvcreate --size 1024M --snapshot --name 'vzsnap-vmhost1-0' '/dev/lvm.iscsi.lun0/vm-102-disk-1'' failed: exit code 5
INFO: Starting Backup of VM 103 (qemu)
INFO: status = stopped
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: creating archive '/mnt/pve/NFSData/dump/vzdump-qemu-103-2012_04_18-22_14_02.tar.lzo'
INFO: adding '/mnt/pve/NFSData/dump/vzdump-qemu-103-2012_04_18-22_14_02.tmp/qemu-server.conf' to archive ('qemu-server.conf')
INFO: adding '/dev/lvm.iscsi.lun0/vm-103-disk-1' to archive ('vm-disk-ide0.raw')
INFO: Total bytes written: 16110324224 (29.04 MiB/s)
INFO: archive file size: 8.25GB
INFO: Finished Backup of VM 103 (00:09:13)
INFO: Backup job finished with errors
TASK ERROR: job errors
 
I've started getting the same error after months of backups running without a hitch. Was there a solution found for this?
I'm on version 3.0-20/0428106c

Thanks,
Alan

INFO: starting new backup job: vzdump 107 --quiet 1 --mode snapshot --mailto --compress lzo --storage proxmoxbackupslarry
INFO: Starting Backup of VM 107 (openvz)
INFO: CTID 107 exist mounted running
INFO: status = running
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: trying to remove stale snapshot '/dev/pve/vzsnap-larry-0'
INFO: umount: /mnt/vzsnap0: not mounted
ERROR: command 'umount /mnt/vzsnap0' failed: exit code 1
INFO: Limit for the maximum number of semaphores reached. You can check and set the limits in /proc/sys/kernel/sem.
INFO: Unable to deactivate pve-vzsnap--larry--0 (253:3)
INFO: Unable to deactivate logical volume "vzsnap-larry-0"
ERROR: command 'lvremove -f /dev/pve/vzsnap-larry-0' failed: exit code 5
INFO: creating lvm snapshot of /dev/mapper/pve-data ('/dev/pve/vzsnap-larry-0')
INFO: Logical volume "vzsnap-larry-0" already exists in volume group "pve"
INFO: lvremove failed - trying again in 8 seconds
INFO: lvremove failed - trying again in 16 seconds
INFO: lvremove failed - trying again in 32 seconds
ERROR: command 'lvremove -f /dev/pve/vzsnap-larry-0' failed: exit code 5
ERROR: Backup of VM 107 failed - command 'lvcreate --size 1024M --snapshot --name vzsnap-larry-0 /dev/pve/data' failed: exit code 5
INFO: Backup job finished with errors
TASK ERROR: job errors
 
Hello,
I am having the same issue:

100: Jan 18 04:15:01 INFO: Starting Backup of VM 100 (openvz)
100: Jan 18 04:15:01 INFO: CTID 100 exist mounted running
100: Jan 18 04:15:01 INFO: status = running
100: Jan 18 04:15:01 INFO: backup mode: snapshot
100: Jan 18 04:15:01 INFO: ionice priority: 7
100: Jan 18 04:15:01 INFO: trying to remove stale snapshot '/dev/pve/vzsnap-xx363900-0'
100: Jan 18 04:15:01 INFO: umount: /mnt/vzsnap0: not mounted
100: Jan 18 04:15:01 ERROR: command 'umount /mnt/vzsnap0' failed: exit code 1
100: Jan 18 04:15:01 INFO: Limit for the maximum number of semaphores reached. You can check and set the limits in /proc/sys/kernel/sem.
100: Jan 18 04:15:01 INFO: Unable to deactivate pve-vzsnap--xx363900--0 (253:1)
100: Jan 18 04:15:01 INFO: Unable to deactivate logical volume "vzsnap-xx363900-0"
100: Jan 18 04:15:01 ERROR: command 'lvremove -f /dev/pve/vzsnap-xx363900-0' failed: exit code 5
100: Jan 18 04:15:01 INFO: creating lvm snapshot of /dev/mapper/pve-data ('/dev/pve/vzsnap-xx363900-0')
100: Jan 18 04:15:01 INFO: Logical volume "vzsnap-xx363900-0" already exists in volume group "pve"
100: Jan 18 04:15:08 INFO: lvremove failed - trying again in 8 seconds
100: Jan 18 04:15:16 INFO: lvremove failed - trying again in 16 seconds
100: Jan 18 04:15:32 INFO: lvremove failed - trying again in 32 seconds
100: Jan 18 04:16:04 ERROR: command 'lvremove -f /dev/pve/vzsnap-xx363900-0' failed: exit code 5
100: Jan 18 04:16:04 ERROR: Backup of VM 100 failed - command 'lvcreate --size 1024M --snapshot --name vzsnap-xx363900-0 /dev/pve/data' failed: exit code 5

If I increase the maximum number of semaphores like this:
printf '250\t32000\t32\t200' >/proc/sys/kernel/sem

It works for a couple of days, but once the semaphore limit is reached again, I get the backup error again!
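One small note on that workaround: a value written directly to /proc/sys/kernel/sem is lost on reboot. The standard way to make it persistent is a sysctl entry, sketched below with the same four values as the printf above (field order is SEMMSL, SEMMNS, SEMOPM, SEMMNI). As described above, this only postpones the error rather than fixing the underlying leak:

```
# /etc/sysctl.conf (sketch)
# SEMMSL SEMMNS SEMOPM SEMMNI -- same values as written to /proc/sys/kernel/sem
kernel.sem = 250 32000 32 200
```

The setting is applied at boot, or immediately with sysctl -p.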

pveversion:
pve-manager: 3.0-20 (pve-manager/3.0/0428106c)
running kernel: 2.6.32-20-pve
proxmox-ve-2.6.32: 3.0-100
pve-kernel-2.6.32-20-pve: 2.6.32-100
lvm2: 2.02.95-pve3
clvm: 2.02.95-pve3
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.0-1
pve-cluster: 3.0-4
qemu-server: 3.0-15
pve-firmware: 1.0-22
libpve-common-perl: 3.0-4
libpve-access-control: 3.0-4
libpve-storage-perl: 3.0-6
vncterm: 1.1-3
vzctl: 4.0-1pve3
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.4-12
ksm-control-daemon: 1.1-1

Does anyone know how to solve this?

Thank you
 
These are the new version details:
pveversion -v
proxmox-ve-2.6.32: 3.1-114 (running kernel: 2.6.32-26-pve)
pve-manager: 3.1-21 (running version: 3.1-21/93bf03d4)
pve-kernel-2.6.32-20-pve: 2.6.32-100
pve-kernel-2.6.32-26-pve: 2.6.32-114
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.0-2
pve-cluster: 3.0-8
qemu-server: 3.1-8
pve-firmware: 1.0-23
libpve-common-perl: 3.0-8
libpve-access-control: 3.0-7
libpve-storage-perl: 3.0-17
pve-libspice-server1: 0.12.4-2
vncterm: 1.1-4
vzctl: 4.0-1pve4
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.4-17
ksm-control-daemon: 1.1-1
glusterfs-client: 3.4.1-1

I get only this in my syslog:
task UPID:ns263900:00000C85:0000035F:52E0E890:startall::root@pam:: command 'vzctl start 101' failed: exit code 9