Container Backups Failing

deejross (Guest)
My node has an uptime of 28 days now and I just noticed that all of my container backups have been failing for the last couple of days. I am running version 2.0-30 and this is what my backup log says:

Code:
INFO: starting new backup job: vzdump --quiet 1 --mode snapshot --compress gzip --maxfiles 2 --storage mac --node cloud --all 1
INFO: filesystem type on dumpdir is 'fuse.sshfs' -using /var/tmp/vzdumptmp740285 for temporary files
INFO: Starting Backup of VM 100 (openvz)
INFO: CTID 100 exist mounted running
INFO: status = running
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: trying to remove stale snapshot '/dev/storage-group/vzsnap-cloud-0'
INFO: umount: /mnt/vzsnap0: not mounted
ERROR: command 'umount /mnt/vzsnap0' failed: exit code 1
INFO:   Limit for the maximum number of semaphores reached. You can check and set the limits in /proc/sys/kernel/sem.
INFO:   Unable to deactivate storage--group-vzsnap--cloud--0 (253:4)
INFO:   Unable to deactivate logical volume "vzsnap-cloud-0"
ERROR: command 'lvremove -f /dev/storage-group/vzsnap-cloud-0' failed: exit code 5
INFO: creating lvm snapshot of /dev/mapper/storage--group-storage ('/dev/storage-group/vzsnap-cloud-0')
INFO:   Logical volume "vzsnap-cloud-0" already exists in volume group "storage-group"
INFO: lvremove failed - trying again in 8 seconds
INFO: lvremove failed - trying again in 16 seconds
INFO: lvremove failed - trying again in 32 seconds
ERROR: command 'lvremove -f /dev/storage-group/vzsnap-cloud-0' failed: exit code 5
ERROR: Backup of VM 100 failed - command 'lvcreate --size 1024M --snapshot --name vzsnap-cloud-0 /dev/storage-group/storage' failed: exit code 5
INFO: filesystem type on dumpdir is 'fuse.sshfs' -using /var/tmp/vzdumptmp740285 for temporary files
INFO: Starting Backup of VM 101 (openvz)
INFO: CTID 101 exist mounted running
INFO: status = running
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: trying to remove stale snapshot '/dev/storage-group/vzsnap-cloud-0'
INFO: umount: /mnt/vzsnap0: not mounted
ERROR: command 'umount /mnt/vzsnap0' failed: exit code 1
INFO:   Limit for the maximum number of semaphores reached. You can check and set the limits in /proc/sys/kernel/sem.
INFO:   Unable to deactivate storage--group-vzsnap--cloud--0 (253:4)
INFO:   Unable to deactivate logical volume "vzsnap-cloud-0"
ERROR: command 'lvremove -f /dev/storage-group/vzsnap-cloud-0' failed: exit code 5
INFO: creating lvm snapshot of /dev/mapper/storage--group-storage ('/dev/storage-group/vzsnap-cloud-0')
INFO:   Logical volume "vzsnap-cloud-0" already exists in volume group "storage-group"
INFO: lvremove failed - trying again in 8 seconds
INFO: lvremove failed - trying again in 16 seconds
INFO: lvremove failed - trying again in 32 seconds
ERROR: command 'lvremove -f /dev/storage-group/vzsnap-cloud-0' failed: exit code 5
ERROR: Backup of VM 101 failed - command 'lvcreate --size 1024M --snapshot --name vzsnap-cloud-0 /dev/storage-group/storage' failed: exit code 5
INFO: filesystem type on dumpdir is 'fuse.sshfs' -using /var/tmp/vzdumptmp740285 for temporary files
INFO: Starting Backup of VM 102 (openvz)
INFO: CTID 102 exist mounted running
INFO: status = running
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: trying to remove stale snapshot '/dev/storage-group/vzsnap-cloud-0'
INFO: umount: /mnt/vzsnap0: not mounted
ERROR: command 'umount /mnt/vzsnap0' failed: exit code 1
INFO:   Limit for the maximum number of semaphores reached. You can check and set the limits in /proc/sys/kernel/sem.
INFO:   Unable to deactivate storage--group-vzsnap--cloud--0 (253:4)
INFO:   Unable to deactivate logical volume "vzsnap-cloud-0"
ERROR: command 'lvremove -f /dev/storage-group/vzsnap-cloud-0' failed: exit code 5
INFO: creating lvm snapshot of /dev/mapper/storage--group-storage ('/dev/storage-group/vzsnap-cloud-0')
INFO:   Logical volume "vzsnap-cloud-0" already exists in volume group "storage-group"
INFO: lvremove failed - trying again in 8 seconds
INFO: lvremove failed - trying again in 16 seconds
INFO: lvremove failed - trying again in 32 seconds
ERROR: command 'lvremove -f /dev/storage-group/vzsnap-cloud-0' failed: exit code 5
ERROR: Backup of VM 102 failed - command 'lvcreate --size 1024M --snapshot --name vzsnap-cloud-0 /dev/storage-group/storage' failed: exit code 5
INFO: Backup job finished with errors
TASK ERROR: job errors

I Googled a few of those lines, but the "maximum number of semaphores reached" INFO line is what led me to a temporary solution. I increased the number of semaphores from 128 to 256, and was then able to manually run
Code:
lvremove -f /dev/storage-group/vzsnap-cloud-0
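For reference, raising the limit looks roughly like this (a sketch of what I did, not my exact commands; the fourth field of /proc/sys/kernel/sem is SEMMNI, the maximum number of semaphore arrays, and the first three values are the kernel defaults):
Code:
# current limits: SEMMSL SEMMNS SEMOPM SEMMNI (defaults: 250 32000 32 128)
cat /proc/sys/kernel/sem

# count the semaphore arrays currently allocated
ipcs -s | wc -l

# raise SEMMNI from 128 to 256; takes effect immediately, lost on reboot
echo "250 32000 32 256" > /proc/sys/kernel/sem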
From my Googling, it appears there may be an issue where lvcreate/lvremove don't release the semaphore arrays they use. Not that I understand all of this; I am just repeating what I found.

Does anyone know of a way to fix this issue? I had thought about running a cron script that kills semaphores whose keys start with 0x0D4D (something like the sketch below), but I have no idea what effect that would have on the system. Any ideas? Thanks.
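The cron idea would look something like this (untested sketch; 0x0D4D appears to be the device-mapper semaphore key prefix, and removing a semaphore that an in-flight LVM/dm operation is still waiting on could break it, which is exactly the part I'm unsure about):
Code:
#!/bin/sh
# untested: remove System V semaphore arrays whose key starts with the
# device-mapper prefix 0x0d4d; risky if an LVM/dm operation is in flight
for id in $(ipcs -s | awk '/^0x0d4d/ {print $2}'); do
    ipcrm -s "$id"
done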
 
Do you run the latest version? Please run 'aptitude update && aptitude full-upgrade'.
 
I ran it a couple of weeks ago, but I also just ran it again before posting this message; the version number changed to 2.0-37. Was this a known issue with the version I was running, or should I expect to run into the problem again once the 256-semaphore limit is reached?
 
Updates are complete. I also just rebooted the node for good measure. We will know in about 25 days if this problem still exists.
 
I had the same, or a quite similar, error last night backing up to an NFS share. Version: 2.0.59 / 18400f07
Code:
INFO: starting new backup job: vzdump 102 103 101 --quiet 1 --mailto serverwatch@finaware.ch --mode snapshot --compress lzo --storage NFSData
INFO: Starting Backup of VM 101 (qemu)
INFO: status = stopped
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: creating archive '/mnt/pve/NFSData/dump/vzdump-qemu-101-2012_04_18-22_00_01.tar.lzo'
INFO: adding '/mnt/pve/NFSData/dump/vzdump-qemu-101-2012_04_18-22_00_01.tmp/qemu-server.conf' to archive ('qemu-server.conf')
INFO: adding '/dev/lvm.iscsi.lun0/vm-101-disk-1' to archive ('vm-disk-ide0.raw')
INFO: Total bytes written: 26847742464 (32.21 MiB/s)
INFO: archive file size: 10.77GB
INFO: Finished Backup of VM 101 (00:13:56)
INFO: Starting Backup of VM 102 (qemu)
INFO: status = running
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO:   Logical volume "vzsnap-vmhost1-0" already exists in volume group "lvm.iscsi.lun0"
ERROR: Backup of VM 102 failed - command 'lvcreate --size 1024M --snapshot --name 'vzsnap-vmhost1-0' '/dev/lvm.iscsi.lun0/vm-102-disk-1'' failed: exit code 5
INFO: Starting Backup of VM 103 (qemu)
INFO: status = stopped
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: creating archive '/mnt/pve/NFSData/dump/vzdump-qemu-103-2012_04_18-22_14_02.tar.lzo'
INFO: adding '/mnt/pve/NFSData/dump/vzdump-qemu-103-2012_04_18-22_14_02.tmp/qemu-server.conf' to archive ('qemu-server.conf')
INFO: adding '/dev/lvm.iscsi.lun0/vm-103-disk-1' to archive ('vm-disk-ide0.raw')
INFO: Total bytes written: 16110324224 (29.04 MiB/s)
INFO: archive file size: 8.25GB
INFO: Finished Backup of VM 103 (00:09:13)
INFO: Backup job finished with errors
TASK ERROR: job errors
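For reference, a stale snapshot like this can usually be cleared by hand before re-running the job (a sketch, assuming the snapshot really is no longer in use):
Code:
# confirm the leftover snapshot exists
lvs | grep vzsnap

# remove it so the next backup can recreate it
lvremove -f /dev/lvm.iscsi.lun0/vzsnap-vmhost1-0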
 
I've started getting the same error after months of backups running without a hitch. Was a solution ever found for this?
I'm on version 3.0-20/0428106c

Thanks,
Alan

INFO: starting new backup job: vzdump 107 --quiet 1 --mode snapshot --mailto --compress lzo --storage proxmoxbackupslarry
INFO: Starting Backup of VM 107 (openvz)
INFO: CTID 107 exist mounted running
INFO: status = running
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: trying to remove stale snapshot '/dev/pve/vzsnap-larry-0'
INFO: umount: /mnt/vzsnap0: not mounted
ERROR: command 'umount /mnt/vzsnap0' failed: exit code 1
INFO: Limit for the maximum number of semaphores reached. You can check and set the limits in /proc/sys/kernel/sem.
INFO: Unable to deactivate pve-vzsnap--larry--0 (253:3)
INFO: Unable to deactivate logical volume "vzsnap-larry-0"
ERROR: command 'lvremove -f /dev/pve/vzsnap-larry-0' failed: exit code 5
INFO: creating lvm snapshot of /dev/mapper/pve-data ('/dev/pve/vzsnap-larry-0')
INFO: Logical volume "vzsnap-larry-0" already exists in volume group "pve"
INFO: lvremove failed - trying again in 8 seconds
INFO: lvremove failed - trying again in 16 seconds
INFO: lvremove failed - trying again in 32 seconds
ERROR: command 'lvremove -f /dev/pve/vzsnap-larry-0' failed: exit code 5
ERROR: Backup of VM 107 failed - command 'lvcreate --size 1024M --snapshot --name vzsnap-larry-0 /dev/pve/data' failed: exit code 5
INFO: Backup job finished with errors
TASK ERROR: job errors
 
Hello,
I am having the same issue:

100: Jan 18 04:15:01 INFO: Starting Backup of VM 100 (openvz)
100: Jan 18 04:15:01 INFO: CTID 100 exist mounted running
100: Jan 18 04:15:01 INFO: status = running
100: Jan 18 04:15:01 INFO: backup mode: snapshot
100: Jan 18 04:15:01 INFO: ionice priority: 7
100: Jan 18 04:15:01 INFO: trying to remove stale snapshot '/dev/pve/vzsnap-xx363900-0'
100: Jan 18 04:15:01 INFO: umount: /mnt/vzsnap0: not mounted
100: Jan 18 04:15:01 ERROR: command 'umount /mnt/vzsnap0' failed: exit code 1
100: Jan 18 04:15:01 INFO: Limit for the maximum number of semaphores reached. You can check and set the limits in /proc/sys/kernel/sem.
100: Jan 18 04:15:01 INFO: Unable to deactivate pve-vzsnap--xx363900--0 (253:1)
100: Jan 18 04:15:01 INFO: Unable to deactivate logical volume "vzsnap-xx363900-0"
100: Jan 18 04:15:01 ERROR: command 'lvremove -f /dev/pve/vzsnap-xx363900-0' failed: exit code 5
100: Jan 18 04:15:01 INFO: creating lvm snapshot of /dev/mapper/pve-data ('/dev/pve/vzsnap-xx363900-0')
100: Jan 18 04:15:01 INFO: Logical volume "vzsnap-xx363900-0" already exists in volume group "pve"
100: Jan 18 04:15:08 INFO: lvremove failed - trying again in 8 seconds
100: Jan 18 04:15:16 INFO: lvremove failed - trying again in 16 seconds
100: Jan 18 04:15:32 INFO: lvremove failed - trying again in 32 seconds
100: Jan 18 04:16:04 ERROR: command 'lvremove -f /dev/pve/vzsnap-xx363900-0' failed: exit code 5
100: Jan 18 04:16:04 ERROR: Backup of VM 100 failed - command 'lvcreate --size 1024M --snapshot --name vzsnap-xx363900-0 /dev/pve/data' failed: exit code 5

If I increase the maximum number of semaphores like this:
Code:
printf '250\t32000\t32\t200' >/proc/sys/kernel/sem

It works for a couple of days, but once the semaphore limit is reached again, the backup errors return!
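(For reference: a change written to /proc/sys/kernel/sem is also lost on reboot. A sketch for making it persistent via sysctl; the four fields are SEMMSL, SEMMNS, SEMOPM and SEMMNI, and the 256 here is an example value, not one taken from this thread.)
Code:
# append to /etc/sysctl.conf so the limit survives a reboot
echo "kernel.sem = 250 32000 32 256" >> /etc/sysctl.conf

# apply it now without rebooting
sysctl -p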

pveversion:
pve-manager: 3.0-20 (pve-manager/3.0/0428106c)
running kernel: 2.6.32-20-pve
proxmox-ve-2.6.32: 3.0-100
pve-kernel-2.6.32-20-pve: 2.6.32-100
lvm2: 2.02.95-pve3
clvm: 2.02.95-pve3
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.0-1
pve-cluster: 3.0-4
qemu-server: 3.0-15
pve-firmware: 1.0-22
libpve-common-perl: 3.0-4
libpve-access-control: 3.0-4
libpve-storage-perl: 3.0-6
vncterm: 1.1-3
vzctl: 4.0-1pve3
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.4-12
ksm-control-daemon: 1.1-1

Anyone knows how to solve this?

Thank you
 
These are the new version details:
pveversion -v
proxmox-ve-2.6.32: 3.1-114 (running kernel: 2.6.32-26-pve)
pve-manager: 3.1-21 (running version: 3.1-21/93bf03d4)
pve-kernel-2.6.32-20-pve: 2.6.32-100
pve-kernel-2.6.32-26-pve: 2.6.32-114
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.0-2
pve-cluster: 3.0-8
qemu-server: 3.1-8
pve-firmware: 1.0-23
libpve-common-perl: 3.0-8
libpve-access-control: 3.0-7
libpve-storage-perl: 3.0-17
pve-libspice-server1: 0.12.4-2
vncterm: 1.1-4
vzctl: 4.0-1pve4
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.4-17
ksm-control-daemon: 1.1-1
glusterfs-client: 3.4.1-1

I get only this in my syslog:
task UPID:ns263900:00000C85:0000035F:52E0E890:startall::root@pam:: command 'vzctl start 101' failed: exit code 9
 
