This is an up-to-date Proxmox VE 3.4 server using Adaptec HW RAID / LVM / ext4. It hosts about 20 containers and 2-3 VMs.
During nightly backups, the first couple of containers get backed up without error. Then a container (always the same number) starts giving these errors:
Code:
Jun 01 03:50:53 INFO: Starting Backup of VM 215 (openvz)
Jun 01 03:50:53 INFO: CTID 215 exist mounted running
Jun 01 03:50:53 INFO: status = running
Jun 01 03:50:53 INFO: backup mode: snapshot
Jun 01 03:50:53 INFO: bandwidth limit: 131072 KB/s
Jun 01 03:50:53 INFO: ionice priority: 7
Jun 01 03:50:53 INFO: creating lvm snapshot of /dev/mapper/pve-data ('/dev/pve/vzsnap-proxmox2-0')
Jun 01 03:50:54 INFO: Logical volume "vzsnap-proxmox2-0" created
Jun 01 03:50:55 INFO: creating archive '/mnt/pve/Backups-Weekly/dump/vzdump-openvz-215-2015_06_01-03_50_53.tar.lzo'
Jun 01 06:44:04 INFO: Total bytes written: 68104570880 (64GiB, 6.3MiB/s)
Jun 01 06:44:04 INFO: archive file size: 53.66GB
Jun 01 06:44:04 INFO: delete old backup '/mnt/pve/Backups-Weekly/dump/vzdump-openvz-215-2015_04_20-01_51_51.tar.lzo'
Jun 01 06:44:07 INFO: umount: /mnt/vzsnap0: device is busy.
Jun 01 06:44:07 INFO: (In some cases useful info about processes that use
Jun 01 06:44:07 INFO: the device is found by lsof(8) or fuser(1))
Jun 01 06:44:07 ERROR: command 'umount /mnt/vzsnap0' failed: exit code 1
Jun 01 06:44:16 INFO: lvremove failed - trying again in 8 seconds
Jun 01 06:44:24 INFO: lvremove failed - trying again in 16 seconds
Jun 01 06:44:40 INFO: lvremove failed - trying again in 32 seconds
Jun 01 06:45:12 ERROR: command 'lvremove -f /dev/pve/vzsnap-proxmox2-0' failed: exit code 5
Jun 01 06:45:13 INFO: Finished Backup of VM 215 (02:54:20)
This is the last container that gets backed up; its snapshot can't be released, and after that every remaining container backup fails with this error:
Code:
Jun 01 06:53:50 INFO: Starting Backup of VM 237 (openvz)
Jun 01 06:53:50 INFO: CTID 237 exist mounted running
Jun 01 06:53:50 INFO: status = running
Jun 01 06:53:50 INFO: backup mode: snapshot
Jun 01 06:53:50 INFO: bandwidth limit: 131072 KB/s
Jun 01 06:53:50 INFO: ionice priority: 7
Jun 01 06:53:50 INFO: trying to remove stale snapshot '/dev/pve/vzsnap-proxmox2-0'
Jun 01 06:53:50 INFO: umount: /mnt/vzsnap0: device is busy.
Jun 01 06:53:50 INFO: (In some cases useful info about processes that use
Jun 01 06:53:50 INFO: the device is found by lsof(8) or fuser(1))
Jun 01 06:53:50 ERROR: command 'umount /mnt/vzsnap0' failed: exit code 1
Jun 01 06:53:50 INFO: Logical volume pve/vzsnap-proxmox2-0 contains a filesystem in use.
Jun 01 06:53:50 ERROR: command 'lvremove -f /dev/pve/vzsnap-proxmox2-0' failed: exit code 5
Jun 01 06:53:50 INFO: creating lvm snapshot of /dev/mapper/pve-data ('/dev/pve/vzsnap-proxmox2-0')
Jun 01 06:53:50 INFO: Logical volume "vzsnap-proxmox2-0" already exists in volume group "pve"
Jun 01 06:53:57 INFO: lvremove failed - trying again in 8 seconds
Jun 01 06:54:05 INFO: lvremove failed - trying again in 16 seconds
Jun 01 06:54:22 INFO: lvremove failed - trying again in 32 seconds
Jun 01 06:54:54 ERROR: command 'lvremove -f /dev/pve/vzsnap-proxmox2-0' failed: exit code 5
Jun 01 06:54:54 ERROR: Backup of VM 237 failed - command 'lvcreate --size 12288M --snapshot --name vzsnap-proxmox2-0 /dev/pve/data' failed: exit code 5
In the morning I check with lvs: the snapshot is usually around 75% used (it has never filled up), and I can manually umount and lvremove it, but the next night the same thing happens.
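For reference, my manual cleanup looks roughly like this (just a sketch; the mountpoint /mnt/vzsnap0 and snapshot name vzsnap-proxmox2-0 are taken from the log above, and the fuser/lsof checks are only there to see whether anything is still holding the mount, as the umount error suggests):
Code:
# check snapshot allocation (Data% column) - it has never been full for me
lvs pve

# see what, if anything, is still using the snapshot mount
fuser -vm /mnt/vzsnap0
lsof +f -- /mnt/vzsnap0

# release the stale snapshot
umount /mnt/vzsnap0
lvremove -f /dev/pve/vzsnap-proxmox2-0
By the time I run this in the morning, nothing shows up as holding the mount and both commands succeed without complaint.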
I tried migrating the container in question away, but then the next container in line gives the same error.
Any idea what to do next?