LVM snapshot problems during backup (umount / lvremove fails)

gkovacs

This is an up-to-date Proxmox VE 3.4 server using Adaptec hardware RAID, LVM, and ext4. It hosts about 20 containers and 2-3 VMs.

During nightly backups, the first couple of containers get backed up without error. Then a container (always the same number) starts giving these errors:

Code:
Jun 01 03:50:53 INFO: Starting Backup of VM 215 (openvz)
Jun 01 03:50:53 INFO: CTID 215 exist mounted running
Jun 01 03:50:53 INFO: status = running
Jun 01 03:50:53 INFO: backup mode: snapshot
Jun 01 03:50:53 INFO: bandwidth limit: 131072 KB/s
Jun 01 03:50:53 INFO: ionice priority: 7
Jun 01 03:50:53 INFO: creating lvm snapshot of /dev/mapper/pve-data ('/dev/pve/vzsnap-proxmox2-0')
Jun 01 03:50:54 INFO:   Logical volume "vzsnap-proxmox2-0" created
Jun 01 03:50:55 INFO: creating archive '/mnt/pve/Backups-Weekly/dump/vzdump-openvz-215-2015_06_01-03_50_53.tar.lzo'
Jun 01 06:44:04 INFO: Total bytes written: 68104570880 (64GiB, 6.3MiB/s)
Jun 01 06:44:04 INFO: archive file size: 53.66GB
Jun 01 06:44:04 INFO: delete old backup '/mnt/pve/Backups-Weekly/dump/vzdump-openvz-215-2015_04_20-01_51_51.tar.lzo'
Jun 01 06:44:07 INFO: umount: /mnt/vzsnap0: device is busy.
Jun 01 06:44:07 INFO:         (In some cases useful info about processes that use
Jun 01 06:44:07 INFO:          the device is found by lsof(8) or fuser(1))
Jun 01 06:44:07 ERROR: command 'umount /mnt/vzsnap0' failed: exit code 1
Jun 01 06:44:16 INFO: lvremove failed - trying again in 8 seconds
Jun 01 06:44:24 INFO: lvremove failed - trying again in 16 seconds
Jun 01 06:44:40 INFO: lvremove failed - trying again in 32 seconds
Jun 01 06:45:12 ERROR: command 'lvremove -f /dev/pve/vzsnap-proxmox2-0' failed: exit code 5
Jun 01 06:45:13 INFO: Finished Backup of VM 215 (02:54:20)

This is the last container whose archive actually gets written, but its snapshot can't be released afterwards. From then on, every container backup fails with this error:

Code:
Jun 01 06:53:50 INFO: Starting Backup of VM 237 (openvz)
Jun 01 06:53:50 INFO: CTID 237 exist mounted running
Jun 01 06:53:50 INFO: status = running
Jun 01 06:53:50 INFO: backup mode: snapshot
Jun 01 06:53:50 INFO: bandwidth limit: 131072 KB/s
Jun 01 06:53:50 INFO: ionice priority: 7
Jun 01 06:53:50 INFO: trying to remove stale snapshot '/dev/pve/vzsnap-proxmox2-0'
Jun 01 06:53:50 INFO: umount: /mnt/vzsnap0: device is busy.
Jun 01 06:53:50 INFO:         (In some cases useful info about processes that use
Jun 01 06:53:50 INFO:          the device is found by lsof(8) or fuser(1))
Jun 01 06:53:50 ERROR: command 'umount /mnt/vzsnap0' failed: exit code 1
Jun 01 06:53:50 INFO:   Logical volume pve/vzsnap-proxmox2-0 contains a filesystem in use.
Jun 01 06:53:50 ERROR: command 'lvremove -f /dev/pve/vzsnap-proxmox2-0' failed: exit code 5
Jun 01 06:53:50 INFO: creating lvm snapshot of /dev/mapper/pve-data ('/dev/pve/vzsnap-proxmox2-0')
Jun 01 06:53:50 INFO:   Logical volume "vzsnap-proxmox2-0" already exists in volume group "pve"
Jun 01 06:53:57 INFO: lvremove failed - trying again in 8 seconds
Jun 01 06:54:05 INFO: lvremove failed - trying again in 16 seconds
Jun 01 06:54:22 INFO: lvremove failed - trying again in 32 seconds
Jun 01 06:54:54 ERROR: command 'lvremove -f /dev/pve/vzsnap-proxmox2-0' failed: exit code 5
Jun 01 06:54:54 ERROR: Backup of VM 237 failed - command 'lvcreate --size 12288M --snapshot --name vzsnap-proxmox2-0 /dev/pve/data' failed: exit code 5
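
To pick the relevant lines out of a long nightly log, something like the following could be used. It is only a rough sketch: `snapshot_errors` is a name made up for this example, and it simply filters for the failed `umount`, `lvremove` and `lvcreate` commands shown in the logs above, reading from the given files or from standard input.

```shell
#!/bin/sh
# Sketch: print only the ERROR lines for failed umount/lvremove/lvcreate
# commands from a vzdump log. "snapshot_errors" is a hypothetical helper name.
snapshot_errors() {
    # With no file arguments, grep reads the log from standard input.
    grep -E "ERROR: .*command '(umount|lvremove|lvcreate) " "$@"
}
```

For example, piping a backup log into `snapshot_errors` would leave out the INFO lines and the retry messages, so the first failing command is easier to spot.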

In the morning I check with lvs: the snapshot is usually about 75% used (it has never been full), and I can manually umount and lvremove it, but the same thing happens the next night.
I tried migrating the container in question away, but then the next container in line gives the same error.
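
For reference, the manual cleanup described above can be sketched roughly as follows. The mount point and snapshot device are the ones from the vzdump log; the `fuser -vm` step follows the hint in the umount message and lists whatever still has files open on the snapshot. `DRY_RUN`, `run` and `cleanup_snapshot` are names invented for this sketch, and by default it only prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch: show what keeps the snapshot mount busy, then release the snapshot.
# Paths come from the vzdump log; DRY_RUN is a hypothetical switch for this
# example so the commands can be previewed before anything is changed.
MNT="${MNT:-/mnt/vzsnap0}"
SNAP="${SNAP:-/dev/pve/vzsnap-proxmox2-0}"
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

cleanup_snapshot() {
    # List processes with open files on the mounted snapshot (see fuser(1)).
    run fuser -vm "$MNT"
    # Unmount first; remove the snapshot only if the unmount succeeded.
    run umount "$MNT" && run lvremove -f "$SNAP"
}

cleanup_snapshot
```

Running it with `DRY_RUN=0` as root would execute the commands for real. If `fuser` names a process at the time the backup fails, that would at least show what is holding /mnt/vzsnap0 busy.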

Any idea what to do next?
 
Forgot to add: lvs gives 'file descriptor leaked' messages on the same host, not sure if related:

Code:
root@proxmox2:/etc# lvs
File descriptor 7 (pipe:[156027980]) leaked on lvs invocation. Parent PID 541254: bash
  LV   VG   Attr      LSize  Pool Origin Data%  Move Log Copy%  Convert
  data pve  -wi-ao---  1.30t
  root pve  -wi-ao--- 32.00g
  swap pve  -wi-ao--- 16.00g