Backup hangs at "INFO: create storage snapshot 'vzdump'"

AngusMcGyver

Hello there,

From time to time, the automatic backup of an LXC container hangs forever at the step INFO: create storage snapshot 'vzdump'. It is not clear when this happens; if I start the backup by hand, everything works fine. The container data is located on an LVM-thin volume.

Any idea how to fix this?
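
For reference, a generic way to inspect where a stuck vzdump task is blocked (a sketch only; the PID below is a placeholder, not taken from my logs):

Code:
# find the vzdump worker and any lvcreate child it spawned
ps faxww | grep -E 'vzdump|lvcreate' | grep -v grep
# for a hung PID (12345 is a placeholder), show the kernel path it sleeps in
cat /proc/12345/stack    # needs root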

Code:
Detailed backup logs:

vzdump 102 --mailnotification failure --storage backup --quiet 1 --mode snapshot --node pm --compress lzo

102: 2020-07-19 07:00:02 INFO: Starting Backup of VM 102 (lxc)
102: 2020-07-19 07:00:02 INFO: status = running
102: 2020-07-19 07:00:02 INFO: CT Name: repo
102: 2020-07-19 07:00:02 INFO: backup mode: snapshot
102: 2020-07-19 07:00:02 INFO: ionice priority: 7
102: 2020-07-19 07:00:02 INFO: create storage snapshot 'vzdump'

102: 2020-07-20 08:07:57 ERROR: Backup of VM 102 failed - interrupted by signal


Code:
# pveversion -v
proxmox-ve: 6.2-1 (running kernel: 5.4.44-2-pve)
pve-manager: 6.2-6 (running version: 6.2-6/ee1d7754)
pve-kernel-5.4: 6.2-4
pve-kernel-helper: 6.2-4
pve-kernel-5.3: 6.1-6
pve-kernel-5.0: 6.0-11
pve-kernel-5.4.44-2-pve: 5.4.44-2
pve-kernel-5.4.41-1-pve: 5.4.41-1
pve-kernel-4.15: 5.4-13
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-5.3.18-2-pve: 5.3.18-2
pve-kernel-5.3.18-1-pve: 5.3.18-1
pve-kernel-5.0.21-5-pve: 5.0.21-10
pve-kernel-4.15.18-25-pve: 4.15.18-53
pve-kernel-4.15.18-15-pve: 4.15.18-40
pve-kernel-4.10.17-3-pve: 4.10.17-23
pve-kernel-4.10.17-2-pve: 4.10.17-20
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.4
libpve-access-control: 6.1-1
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.1-3
libpve-guest-common-perl: 3.0-10
libpve-http-server-perl: 3.0-5
libpve-storage-perl: 6.1-8
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.2-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-8
pve-cluster: 6.1-8
pve-container: 3.1-8
pve-docs: 6.2-4
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-2
pve-firmware: 3.1-1
pve-ha-manager: 3.0-9
pve-i18n: 2.1-3
pve-qemu-kvm: 5.0.0-4
pve-xtermjs: 4.3.0-1
qemu-server: 6.2-3
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.4-pve1
 
Hi,

Please try to upgrade to the current version and try again. pve-manager: 6.2-6 should then be pve-manager: 6.2-10.
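
On a standard setup, the upgrade is roughly (a generic sketch, assuming the package repositories are already configured):

Code:
apt update
apt full-upgrade    # or: apt dist-upgrade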

Hope that helps :)
 
Hi, thanks for the response.

I will give it a try, but I do not have much hope. The problem has persisted for over a year now; back then I was on Proxmox version 5. My hope was to get rid of the problem with version 6.1, and now it still persists with 6.2.

I also found a thread from 2017 with no real solution other than upgrading to the newest release:
https://forum.proxmox.com/threads/vzdump-hangs.35192/
 
Hi again,

As I guessed, the update didn't help. The backup process hung again, this time with pve-manager: 6.2-10 (running version: 6.2-10/a20769ed).


Code:
INFO: Starting Backup of VM 117 (lxc)
INFO: Backup started at 2020-07-24 02:41:20
INFO: status = running
INFO: CT Name: npm
INFO: including mount point rootfs ('/') in backup
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: create storage snapshot 'vzdump'


Another backup from the same backup job earlier works fine:
Code:
INFO: Starting Backup of VM 106 (lxc)
INFO: Backup started at 2020-07-24 01:51:18
INFO: status = running
INFO: CT Name: radius
INFO: including mount point rootfs ('/') in backup
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: create storage snapshot 'vzdump'
  Logical volume "snap_vm-106-disk-0_vzdump" created.
INFO: creating vzdump archive '/mnt/pve/backup-daily/dump/vzdump-lxc-106-2020_07_24-01_51_18.tar.lzo'
INFO: Total bytes written: 1936261120 (1.9GiB, 5.9MiB/s)
INFO: archive file size: 1.06GB
INFO: delete old backup '/mnt/pve/backup-daily/dump/vzdump-lxc-106-2020_07_21-01_51_53.tar.lzo'
INFO: remove vzdump snapshot
  Logical volume "snap_vm-106-disk-0_vzdump" successfully removed
INFO: Finished Backup of VM 106 (00:05:21)
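
For comparison, the snapshot step that succeeds here corresponds roughly to these manual LVM commands (a sketch; vzdump drives this through its storage plugin, and the names are taken from the log above and the lvs output below):

Code:
# thin snapshots need no size of their own; they share the origin's pool
lvcreate -s -n snap_vm-106-disk-0_vzdump pve3/vm-106-disk-0
# ... the archive is written from the snapshot, then:
lvremove -y pve3/snap_vm-106-disk-0_vzdump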

/var/log/kern.log has no entries around the time the backup of this container starts.

/var/log/syslog around this time:
Code:
Jul 24 02:41:20 barney vzdump[35790]: INFO: Finished Backup of VM 116 (00:01:58)
Jul 24 02:41:20 barney vzdump[35790]: INFO: Starting Backup of VM 117 (lxc)
Jul 24 02:43:36 barney corosync[2163]: [TOTEM ] Retransmit List: 4664e
Jul 24 02:55:28 barney pmxcfs[2142]: [status] notice: received log
Jul 24 02:55:46 barney pmxcfs[2142]: [status] notice: received log
# no more vzdump after here
 
Could you post the config of both containers as well?
 
Sure, here we go.

The working container:
Code:
# cat /etc/pve/lxc/106.conf
arch: amd64
cores: 1
hostname: radius
memory: 1024
net0: name=eth0,bridge=vmbr0,hwaddr=ca:df:35:33:9c:38,ip=dhcp,type=veth
onboot: 1
ostype: ubuntu
rootfs: local-hdd-sdc-lvm-thin:vm-106-disk-0,size=8G
startup: order=10,up=30
swap: 512
lxc.apparmor.profile: lxc-default-with-mounting

And the container with the problem:
Code:
# cat /etc/pve/lxc/117.conf
arch: amd64
cores: 2
hostname: npm
memory: 2048
net0: name=eth0,bridge=vmbr0,hwaddr=E6:6D:60:41:C0:B7,ip=dhcp,type=veth
onboot: 1
ostype: debian
rootfs: local-hdd-sdc-lvm-thin:vm-117-disk-0,size=10G
startup: order=70,up=30
swap: 0
lxc.apparmor.profile: lxc-default-with-mounting
 
What is lxc-default-with-mounting for? Also, the failing container is bigger; could you include 'lvs' output?
 
lxc-default-with-mounting is there so we can mount our NFS server inside the container:

Code:
# cat /etc/apparmor.d/lxc/lxc-default-with-mounting
# Do not load this file.  Rather, load /etc/apparmor.d/lxc-containers, which
# will source all profiles under /etc/apparmor.d/lxc

profile lxc-default-with-mounting flags=(attach_disconnected,mediate_deleted) {
  #include <abstractions/lxc/container-base>

# allow standard blockdevtypes.
# The concern here is in-kernel superblock parsers bringing down the
# host with bad data.  However, we continue to disallow proc, sys, securityfs,
# etc to nonstandard locations.
  mount fstype=ext*,
  mount fstype=xfs,
  mount fstype=btrfs,
  mount fstype=nfs*,
  mount fstype=rpc_pipefs*,
  mount fstype=cgroup,
  mount fstype=autofs,
}
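
With that profile active, a mount like the following works inside the container (hypothetical server address and export path, just to illustrate what the profile permits):

Code:
# run inside the container; 10.0.0.5:/export/data is a made-up example
mount -t nfs 10.0.0.5:/export/data /mnt/nfs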


Code:
  LV                              VG   Attr       LSize   Pool Origin                          Data%  Meta%  Move Log Cpy%Sync Convert
  root                            pve  -wi-ao---- 177.00g                                                                             
  swap                            pve  -wi-a-----   8.00g                                                                             
  data                            pve2 twi-aotz-- 700.00g                                      63.44  44.25                           
  lvol0                           pve2 -wi-a-----  96.00m                                                                             
  snap_vm-102-disk-0_pre1320      pve2 Vri---tz-k 240.00g data vm-102-disk-0                                                         
  vm-102-disk-0                   pve2 Vwi-aotz-- 240.00g data                                 94.15                                 
  vm-120-disk-0                   pve2 Vwi-aotz--   8.00g data                                 95.44                                 
  vm-203-disk-0                   pve2 Vwi-aotz--  90.00g data                                 97.86                                 
  vm-214-disk-0                   pve2 Vwi-aotz-- 128.00g data                                 88.34                                 
  data                            pve3 twi-aotz--   1.70t                                      28.95  3.62                           
  snap_vm-200-disk-1_backup       pve3 Vri---tz-k 100.00g data vm-200-disk-1                                                         
  snap_vm-200-disk-1_openjdk      pve3 Vri---tz-k 100.00g data vm-200-disk-1                                                         
  snap_vm-200-disk-1_working      pve3 Vri---tz-k 100.00g data vm-200-disk-1                                                         
  snap_vm-312-disk-1_preMultiAttr pve3 Vri---tz-k  80.00g data                                                                       
  snap_vm-312-disk-1_v3153        pve3 Vri---tz-k  80.00g data                                                                       
  vm-102-disk-0                   pve3 Vwi-a-tz-- 240.00g data                                 52.86                                 
  vm-106-disk-0                   pve3 Vwi-aotz--   8.00g data                                 58.89                                 
  vm-109-disk-0                   pve3 Vwi-aotz--   8.00g data                                 88.68                                 
  vm-110-disk-1                   pve3 Vwi-aotz--   8.00g data                                 94.40                                 
  vm-115-disk-1                   pve3 Vwi-aotz--  30.00g data                                 89.76                                 
  vm-117-disk-0                   pve3 Vwi-aotz--  10.00g data                                 97.30                                 
  vm-118-disk-1                   pve3 Vwi-a-tz--  10.00g data                                 31.01                                 
  vm-200-disk-1                   pve3 Vwi-aotz-- 100.00g data                                 99.99                                 
  vm-200-state-openjdk            pve3 Vwi-a-tz--  <8.49g data                                 46.43                                 
  vm-200-state-working            pve3 Vwi-a-tz--  <8.49g data                                 35.50                                 
  vm-201-disk-0                   pve3 Vwi-a-tz--  50.00g data                                 100.00                                 
  vm-210-disk-1                   pve3 Vwi-aotz--  20.00g data                                 99.44                                 
  vm-302-disk-1                   pve3 Vwi-aotz--   8.00g data                                 99.02                                 
  vm-312-disk-1                   pve3 Vwi-aotz--  80.00g data snap_vm-312-disk-1_preMultiAttr 46.65                                 
  vm-312-state-preMultiAttr       pve3 Vwi-a-tz--  <4.49g data                                 23.56                                 
  vm-312-state-v3153              pve3 Vwi-a-tz--  <4.49g data                                 44.15                                 
  vm-315-disk-0                   pve3 Vwi-aotz--  10.00g data                                 92.27
 
And both containers use NFS (actively?)? It's possible that the freeze that happens as preparation for the snapshot interacts badly with the NFS mounts.
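
If that is what happens, then during a hang there should be container tasks stuck in uninterruptible sleep ('D' state) waiting on NFS. A generic way to check for that (sketch):

Code:
# list tasks in D state together with the kernel function they are waiting in
ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /^D/'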
 
Yes, you are right, NFS is used more or less on all machines and could be the source of the problem. I did some tests a while ago, with a backup job for a single container during the daytime so I could watch it live when it got stuck, sadly without any useful information in the logs.

Then I noticed something strange: this issue only appears with automatic backup jobs. It has existed for years now, and I have never seen it happen when running a backup manually. Is there a difference between the two methods that could explain it somehow?
 
No, the code is the same.
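
One way to rule out any difference is to run, by hand, the same command the scheduled job uses and see whether it ever hangs (the flags below are taken from the job line in your first post, applied to the container that hangs):

Code:
vzdump 117 --mode snapshot --storage backup --compress lzo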
 
