Backup LXC container freeze

EricM

Hello,


I have a 2-node production cluster with Ceph, plus a third node for quorum that runs a Ceph monitor but no OSDs; this node is used for testing.
I have ~20 LXC containers and 5 VMs on these 2 nodes.
For a few months now, LXC backups sometimes freeze on a container; it is not tied to a particular node or a particular container.
The backup job finishes successfully on a few containers and then freezes at the "creating archive" step.
Example:
Code:
INFO: Starting Backup of VM 126 (lxc)
INFO: status = running
INFO: CT Name: CT126
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: create storage snapshot 'vzdump'
/dev/rbd2
INFO: creating archive '/mnt/pve/Qnap2/dump/vzdump-lxc-126-2018_10_01-01_03_34.tar.gz'

tar is in D state (uninterruptible sleep), so I can't kill it. ps aux shows:
Code:
root 2855445 0.0 0.0 24804 9284 ? D oct.01 0:21 tar cpf - --totals --one-file-system -p --sparse --numeric-owner --acls --xattrs --xattrs-include=user.* --xattrs-include=security.capability --warning=no-file-ignored --warning=no-xattr-write --one-file-system --warning=no-file-ignored --directory=/mnt/pve/Qnap2/dump/vzdump-lxc-126-2018_10_01-01_03_34.tmp ./etc/vzdump/pct.conf --directory=/mnt/vzsnap0 --no-anchored --exclude=lost+found --anchored --exclude=./tmp/?* --exclude=./var/tmp/?* --exclude=./var/run/?*.pid ./

I can't stop the backup process and I can't remove the snapshot on this container, so the next backup fails; the only solution is to reboot the node.
I can't find out what tar is waiting for...
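One way to see where a D-state process is blocked is to read its kernel stack, and the sysrq trigger can dump all blocked tasks to the kernel log (assuming root access and, for sysrq, that it is enabled via kernel.sysrq; the PID is the tar process from the ps output above):
Code:
# kernel stack of the stuck tar process
cat /proc/2855445/stack
# log all tasks in uninterruptible (D) state to the kernel ring buffer, then read it
echo w > /proc/sysrq-trigger
dmesg | tail -n 50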
/mnt/vzsnap0/ is empty and not mounted; maybe that is the problem?
I tried to mount it manually, but nothing happens:
Code:
mount /dev/rbd/pveceph1/vm-126-disk-1@vzdump /mnt/vzsnap0/
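Since the volume is on Ceph RBD, the snapshot has to be mapped to a block device before it can be mounted, and the leftover vzdump snapshot can normally be removed with rbd once nothing holds it. A sketch, using the pool/image names from the mount command above and the /dev/rbd2 device from the backup log (if tar still holds the device in D state, the unmap will likely block as well):
Code:
# list currently mapped RBD images/snapshots
rbd showmapped
# unmap the stale snapshot device, then delete the vzdump snapshot
rbd unmap /dev/rbd2
rbd snap rm pveceph1/vm-126-disk-1@vzdump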

The backup is written to an NFS mount. I tried changing the NFS mount from hard to soft, but I still hit this issue.
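For what it's worth, with a soft mount the client should roughly give up after retrans retries of timeo tenths of a second each and return an I/O error instead of hanging forever. I set the options per storage in /etc/pve/storage.cfg, something like this (server and export here are placeholders, not my real values):
Code:
nfs: Qnap2
        export /backup
        path /mnt/pve/Qnap2
        server 192.168.0.10
        content backup
        options soft,timeo=150,retrans=3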

Does anyone know how to find out what tar is waiting for? Does anyone have an idea of what is happening?

Thanks for your help :)
Eric

Code:
pveversion --verbose
proxmox-ve: 5.2-2 (running kernel: 4.15.18-4-pve)
pve-manager: 5.2-9 (running version: 5.2-9/4b30e8f9)
pve-kernel-4.15: 5.2-7
pve-kernel-4.15.18-4-pve: 4.15.18-23
pve-kernel-4.15.18-1-pve: 4.15.18-19
pve-kernel-4.15.17-3-pve: 4.15.17-14
ceph: 12.2.8-pve1
corosync: 2.4.2-pve5
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-38
libpve-guest-common-perl: 2.0-17
libpve-http-server-perl: 2.0-10
libpve-storage-perl: 5.0-28
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.2+pve1-2
lxcfs: 3.0.0-1
novnc-pve: 1.0.0-2
proxmox-widget-toolkit: 1.0-20
pve-cluster: 5.0-30
pve-container: 2.0-27
pve-docs: 5.2-8
pve-firewall: 3.0-14
pve-firmware: 2.0-5
pve-ha-manager: 2.0-5
pve-i18n: 1.0-6
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.11.2-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-34
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.9-pve1~bpo9
 
Seems to be a problem with the NFS server (/mnt/pve/Qnap2). But a soft mount should help, if you wait long enough.
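If the NFS server is the culprit, the kernel log on the node usually shows it; with a hard mount the client retries forever and logs lines like these (a typical message pattern, the server name is just an example):
Code:
dmesg | grep -i 'nfs: server'
# nfs: server qnap2 not responding, still trying
# nfs: server qnap2 OK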
 
Sorry to wake this old thread, but I'm having the same issue on many Proxmox 5.4-11 nodes when they try to back up (snapshot) some of their VMs to NFS storage. It stalls here:
Task viewer: VM/CT 703 - Backup
Code:
INFO: starting new backup job: vzdump 703 --storage to_proxmox9sh --mode snapshot --remove 0 --mailto xyz@abc.ca --compress gzip --node proxmox7s
INFO: Starting Backup of VM 703 (qemu)
INFO: Backup started at 2020-10-12 16:03:54
INFO: status = running
INFO: update VM 703: -lock backup
INFO: VM Name: zzz.abc.ca
INFO: include disk 'scsi0' 'local:703/vm-703-disk-1.qcow2' 180G
INFO: backup mode: snapshot
INFO: bandwidth limit: 204800 KB/s
INFO: ionice priority: 7
INFO: creating archive '/mnt/pve/to_proxmox9sh/dump/vzdump-qemu-703-2020_10_12-16_03_54.vma.gz'

If I stop or suspend the VM first, it works. If I stop and clone the VM and start the clone, it works. All these VMs are WHM/cPanel servers, some really small, some big. Every time, I need to "qm unlock 703" and RESET the VM to get it back online.
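To confirm it is the same kind of hang, you can list the processes stuck in uninterruptible sleep while a backup is stalled; the wait channel (wchan) column gives a hint whether they are blocked in NFS code (a generic check, nothing Proxmox-specific):
Code:
# list processes in D state together with the kernel function they wait in
ps -eo pid,stat,wchan:32,cmd | awk 'NR==1 || $2 ~ /^D/'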

Here are some logs:
Code:
root@proxmox7s:/var/log/vzdump# tail -f -n 50 qemu-703.log
2020-10-12 16:03:54 INFO: Starting Backup of VM 703 (qemu)
2020-10-12 16:03:54 INFO: status = running
2020-10-12 16:03:54 INFO: update VM 703: -lock backup
2020-10-12 16:03:54 INFO: VM Name: zzz.abc.ca
2020-10-12 16:03:54 INFO: include disk 'scsi0' 'local:703/vm-703-disk-1.qcow2' 180G
2020-10-12 16:03:54 INFO: backup mode: snapshot
2020-10-12 16:03:54 INFO: bandwidth limit: 204800 KB/s
2020-10-12 16:03:54 INFO: ionice priority: 7
2020-10-12 16:03:54 INFO: creating archive '/mnt/pve/to_proxmox9sh/dump/vzdump-qemu-703-2020_10_12-16_03_54.vma.gz'
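While it stalls, the backup job can also be inspected from the QEMU monitor; if I remember right, the Proxmox-patched qemu exposes an info backup command (and backup_cancel) there, which could be worth trying before resorting to a reset (command names from memory, so treat them as an assumption):
Code:
qm monitor 703
# at the qm> prompt:
info backup
backup_cancel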

Around 16:15 I manually stopped the backup, unlocked the VM, and restarted it. Here are some other logs:
Code:
root@proxmox7s:/var/log# tail -n 20 -f messages
Oct 12 16:00:06 proxmox7s kernel: [15802091.556951] vmbr0: port 5(fwpr799p0) entered disabled state
Oct 12 16:00:06 proxmox7s kernel: [15802091.557173] device fwln799i0 left promiscuous mode
Oct 12 16:00:06 proxmox7s kernel: [15802091.557269] fwbr799i0: port 1(fwln799i0) entered disabled state
Oct 12 16:00:06 proxmox7s kernel: [15802091.577969] device fwpr799p0 left promiscuous mode
Oct 12 16:00:06 proxmox7s kernel: [15802091.578061] vmbr0: port 5(fwpr799p0) entered disabled state
Oct 12 16:00:06 proxmox7s pvedaemon[6030]: <root@pam> end task UPID:proxmox7s:0000353A:5E2FAAFC:5F84B4C7:vncproxy:799:root@pam: OK
Oct 12 16:00:06 proxmox7s pvedaemon[6030]: <root@pam> starting task UPID:proxmox7s:000048F8:5E300E89:5F84B5C6:vncproxy:799:root@pam:
Oct 12 16:00:07 proxmox7s pvedaemon[6030]: <root@pam> end task UPID:proxmox7s:00004168:5E300C7C:5F84B5C1:qmshutdown:799:root@pam: OK
Oct 12 16:01:18 proxmox7s pvedaemon[6244]: <root@pam> successful auth for user 'root@pam'
Oct 12 16:01:33 proxmox7s pvedaemon[30862]: <root@pam> starting task UPID:proxmox7s:000024FC:5E303058:5F84B61D:vzdump:703:root@pam:
Oct 12 16:01:33 proxmox7s qm[9472]: <root@pam> update VM 703: -lock backup
Oct 12 16:01:34 proxmox7s qm[9476]: <root@pam> starting task UPID:proxmox7s:000027B1:5E3030B5:5F84B61E:qmpause:703:root@pam:
Oct 12 16:01:34 proxmox7s qm[9476]: <root@pam> end task UPID:proxmox7s:000027B1:5E3030B5:5F84B61E:qmpause:703:root@pam: OK
Oct 12 16:02:26 proxmox7s qm[22022]: <root@pam> starting task UPID:proxmox7s:00005607:5E304525:5F84B652:qmresume:703:root@pam:
Oct 12 16:02:26 proxmox7s qm[22022]: <root@pam> end task UPID:proxmox7s:00005607:5E304525:5F84B652:qmresume:703:root@pam: OK
Oct 12 16:03:54 proxmox7s pvedaemon[30862]: <root@pam> starting task UPID:proxmox7s:000026AC:5E306770:5F84B6AA:vzdump:703:root@pam:
Oct 12 16:03:54 proxmox7s qm[9904]: <root@pam> update VM 703: -lock backup
Oct 12 16:15:00 proxmox7s pvedaemon[25505]: <root@pam> starting task UPID:proxmox7s:00000BF2:5E316BA0:5F84B944:qmreset:703:root@pam:
Oct 12 16:15:00 proxmox7s pvedaemon[25505]: <root@pam> end task UPID:proxmox7s:00000BF2:5E316BA0:5F84B944:qmreset:703:root@pam: OK

Code:
root@proxmox7s:/var/log# tail -n 20 -f kern.log
Oct 12 16:00:06 proxmox7s kernel: [15802091.577969] device fwpr799p0 left promiscuous mode
Oct 12 16:00:06 proxmox7s kernel: [15802091.578061] vmbr0: port 5(fwpr799p0) entered disabled state
Oct 12 16:00:06 proxmox7s pvedaemon[6030]: <root@pam> end task UPID:proxmox7s:0000353A:5E2FAAFC:5F84B4C7:vncproxy:799:root@pam: OK
Oct 12 16:00:06 proxmox7s pvedaemon[6030]: <root@pam> starting task UPID:proxmox7s:000048F8:5E300E89:5F84B5C6:vncproxy:799:root@pam:
Oct 12 16:00:07 proxmox7s pvedaemon[6030]: <root@pam> end task UPID:proxmox7s:00004168:5E300C7C:5F84B5C1:qmshutdown:799:root@pam: OK
Oct 12 16:00:07 proxmox7s pvedaemon[6030]: <root@pam> end task UPID:proxmox7s:000048F8:5E300E89:5F84B5C6:vncproxy:799:root@pam: Failed to run vncproxy.
Oct 12 16:01:18 proxmox7s pvedaemon[6244]: <root@pam> successful auth for user 'root@pam'
Oct 12 16:01:33 proxmox7s pvedaemon[30862]: <root@pam> starting task UPID:proxmox7s:000024FC:5E303058:5F84B61D:vzdump:703:root@pam:
Oct 12 16:01:33 proxmox7s qm[9472]: <root@pam> update VM 703: -lock backup
Oct 12 16:01:34 proxmox7s qm[9476]: <root@pam> starting task UPID:proxmox7s:000027B1:5E3030B5:5F84B61E:qmpause:703:root@pam:
Oct 12 16:01:34 proxmox7s qm[9476]: <root@pam> end task UPID:proxmox7s:000027B1:5E3030B5:5F84B61E:qmpause:703:root@pam: OK
Oct 12 16:02:26 proxmox7s qm[22022]: <root@pam> starting task UPID:proxmox7s:00005607:5E304525:5F84B652:qmresume:703:root@pam:
Oct 12 16:02:26 proxmox7s qm[22022]: <root@pam> end task UPID:proxmox7s:00005607:5E304525:5F84B652:qmresume:703:root@pam: OK
Oct 12 16:02:26 proxmox7s pvedaemon[30862]: <root@pam> end task UPID:proxmox7s:000024FC:5E303058:5F84B61D:vzdump:703:root@pam: interrupted by signal
Oct 12 16:03:54 proxmox7s pvedaemon[30862]: <root@pam> starting task UPID:proxmox7s:000026AC:5E306770:5F84B6AA:vzdump:703:root@pam:
Oct 12 16:03:54 proxmox7s qm[9904]: <root@pam> update VM 703: -lock backup
Oct 12 16:14:20 proxmox7s pvedaemon[30862]: <root@pam> end task UPID:proxmox7s:000026AC:5E306770:5F84B6AA:vzdump:703:root@pam: unexpected status
Oct 12 16:15:00 proxmox7s pvedaemon[25505]: <root@pam> starting task UPID:proxmox7s:00000BF2:5E316BA0:5F84B944:qmreset:703:root@pam:
Oct 12 16:15:00 proxmox7s pvedaemon[25505]: <root@pam> end task UPID:proxmox7s:00000BF2:5E316BA0:5F84B944:qmreset:703:root@pam: OK

Do you need anything else to help me? Thanks a lot.