Backup LXC container freeze

EricM

Member
Feb 13, 2016
Hello,


I have a 2-node production cluster with Ceph, plus a third node for quorum that runs a Ceph monitor but no OSD; this node is used for test purposes.
I have ~20 LXC containers and 5 VMs on these 2 nodes.
For a few months now, the LXC backup sometimes freezes on a container, not on a particular node and not on a particular container.
The backup job completes successfully for a few containers and then freezes at the "creating archive" step.
Example:
INFO: Starting Backup of VM 126 (lxc)
INFO: status = running
INFO: CT Name: CT126
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: create storage snapshot 'vzdump'
/dev/rbd2
INFO: creating archive '/mnt/pve/Qnap2/dump/vzdump-lxc-126-2018_10_01-01_03_34.tar.gz'

tar is in D state, so I can't kill it. ps aux output:
root 2855445 0.0 0.0 24804 9284 ? D oct.01 0:21 tar cpf - --totals --one-file-system -p --sparse --numeric-owner --acls --xattrs --xattrs-include=user.* --xattrs-include=security.capability --warning=no-file-ignored --warning=no-xattr-write --one-file-system --warning=no-file-ignored --directory=/mnt/pve/Qnap2/dump/vzdump-lxc-126-2018_10_01-01_03_34.tmp ./etc/vzdump/pct.conf --directory=/mnt/vzsnap0 --no-anchored --exclude=lost+found --anchored --exclude=./tmp/?* --exclude=./var/tmp/?* --exclude=./var/run/?*.pid ./
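
In case someone can spot what I am missing, this is roughly how I check that tar really is blocked inside the kernel (PID 2855445 comes from the ps output above, adjust it to your case):

# kernel function the D-state process is currently sleeping in
cat /proc/2855445/wchan; echo
# full kernel stack of the blocked task (needs root)
cat /proc/2855445/stack
# ask the kernel to log all blocked tasks, then read the kernel log
echo w > /proc/sysrq-trigger
dmesg | tail -n 60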

I can't stop the backup process and I can't remove the snapshot on this container, so the next backup fails; the only solution is to reboot the node.
I can't find out what tar is waiting for...
/mnt/vzsnap0/ is empty and not mounted; maybe that is the problem?
I tried to mount it manually but nothing happens:
mount /dev/rbd/pveceph1/vm-126-disk-1@vzdump /mnt/vzsnap0/
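
For reference, this is roughly how I check that the vzdump snapshot and its rbd mapping still exist before trying that mount (the pool/image name pveceph1/vm-126-disk-1 is from my setup):

# list snapshots of the container disk
rbd snap ls pveceph1/vm-126-disk-1
# show which rbd devices are currently mapped on this node
rbd showmapped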

The backup is written to an NFS mount. I tried changing the NFS mount from hard to soft, but I still get this issue.
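
To double-check which options the NFS backup storage is actually mounted with on the node, I look at it like this (Qnap2 is my storage name):

# effective NFS mount options as seen by the client
nfsstat -m
mount | grep Qnap2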

Does anyone know how to find out what tar is waiting for? Does anyone have an idea of what is happening?

Thanks for your help :)
Eric

pveversion --verbose
proxmox-ve: 5.2-2 (running kernel: 4.15.18-4-pve)
pve-manager: 5.2-9 (running version: 5.2-9/4b30e8f9)
pve-kernel-4.15: 5.2-7
pve-kernel-4.15.18-4-pve: 4.15.18-23
pve-kernel-4.15.18-1-pve: 4.15.18-19
pve-kernel-4.15.17-3-pve: 4.15.17-14
ceph: 12.2.8-pve1
corosync: 2.4.2-pve5
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-38
libpve-guest-common-perl: 2.0-17
libpve-http-server-perl: 2.0-10
libpve-storage-perl: 5.0-28
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.2+pve1-2
lxcfs: 3.0.0-1
novnc-pve: 1.0.0-2
proxmox-widget-toolkit: 1.0-20
pve-cluster: 5.0-30
pve-container: 2.0-27
pve-docs: 5.2-8
pve-firewall: 3.0-14
pve-firmware: 2.0-5
pve-ha-manager: 2.0-5
pve-i18n: 1.0-6
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.11.2-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-34
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.9-pve1~bpo9
 
This seems to be a problem with the NFS server (/mnt/pve/Qnap2). But a soft mount should help if you wait long enough.
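As a rough sketch (storage name, server and export below are placeholders, not your actual values), the soft option can be set on the storage definition in /etc/pve/storage.cfg, for example:

nfs: Qnap2
        server 192.168.1.10
        export /backups
        path /mnt/pve/Qnap2
        content backup
        options vers=3,soft,timeo=150,retrans=3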
 
Sorry to wake this old thread, but I'm having the same issue on many Proxmox 5.4-11 nodes when they try to back up (snapshot mode) some of their VMs to NFS storage. It stalls here:
Code:
Task viewer: VM/CT 703 - Backup
INFO: starting new backup job: vzdump 703 --storage to_proxmox9sh --mode snapshot --remove 0 --mailto xyz@abc.ca --compress gzip --node proxmox7s
INFO: Starting Backup of VM 703 (qemu)
INFO: Backup started at 2020-10-12 16:03:54
INFO: status = running
INFO: update VM 703: -lock backup
INFO: VM Name: zzz.abc.ca
INFO: include disk 'scsi0' 'local:703/vm-703-disk-1.qcow2' 180G
INFO: backup mode: snapshot
INFO: bandwidth limit: 204800 KB/s
INFO: ionice priority: 7
INFO: creating archive '/mnt/pve/to_proxmox9sh/dump/vzdump-qemu-703-2020_10_12-16_03_54.vma.gz'

If I stop or suspend the VM, it works. If I stop and clone the VM and start the clone, it works. All these VMs run WHM/cPanel, some really small, some big. Every time, I need to "qm unlock 703" and RESET the VM to get it back online.
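
For the record, this is the recovery sequence I end up running each time (703 is my VM ID, adjust as needed):

Code:
# release the leftover 'backup' lock, then hard-reset the frozen guest
qm unlock 703
qm reset 703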

Here are some logs:
Code:
root@proxmox7s:/var/log/vzdump# tail -f -n 50 qemu-703.log
2020-10-12 16:03:54 INFO: Starting Backup of VM 703 (qemu)
2020-10-12 16:03:54 INFO: status = running
2020-10-12 16:03:54 INFO: update VM 703: -lock backup
2020-10-12 16:03:54 INFO: VM Name: zzz.abc.ca
2020-10-12 16:03:54 INFO: include disk 'scsi0' 'local:703/vm-703-disk-1.qcow2' 180G
2020-10-12 16:03:54 INFO: backup mode: snapshot
2020-10-12 16:03:54 INFO: bandwidth limit: 204800 KB/s
2020-10-12 16:03:54 INFO: ionice priority: 7
2020-10-12 16:03:54 INFO: creating archive '/mnt/pve/to_proxmox9sh/dump/vzdump-qemu-703-2020_10_12-16_03_54.vma.gz'

Around 16:15 I manually stopped the backup, unlocked the VM and restarted it; here are some other logs:
Code:
root@proxmox7s:/var/log# tail -n 20 -f messages
Oct 12 16:00:06 proxmox7s kernel: [15802091.556951] vmbr0: port 5(fwpr799p0) entered disabled state
Oct 12 16:00:06 proxmox7s kernel: [15802091.557173] device fwln799i0 left promiscuous mode
Oct 12 16:00:06 proxmox7s kernel: [15802091.557269] fwbr799i0: port 1(fwln799i0) entered disabled state
Oct 12 16:00:06 proxmox7s kernel: [15802091.577969] device fwpr799p0 left promiscuous mode
Oct 12 16:00:06 proxmox7s kernel: [15802091.578061] vmbr0: port 5(fwpr799p0) entered disabled state
Oct 12 16:00:06 proxmox7s pvedaemon[6030]: <root@pam> end task UPID:proxmox7s:0000353A:5E2FAAFC:5F84B4C7:vncproxy:799:root@pam: OK
Oct 12 16:00:06 proxmox7s pvedaemon[6030]: <root@pam> starting task UPID:proxmox7s:000048F8:5E300E89:5F84B5C6:vncproxy:799:root@pam:
Oct 12 16:00:07 proxmox7s pvedaemon[6030]: <root@pam> end task UPID:proxmox7s:00004168:5E300C7C:5F84B5C1:qmshutdown:799:root@pam: OK
Oct 12 16:01:18 proxmox7s pvedaemon[6244]: <root@pam> successful auth for user 'root@pam'
Oct 12 16:01:33 proxmox7s pvedaemon[30862]: <root@pam> starting task UPID:proxmox7s:000024FC:5E303058:5F84B61D:vzdump:703:root@pam:
Oct 12 16:01:33 proxmox7s qm[9472]: <root@pam> update VM 703: -lock backup
Oct 12 16:01:34 proxmox7s qm[9476]: <root@pam> starting task UPID:proxmox7s:000027B1:5E3030B5:5F84B61E:qmpause:703:root@pam:
Oct 12 16:01:34 proxmox7s qm[9476]: <root@pam> end task UPID:proxmox7s:000027B1:5E3030B5:5F84B61E:qmpause:703:root@pam: OK
Oct 12 16:02:26 proxmox7s qm[22022]: <root@pam> starting task UPID:proxmox7s:00005607:5E304525:5F84B652:qmresume:703:root@pam:
Oct 12 16:02:26 proxmox7s qm[22022]: <root@pam> end task UPID:proxmox7s:00005607:5E304525:5F84B652:qmresume:703:root@pam: OK
Oct 12 16:03:54 proxmox7s pvedaemon[30862]: <root@pam> starting task UPID:proxmox7s:000026AC:5E306770:5F84B6AA:vzdump:703:root@pam:
Oct 12 16:03:54 proxmox7s qm[9904]: <root@pam> update VM 703: -lock backup
Oct 12 16:15:00 proxmox7s pvedaemon[25505]: <root@pam> starting task UPID:proxmox7s:00000BF2:5E316BA0:5F84B944:qmreset:703:root@pam:
Oct 12 16:15:00 proxmox7s pvedaemon[25505]: <root@pam> end task UPID:proxmox7s:00000BF2:5E316BA0:5F84B944:qmreset:703:root@pam: OK

Code:
root@proxmox7s:/var/log# tail -n 20 -f kern.log
Oct 12 16:00:06 proxmox7s kernel: [15802091.577969] device fwpr799p0 left promiscuous mode
Oct 12 16:00:06 proxmox7s kernel: [15802091.578061] vmbr0: port 5(fwpr799p0) entered disabled state
Oct 12 16:00:06 proxmox7s pvedaemon[6030]: <root@pam> end task UPID:proxmox7s:0000353A:5E2FAAFC:5F84B4C7:vncproxy:799:root@pam: OK
Oct 12 16:00:06 proxmox7s pvedaemon[6030]: <root@pam> starting task UPID:proxmox7s:000048F8:5E300E89:5F84B5C6:vncproxy:799:root@pam:
Oct 12 16:00:07 proxmox7s pvedaemon[6030]: <root@pam> end task UPID:proxmox7s:00004168:5E300C7C:5F84B5C1:qmshutdown:799:root@pam: OK
Oct 12 16:00:07 proxmox7s pvedaemon[6030]: <root@pam> end task UPID:proxmox7s:000048F8:5E300E89:5F84B5C6:vncproxy:799:root@pam: Failed to run vncproxy.
Oct 12 16:01:18 proxmox7s pvedaemon[6244]: <root@pam> successful auth for user 'root@pam'
Oct 12 16:01:33 proxmox7s pvedaemon[30862]: <root@pam> starting task UPID:proxmox7s:000024FC:5E303058:5F84B61D:vzdump:703:root@pam:
Oct 12 16:01:33 proxmox7s qm[9472]: <root@pam> update VM 703: -lock backup
Oct 12 16:01:34 proxmox7s qm[9476]: <root@pam> starting task UPID:proxmox7s:000027B1:5E3030B5:5F84B61E:qmpause:703:root@pam:
Oct 12 16:01:34 proxmox7s qm[9476]: <root@pam> end task UPID:proxmox7s:000027B1:5E3030B5:5F84B61E:qmpause:703:root@pam: OK
Oct 12 16:02:26 proxmox7s qm[22022]: <root@pam> starting task UPID:proxmox7s:00005607:5E304525:5F84B652:qmresume:703:root@pam:
Oct 12 16:02:26 proxmox7s qm[22022]: <root@pam> end task UPID:proxmox7s:00005607:5E304525:5F84B652:qmresume:703:root@pam: OK
Oct 12 16:02:26 proxmox7s pvedaemon[30862]: <root@pam> end task UPID:proxmox7s:000024FC:5E303058:5F84B61D:vzdump:703:root@pam: interrupted by signal
Oct 12 16:03:54 proxmox7s pvedaemon[30862]: <root@pam> starting task UPID:proxmox7s:000026AC:5E306770:5F84B6AA:vzdump:703:root@pam:
Oct 12 16:03:54 proxmox7s qm[9904]: <root@pam> update VM 703: -lock backup
Oct 12 16:14:20 proxmox7s pvedaemon[30862]: <root@pam> end task UPID:proxmox7s:000026AC:5E306770:5F84B6AA:vzdump:703:root@pam: unexpected status
Oct 12 16:15:00 proxmox7s pvedaemon[25505]: <root@pam> starting task UPID:proxmox7s:00000BF2:5E316BA0:5F84B944:qmreset:703:root@pam:
Oct 12 16:15:00 proxmox7s pvedaemon[25505]: <root@pam> end task UPID:proxmox7s:00000BF2:5E316BA0:5F84B944:qmreset:703:root@pam: OK

Do you need anything else to help me? Thanks a lot.
 
