VZDump locks VM

kenneth_vkd

Hi
I am experiencing a strange issue with 2 VMs on our 4-node Proxmox cluster.
The 2 VMs are on the same host.

When either a scheduled or a manual VZDump starts, it locks the VM completely and nothing more happens. The only way to get past the problem is to stop the vzdump job and then unlock and reset the VM from the shell. It seems as if the OS inside the VM locks up.
Here is the output from the job:
Code:
INFO: starting new backup job: vzdump 103 --compress gzip --storage backups-nfs --remove 0 --mailto ***@***.dk --mode snapshot --node ***
INFO: Starting Backup of VM 103 (qemu)
INFO: status = running
INFO: update VM 103: -lock backup
INFO: VM Name: vm103
INFO: include disk 'scsi0' 'local-zfs:vm-103-disk-1' 80G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating archive '/mnt/pve/backups-nfs/dump/vzdump-qemu-103-2019_01_12-04_14_12.vma.gz'

I have removed sensitive information such as email addresses and hostnames, but otherwise the output is untouched.
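
For reference, the recovery described above (stop the job, unlock, reset) can be sketched roughly as follows. This is only an illustration under assumptions: it assumes the job is stopped with `vzdump --stop` on the node rather than from the GUI (note that this stops all running backup jobs on that host), and the helper names `run` and `recover_vm` are made up for this sketch. By default it only prints the commands; set `RUN=1` to execute them.

```shell
#!/bin/sh
# Sketch: recover a VM that hung during a vzdump backup.
# Helper names are hypothetical. Prints the commands unless RUN=1 is set.

run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

recover_vm() {
    vmid="$1"
    run vzdump --stop      # stop running backup jobs on this host
    run qm unlock "$vmid"  # clear the 'backup' lock that vzdump set on the VM
    run qm reset "$vmid"   # hard-reset the hung guest
}
```

Calling `recover_vm 103` prints the three commands for review; `RUN=1 recover_vm 103` would run them as root on the node hosting the VM.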

When the job reaches the state where it should create the dumpfile it stays there for about 5-10 seconds before the VM locks up and our monitoring system starts reporting errors.

For the scheduled backups, this problem prevents the rest of the VMs on the node from getting backed up.

The error only occurs with these two specific VMs on this specific node. If I exclude them from the scheduled job, everything is fine. Even backups of larger VMs run fine. There are also no snapshots on these VMs.

All VMs are being backed up to the same location, which is named backups-nfs in the job above.

We have also tried rebooting the node, but it made no difference.
Code:
proxmox-ve: 5.3-1 (running kernel: 4.15.18-9-pve)
pve-manager: 5.3-5 (running version: 5.3-5/97ae681d)
pve-kernel-4.15: 5.2-12
pve-kernel-4.15.18-9-pve: 4.15.18-30
pve-kernel-4.15.18-8-pve: 4.15.18-28
pve-kernel-4.15.18-5-pve: 4.15.18-24
pve-kernel-4.15.18-4-pve: 4.15.18-23
pve-kernel-4.15.18-3-pve: 4.15.18-22
pve-kernel-4.15.17-3-pve: 4.15.17-14
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-3
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-43
libpve-guest-common-perl: 2.0-18
libpve-http-server-perl: 2.0-11
libpve-storage-perl: 5.0-33
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.0.2+pve1-5
lxcfs: 3.0.2-2
novnc-pve: 1.0.0-2
proxmox-widget-toolkit: 1.0-22
pve-cluster: 5.0-31
pve-container: 2.0-31
pve-docs: 5.3-1
pve-edk2-firmware: 1.20181023-1
pve-firewall: 3.0-16
pve-firmware: 2.0-6
pve-ha-manager: 2.0-5
pve-i18n: 1.0-9
pve-libspice-server1: 0.14.1-1
pve-qemu-kvm: 2.12.1-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-43
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.12-pve1~bpo1

VMs are running on a ZFS RAID1 root on Intel NVMe drives.

I am currently migrating one of these VMs to another node to see if the problem goes away. Has anyone previously seen an issue similar to what I have explained above?
 
I have migrated the VM to a different host in the cluster and the result is still the same.
I have quite a few similarly sized VMs on both hosts, and when I start a backup of those, it pauses at the "creating archive" step for a few seconds at most and then continues.
 
As mentioned previously, the VM locks up completely and no messages are shown at all. If we try to log in using the console, we can type the username, and after that there is just a blinking cursor/underscore and nothing happens.
None of the services running on the server can be accessed once the VM locks up.
 
What OS is on those two VMs compared to the others? There must be something that makes the VM's system crash or spinlock. What about remote syslog, netdata, etc.?
 
The VMs are running CentOS 7 with cPanel on top. But we have 3 other servers with the exact same configuration, and they do not have the issue.
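
One way to double-check the "same configuration" claim on the Proxmox side is to diff the VM configs of an affected and an unaffected guest. The sketch below assumes any relevant difference would show up in `qm config` output (disk bus, cache mode, agent setting and so on); the function name `compare_vm_config` and the second VM ID in the usage line are hypothetical.

```shell
#!/bin/sh
# Sketch: diff the Proxmox-side config of two VMs.
# compare_vm_config is a hypothetical helper; run it as root on a cluster node.

compare_vm_config() {
    a="$(mktemp)"
    b="$(mktemp)"
    qm config "$1" > "$a"   # config of the affected VM
    qm config "$2" > "$b"   # config of a VM that backs up fine
    status=0
    diff "$a" "$b" || status=$?   # diff exits 1 when the configs differ
    rm -f "$a" "$b"
    return "$status"
}

# Usage (104 is a placeholder for one of the working VMs):
#   compare_vm_config 103 104
```

Lines such as the MAC address and disk names will always differ; the interesting ones are options that only the two affected VMs have.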

We do not have remote syslog configured, but if it could help in the given situation, I might be able to configure it.
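
If remote syslog is worth trying, a minimal sketch of forwarding the guest's logs could look like the following. It assumes the guest uses rsyslog (the CentOS 7 default) and that a collector is reachable at the hypothetical host `loghost.example.com` on TCP port 514; the function name and the config file name are also just placeholders.

```shell
#!/bin/sh
# Sketch: generate a one-line rsyslog rule that forwards all messages
# to a remote collector. "@@" means TCP; a single "@" would be UDP.
# rsyslog_forward_rule is a hypothetical helper name.

rsyslog_forward_rule() {
    host="$1"
    port="${2:-514}"
    printf '*.* @@%s:%s\n' "$host" "$port"
}

# Usage inside the guest, as root:
#   rsyslog_forward_rule loghost.example.com > /etc/rsyslog.d/90-remote.conf
#   systemctl restart rsyslog
```

With forwarding in place, any kernel or service messages emitted just before the lock-up would end up on the collector even if the guest itself becomes unreachable.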
 
