[SOLVED] Backup error that requires rebooting the node

fxandrei

Jan 10, 2013
So I have this weird issue with backups on my cluster, which uses Ceph.
Sometimes a VM gets stuck while the backup is running.
All I see in the logs is: ERROR: got timeout

After searching all over the web, I found suggestions that it's probably related to the storage, with something getting stuck/locked...
Anyway, if I reboot the node hosting the affected VMs, they work again.

The thing is, I don't know where to actually look.
I've already gone through all the Ceph, Proxmox and Debian logs...

Has anyone else had similar issues?
 
Please post the output of pveversion -v, the VM config (qm config <vmid>) and the complete log of the backup task.
 
proxmox-ve: 6.0-2 (running kernel: 5.0.18-1-pve)
pve-manager: 6.0-6 (running version: 6.0-6/c71f879f)
pve-kernel-5.0: 6.0-6
pve-kernel-helper: 6.0-6
pve-kernel-4.15: 5.4-8
pve-kernel-5.0.18-1-pve: 5.0.18-3
pve-kernel-4.15.18-20-pve: 4.15.18-46
pve-kernel-4.15.18-12-pve: 4.15.18-36
ceph: 14.2.2-pve1
ceph-fuse: 14.2.2-pve1
corosync: 3.0.2-pve2
criu: 3.11-3
glusterfs-client: 5.5-3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.10-pve2
libpve-access-control: 6.0-2
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-4
libpve-guest-common-perl: 3.0-1
libpve-http-server-perl: 3.0-2
libpve-storage-perl: 6.0-7
libqb0: 1.0.5-1
lvm2: 2.03.02-pve3
lxc-pve: 3.1.0-64
lxcfs: 3.0.3-pve60
novnc-pve: 1.0.0-60
openvswitch-switch: 2.10.0+2018.08.28+git.8ca7c82b7d+ds1-12
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.0-7
pve-cluster: 6.0-5
pve-container: 3.0-5
pve-docs: 6.0-4
pve-edk2-firmware: 2.20190614-1
pve-firewall: 4.0-7
pve-firmware: 3.0-2
pve-ha-manager: 3.0-2
pve-i18n: 2.0-2
pve-qemu-kvm: 4.0.0-5
pve-xtermjs: 3.13.2-1
qemu-server: 6.0-7
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.1-pve1
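Since the fix in this thread turned out to be a package update on all nodes, comparing `pveversion -v` output between nodes is a quick way to spot one that was left behind. A minimal Python sketch, assuming only the `name: version` line format shown above; `parse_pveversion` and `diff_versions` are hypothetical helper names, not Proxmox tools.

```python
def parse_pveversion(text: str) -> dict[str, str]:
    """Parse `pveversion -v` output ("package: version" lines) into a dict."""
    versions = {}
    for line in text.splitlines():
        # Split on the first ": " only, so "proxmox-ve: 6.0-2 (running kernel: ...)"
        # keeps its parenthesised note in the version string.
        name, sep, rest = line.partition(": ")
        if not sep:
            continue  # skip blank or malformed lines
        versions[name.strip()] = rest.strip()
    return versions


def diff_versions(a: dict[str, str], b: dict[str, str]) -> dict[str, tuple]:
    """Return packages whose versions differ between two nodes (None = not installed)."""
    return {
        pkg: (a.get(pkg), b.get(pkg))
        for pkg in sorted(a.keys() | b.keys())
        if a.get(pkg) != b.get(pkg)
    }
```

Running `pveversion -v` on each node and passing both outputs through `parse_pveversion` and then `diff_versions` lists only the packages that differ, such as `pve-qemu-kvm`.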

INFO: trying to get global lock - waiting...
ERROR: can't acquire lock '/var/run/vzdump.lock' - got timeout
TASK ERROR: got unexpected control message:

INFO: starting new backup job: vzdump 102 104 105 106 117 205 314 115 350 613 616 607 181 500 --storage backup-fs --mailto --mode snapshot --compress lzo --quiet 1 --mailnotification failure
INFO: skip external VMs: 102, 105, 106, 117, 205, 314, 115, 181, 500
INFO: Starting Backup of VM 104 (qemu)
INFO: Backup started at 2019-10-28 22:00:02
INFO: status = running
INFO: update VM 104: -lock backup
INFO: VM Name: test
INFO: include disk 'scsi0' 'ssd-pool:vm-104-disk-0' 40G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating archive '/mnt/pve/backup-fs/dump/vzdump-qemu-104-2019_10_28-22_00_02.vma.lzo'
ERROR: got timeout
INFO: aborting backup job
ERROR: interrupted by signal
ERROR: Backup of VM 104 failed - got timeout
INFO: Failed at 2019-10-28 22:06:28
INFO: Starting Backup of VM 350 (qemu)
INFO: Backup started at 2019-10-28 22:06:28
INFO: status = running
INFO: update VM 350: -lock backup
INFO: VM Name: sql
INFO: include disk 'virtio0' 'ssd-pool:vm-350-disk-0' 100G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating archive '/mnt/pve/backup-fs/dump/vzdump-qemu-350-2019_10_28-22_06_28.vma.lzo'
TASK ERROR: got unexpected control message:
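The log above mixes a clean per-VM failure (VM 104) with a VM whose backup started and then never reported anything before the task died (VM 350). A minimal Python sketch that summarizes a vzdump task log along those lines, assuming the message formats shown above plus the assumption that successful backups log a line starting with `INFO: Finished Backup of VM <id>`; `summarize_vzdump_log` is a hypothetical helper.

```python
import re

START_RE = re.compile(r"INFO: Starting Backup of VM (\d+) ")
FAILED_RE = re.compile(r"ERROR: Backup of VM (\d+) failed - (.*)")
# Assumption: a successful backup logs "INFO: Finished Backup of VM <id> (...)".
FINISHED_RE = re.compile(r"INFO: Finished Backup of VM (\d+) ")


def summarize_vzdump_log(log: str) -> dict[str, object]:
    """Collect failed VMs, VMs that started but never ended, and any TASK ERROR."""
    started, finished, failed = [], set(), {}
    task_error = None
    for line in log.splitlines():
        line = line.strip()
        if m := START_RE.match(line):
            started.append(int(m.group(1)))
        elif m := FAILED_RE.match(line):
            failed[int(m.group(1))] = m.group(2)  # VM id -> failure reason
        elif m := FINISHED_RE.match(line):
            finished.add(int(m.group(1)))
        elif line.startswith("TASK ERROR:"):
            task_error = line[len("TASK ERROR:"):].strip()
    # Started but neither finished nor failed: the task was cut off mid-backup.
    unfinished = [vm for vm in started if vm not in finished and vm not in failed]
    return {"failed": failed, "unfinished": unfinished, "task_error": task_error}
```

Fed the log above, this should report VM 104 as failed with `got timeout`, VM 350 as started but never finished, and the task error `got unexpected control message:`.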


When the backup fails, I cannot access the console of those VMs. I click Console, it waits about 2 seconds, then the window appears and just says "Connecting"... until it eventually times out.
So that probably means a process got stuck and has exclusive access to some file it needs.
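One way to test the "something holds an exclusive lock" theory for the first error (`can't acquire lock '/var/run/vzdump.lock'`) is to check whether that file is currently locked. A minimal Python sketch, assuming vzdump takes an `flock(2)` lock on that file (an assumption, not confirmed in this thread); POSIX `fcntl` record locks would not be detected this way. The check grabs the lock only momentarily and releases it at once; `lock_is_held` is a hypothetical helper.

```python
import errno
import fcntl
import os

VZDUMP_LOCK = "/var/run/vzdump.lock"  # path taken from the error message above


def lock_is_held(path: str = VZDUMP_LOCK) -> bool:
    """Return True if another open file description holds an flock() on path.

    Assumption: the lock holder uses flock(2), not fcntl/POSIX record locks.
    """
    try:
        fd = os.open(path, os.O_RDONLY)
    except FileNotFoundError:
        return False  # no lock file, so nothing can be holding it
    try:
        # Non-blocking attempt: fails immediately if someone else holds the lock.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as exc:
        if exc.errno in (errno.EWOULDBLOCK, errno.EAGAIN):
            return True
        raise
    else:
        fcntl.flock(fd, fcntl.LOCK_UN)  # we got it, so release it right away
        return False
    finally:
        os.close(fd)
```

If this reports `True` while no backup job should be running, `fuser /var/run/vzdump.lock` lists the PIDs that have the file open, which narrows down the stuck process.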
 
Please update to the latest version. It contains lots of fixes, especially for an issue that sounds just like yours.
 
Can you give some details on the fixes you are referring to? Or at least which package it would be?

EDIT: By the way, I don't have a subscription, so I can only use the pve-no-subscription repository.
 
So I went and updated all the packages on all three nodes.
I guess I'll wait a week or two to see if I still have that problem, and then report back.
 
There were fixes to the QMP monitor in the pve-qemu-kvm package; for some users it started hanging during backups.
 
It hasn't been a week yet, but I have been running backups every day with different configs, and there have been no errors: no timeout, no lock, nothing.
I hope I don't speak too soon, but it seems that was it.
Updating the packages fixed it.

Thanks.
 