Multiple parallel 'qmrestore' executions fail

shantanu

Member
Mar 30, 2012
Related to:
https://forum.proxmox.com/threads/qmrestore-doesnt-reserve-vmid.32260/

The bug of not reserving the VMID is solved, though there is a new problem.

My PVE setup:
Code:
# pveversion -v
proxmox-ve: 5.3-1 (running kernel: 4.15.18-9-pve)
pve-manager: 5.3-5 (running version: 5.3-5/97ae681d)
pve-kernel-4.15: 5.2-12
pve-kernel-4.15.18-9-pve: 4.15.18-30
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-3
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-43
libpve-guest-common-perl: 2.0-18
libpve-http-server-perl: 2.0-11
libpve-storage-perl: 5.0-33
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.0.2+pve1-5
lxcfs: 3.0.2-2
novnc-pve: 1.0.0-2
proxmox-widget-toolkit: 1.0-22
pve-cluster: 5.0-31
pve-container: 2.0-31
pve-docs: 5.3-1
pve-edk2-firmware: 1.20181023-1
pve-firewall: 3.0-16
pve-firmware: 2.0-6
pve-ha-manager: 2.0-5
pve-i18n: 1.0-9
pve-libspice-server1: 0.14.1-1
pve-qemu-kvm: 2.12.1-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-43
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.12-pve1~bpo1

When I run multiple qmrestore commands in parallel, the second restore hits a lock timeout and never succeeds.

Shell 1:
Code:
# pvesh get /cluster/nextid
104

# qmrestore 1.vma.gz 104 --storage local --unique 1
restore vma archive: zcat /root/1.vma.gz | vma extract -v -r /var/tmp/vzdumptmp24155.fifo - /var/tmp/vzdumptmp24155
CFG: size: 435 name: qemu-server.conf
DEV: dev_id=1 size: 68719476736 devname: drive-scsi0
DEV: dev_id=2 size: 549755813888 devname: drive-virtio0
CTIME: Sun Dec  9 11:51:48 2018
Formatting '/var/lib/vz/images/104/vm-104-disk-0.qcow2', fmt=qcow2 size=68719476736 cluster_size=65536 preallocation=metadata lazy_refcounts=off refcount_bits=16
new volume ID is 'local:104/vm-104-disk-0.qcow2'
map 'drive-scsi0' to '/var/lib/vz/images/104/vm-104-disk-0.qcow2' (write zeros = 0)
...
While this is going on ... start a second shell ...

Shell 2:
Code:
# pvesh get /cluster/nextid
105

# qmrestore 2.vma.gz 105 --storage local --unique 1
restore vma archive: zcat /root/2.vma.gz | vma extract -v -r /var/tmp/vzdumptmp24209.fifo - /var/tmp/vzdumptmp24209
CFG: size: 435 name: qemu-server.conf
DEV: dev_id=1 size: 68719476736 devname: drive-scsi0
DEV: dev_id=2 size: 549755813888 devname: drive-virtio0
CTIME: Sun Dec  9 11:51:48 2018
trying to acquire lock...
no lock found trying to remove 'create'  lock
command 'set -o pipefail && zcat /root/2.vma.gz | vma extract -v -r /var/tmp/vzdumptmp24209.fifo - /var/tmp/vzdumptmp24209' failed: can't lock file '/var/lock/pve-manager/pve-storage-local' - got timeout
 

fabian

Proxmox Staff Member
Staff member
Jan 7, 2016
Is the first restore still allocating the second disk when the second restore fails? Allocating disks on a storage is protected by a mutex-like lock on that storage, so if the disk is big and the storage needs to pre-allocate/zero the whole disk, it is possible to run into such a timeout. The same lock is also used for other operations that add or remove disks/volumes.
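The contention described above can be illustrated with a small sketch (this is not PVE's actual Perl locking code; the lock file path and timings are made up for the demo). One process holds an exclusive lock longer than a second process is willing to wait, so the waiter fails with a timeout, just like the second qmrestore:

```shell
#!/bin/sh
# Hypothetical demo of a timed lock acquisition using flock(1).
LOCK=/tmp/demo-storage.lock

# First "restore": grabs the lock and holds it for 3 seconds
# (stands in for a long disk allocation on the storage).
( flock -x 9; sleep 3 ) 9>"$LOCK" &

sleep 1

# Second "restore": waits at most 1 second for the same lock, then gives up.
if flock -w 1 -x 9 9>"$LOCK"; then
    echo "acquired"
else
    echo "got timeout"
fi
wait
```

Because the holder keeps the lock past the waiter's deadline, the second process prints "got timeout"; with a shorter allocation (or a longer `-w` value) it would print "acquired" instead.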
 

shantanu

Member
Mar 30, 2012
Yes, the first restore is still running, and this is expected.
The problem is that the second restore times out within a few seconds.

What is weird is that if I restore using the WebGUI, the second job correctly keeps spinning on "waiting for lock ..."
 

fabian

Proxmox Staff Member
Staff member
Jan 7, 2016
The lock timeout is 10 seconds. We have plans to adjust these storage locks to use a smaller granularity (i.e., per VMID) where possible, but in the meantime you either need to use a storage with a faster allocation time, or space out your restores a bit more.
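One way to space the restores out without timing them by hand is to serialize them yourself with flock(1) on a private lock file, so the second command only starts once the first has finished (a workaround sketch, not an official PVE feature; the lock path is arbitrary, and the echo/sleep commands stand in for the real qmrestore invocations):

```shell
#!/bin/sh
# Queue two "restores" behind one lock file instead of running them in parallel.
# Replace the sh -c '...' stand-ins with the real qmrestore commands, e.g.:
#   flock "$LOCK" qmrestore 1.vma.gz 104 --storage local --unique 1 &
LOCK=/tmp/restore-serial.lock

flock "$LOCK" sh -c 'echo "restore 1 start"; sleep 2; echo "restore 1 done"' &
sleep 0.2
flock "$LOCK" sh -c 'echo "restore 2 start"; echo "restore 2 done"' &
wait
```

The second job blocks on the lock file instead of racing for the storage lock, so it starts only after the first finishes and never hits the 10-second timeout.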

The lock handling is in the backend, so whether you use qmrestore or the web interface should make no difference (except maybe in the timing of when the restores start ;))
 

shantanu

Member
Mar 30, 2012
I know the problem description seems unbelievable, but it's true ... "works via the GUI, but not from the command line". :eek:

FWIW, I keep these servers updated every week or so. I'll check whether the command-line problem is reproducible after a reboot (i.e. using the latest installed packages).
 
