Multiple parallel 'qmrestore' executions fail

shantanu

Member
Mar 30, 2012
Related to:
https://forum.proxmox.com/threads/qmrestore-doesnt-reserve-vmid.32260/

The bug of not reserving the VMID is solved, though there is a new problem.

My PVE setup:
Code:
# pveversion -v
proxmox-ve: 5.3-1 (running kernel: 4.15.18-9-pve)
pve-manager: 5.3-5 (running version: 5.3-5/97ae681d)
pve-kernel-4.15: 5.2-12
pve-kernel-4.15.18-9-pve: 4.15.18-30
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-3
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-43
libpve-guest-common-perl: 2.0-18
libpve-http-server-perl: 2.0-11
libpve-storage-perl: 5.0-33
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.0.2+pve1-5
lxcfs: 3.0.2-2
novnc-pve: 1.0.0-2
proxmox-widget-toolkit: 1.0-22
pve-cluster: 5.0-31
pve-container: 2.0-31
pve-docs: 5.3-1
pve-edk2-firmware: 1.20181023-1
pve-firewall: 3.0-16
pve-firmware: 2.0-6
pve-ha-manager: 2.0-5
pve-i18n: 1.0-9
pve-libspice-server1: 0.14.1-1
pve-qemu-kvm: 2.12.1-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-43
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.12-pve1~bpo1

When I run multiple qmrestore commands in parallel, the second restore hits a lock timeout and never succeeds.

Shell 1:
Code:
# pvesh get /cluster/nextid
104

# qmrestore 1.vma.gz 104 --storage local --unique 1
restore vma archive: zcat /root/1.vma.gz | vma extract -v -r /var/tmp/vzdumptmp24155.fifo - /var/tmp/vzdumptmp24155
CFG: size: 435 name: qemu-server.conf
DEV: dev_id=1 size: 68719476736 devname: drive-scsi0
DEV: dev_id=2 size: 549755813888 devname: drive-virtio0
CTIME: Sun Dec  9 11:51:48 2018
Formatting '/var/lib/vz/images/104/vm-104-disk-0.qcow2', fmt=qcow2 size=68719476736 cluster_size=65536 preallocation=metadata lazy_refcounts=off refcount_bits=16
new volume ID is 'local:104/vm-104-disk-0.qcow2'
map 'drive-scsi0' to '/var/lib/vz/images/104/vm-104-disk-0.qcow2' (write zeros = 0)
...
While this is going on ... start a second shell ...

Shell 2:
Code:
# pvesh get /cluster/nextid
105

# qmrestore 2.vma.gz 105 --storage local --unique 1
restore vma archive: zcat /root/2.vma.gz | vma extract -v -r /var/tmp/vzdumptmp24209.fifo - /var/tmp/vzdumptmp24209
CFG: size: 435 name: qemu-server.conf
DEV: dev_id=1 size: 68719476736 devname: drive-scsi0
DEV: dev_id=2 size: 549755813888 devname: drive-virtio0
CTIME: Sun Dec  9 11:51:48 2018
trying to acquire lock...
no lock found trying to remove 'create'  lock
command 'set -o pipefail && zcat /root/2.vma.gz | vma extract -v -r /var/tmp/vzdumptmp24209.fifo - /var/tmp/vzdumptmp24209' failed: can't lock file '/var/lock/pve-manager/pve-storage-local' - got timeout
 

fabian

Proxmox Staff Member
Staff member
Jan 7, 2016
Is the first restore still allocating the second disk when the second restore fails? Allocating disks on a storage is protected by a mutex-like lock on that storage, so if the disk is big and the storage needs to pre-allocate/zero the whole disk, it is possible to run into such a timeout. The same lock is also used for other operations that add or remove disks/volumes.
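The contention described above can be illustrated with a small sketch (this is not PVE's actual Perl locking code; the lock file path and timings are made up for the demo). One process holds an exclusive lock longer than a second process is willing to wait, so the waiter fails with a timeout, just like the second qmrestore:

```shell
#!/bin/sh
# Hypothetical demo of a timed lock acquisition using flock(1).
LOCK=/tmp/demo-storage.lock

# First "restore": grabs the lock and holds it for 3 seconds
# (stands in for a long disk allocation on the storage).
( flock -x 9; sleep 3 ) 9>"$LOCK" &

sleep 1

# Second "restore": waits at most 1 second for the same lock, then gives up.
if flock -w 1 -x 9 9>"$LOCK"; then
    echo "acquired"
else
    echo "got timeout"
fi
wait
```

Because the holder keeps the lock past the waiter's deadline, the second process prints "got timeout"; with a shorter allocation (or a longer `-w` value) it would print "acquired" instead.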
 

shantanu

Member
Mar 30, 2012
Yes, the first restore is still running, and this is expected.
The problem is that the second restore times out within a few seconds.

What is weird is that if I restore using the WebGUI, the second job correctly keeps spinning on "waiting for lock ..."
 

fabian

Proxmox Staff Member
Staff member
Jan 7, 2016
The lock timeout is 10 seconds. We have plans to adjust these storage locks to use a smaller granularity (i.e., per VMID) where possible, but in the meantime you either need to use a storage with a faster allocation time, or space out your restores a bit more.
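One way to space the restores out without timing them by hand is to serialize them yourself with flock(1) on a private lock file, so the second command only starts once the first has finished (a workaround sketch, not an official PVE feature; the lock path is arbitrary, and the echo/sleep commands stand in for the real qmrestore invocations):

```shell
#!/bin/sh
# Queue two "restores" behind one lock file instead of running them in parallel.
# Replace the sh -c '...' stand-ins with the real qmrestore commands, e.g.:
#   flock "$LOCK" qmrestore 1.vma.gz 104 --storage local --unique 1 &
LOCK=/tmp/restore-serial.lock

flock "$LOCK" sh -c 'echo "restore 1 start"; sleep 2; echo "restore 1 done"' &
sleep 0.2
flock "$LOCK" sh -c 'echo "restore 2 start"; echo "restore 2 done"' &
wait
```

The second job blocks on the lock file instead of racing for the storage lock, so it starts only after the first finishes and never hits the 10-second timeout.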

The lock handling is in the backend, so whether you use qmrestore or the web interface should make no difference (except maybe in the timing of when the restores start ;))
 

shantanu

Member
Mar 30, 2012
I know the problem description seems unbelievable, but it's true ... "works via the GUI, but not from the command line". :eek:

FWIW, I keep these servers updated every week or so. I'll check whether the command-line problem is reproducible after a reboot (i.e. using the latest installed packages).
 
