Issues while cloning VMs simultaneously

azhv

New Member
Oct 5, 2020
Hello, I encountered an issue while trying to clone two (or more) VMs simultaneously between two nodes. Both nodes have NFS storage.
When I start cloning the VMs, the first one starts and the second one fails with: TASK ERROR: clone failed: error during cfs-locked 'nfs2' operation: got lock timeout - aborting command.
I've done the same test with smaller VMs (effective size around 3 GB) and both were transferred. I've also tried this with templates - if the template is a bit bigger (15-20 GB) it fails with the same timeout.
I was wondering if there are ways to avoid this error and maybe queue the requests?
Any ideas/workarounds are more than welcome.
Thank you.

pveversion -v:
proxmox-ve: 6.2-1 (running kernel: 5.4.34-1-pve)
pve-manager: 6.2-4 (running version: 6.2-4/9824574a)
pve-kernel-5.4: 6.2-1
pve-kernel-helper: 6.2-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.3-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.15-pve1
libproxmox-acme-perl: 1.0.3
libpve-access-control: 6.1-1
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.1-2
libpve-guest-common-perl: 3.0-10
libpve-http-server-perl: 3.0-5
libpve-storage-perl: 6.1-7
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.2-1
lxcfs: 4.0.3-pve2
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-1
pve-cluster: 6.1-8
pve-container: 3.1-5
pve-docs: 6.2-4
pve-edk2-firmware: 2.20200229-1
pve-firewall: 4.1-2
pve-firmware: 3.1-1
pve-ha-manager: 3.0-9
pve-i18n: 2.1-2
pve-qemu-kvm: 5.0.0-2
pve-xtermjs: 4.3.0-1
qemu-server: 6.2-2
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.3-pve1
 
Dominic

"I was wondering if there are ways to avoid this error and maybe queue the requests?"

For the moment I'd use the command line, so that the 2 (or more) qm migrate commands get executed one after the other.
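Since this thread is about cloning with a target node, a minimal sketch of that idea with qm clone could look like the following. The VMIDs 100/101, the new IDs 200/201, the storage name and the target node name are only placeholders for your setup:

# run the two full clones back to back, so only one of them is allocating
# and copying data on the NFS storage at any time
qm clone 100 200 --full --storage datastore-auto2 --target pve2 && \
qm clone 101 201 --full --storage datastore-auto2 --target pve2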

Could you please try to replicate this on the command line and post the exact command, as well as your /etc/pve/storage.cfg? So far I have not been able to reproduce the problem.
 
fabian

Your NFS storage is probably overloaded by the first clone and can't handle allocating the image file for the second one before hitting the (60s) lock timeout.
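To check whether the allocation alone is what runs past the 60s limit, you could time a manual image allocation on the NFS mount while a large clone is already copying. This is only a rough sketch; the mount path, the VMID directory 999, the file name and the 20G size are placeholders for one of your NFS storages:

# create a throwaway VMID directory on the NFS mount (999 is a placeholder)
mkdir -p /mnt/pve/datastore-auto2/images/999
# time how long allocating a ~20G qcow2 image takes while the other clone is running
time qemu-img create -f qcow2 /mnt/pve/datastore-auto2/images/999/test-alloc.qcow2 20G
# clean up the test image and directory afterwards
rm -r /mnt/pve/datastore-auto2/images/999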
 
azhv

Hello, and thanks for the replies.
@fabian - Is there any way to avoid this, for example by increasing the timeout or tuning the NFS? I tried to fiddle with the NFS settings but still hit the issue. And if the 60s timer were changed, would it affect only cloning or all operations?
@Dominic - Actually, I hit the problem only when cloning VMs simultaneously. If I wait for the first operation to finish and then start the other, it's fine.

My /etc/pve/storage.cfg:
dir: local
        path /var/lib/vz
        content snippets,images,rootdir,iso,vztmpl,backup
        maxfiles 1
        shared 0

nfs: templates
        export /media/templates
        path /mnt/pve/templates
        server 10.0.10.40
        content iso
        maxfiles 0

nfs: datastore-auto1
        export /media/datastore-auto1
        path /mnt/pve/datastore-auto1
        server 10.0.10.30
        content iso,images

nfs: datastore-auto2
        export /media/datastore-auto2
        path /mnt/pve/datastore-auto2
        server 10.0.10.32
        content images,iso
        maxfiles 0
Thank you!
 
fabian

No, the timeout is not configurable (it's for a cluster-wide shared lock).
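So the workaround remains doing the clones sequentially. If you want a crude queue from the shell, something like the following sketch works; the VMID pairs, the storage name and the target node are again placeholders:

# each input line is "source-vmid new-vmid"; the loop runs the clones one at a time
while read -r src dst; do
    qm clone "$src" "$dst" --full --storage datastore-auto2 --target pve2
done <<'EOF'
100 200
101 201
EOF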
 
