Issues while cloning VMs simultaneously

azhv

New Member
Oct 5, 2020
Hello, I encountered an issue while trying to clone two (or more) VMs simultaneously between two nodes. Both nodes have NFS storage.
When I start cloning the VMs, the first one starts and the second one fails with: TASK ERROR: clone failed: error during cfs-locked 'nfs2' operation: got lock timeout - aborting command.
I've done the same test with smaller VMs (effective size around 3 GB) and both were transferred. I've also tried this with templates - if the template is a bit bigger (15-20 GB) it fails with the same timeout.
I was wondering if there are ways to avoid this error and maybe queue the requests?
Any ideas/workarounds are more than welcome.
Thank you.

pveversion -v:
proxmox-ve: 6.2-1 (running kernel: 5.4.34-1-pve)
pve-manager: 6.2-4 (running version: 6.2-4/9824574a)
pve-kernel-5.4: 6.2-1
pve-kernel-helper: 6.2-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.3-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.15-pve1
libproxmox-acme-perl: 1.0.3
libpve-access-control: 6.1-1
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.1-2
libpve-guest-common-perl: 3.0-10
libpve-http-server-perl: 3.0-5
libpve-storage-perl: 6.1-7
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.2-1
lxcfs: 4.0.3-pve2
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-1
pve-cluster: 6.1-8
pve-container: 3.1-5
pve-docs: 6.2-4
pve-edk2-firmware: 2.20200229-1
pve-firewall: 4.1-2
pve-firmware: 3.1-1
pve-ha-manager: 3.0-9
pve-i18n: 2.1-2
pve-qemu-kvm: 5.0.0-2
pve-xtermjs: 4.3.0-1
qemu-server: 6.2-2
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.3-pve1
 
Dominic

"I was wondering if there are ways to avoid this error and maybe queue the requests?"

For the moment I'd use the command line, so that the 2 (or more) qm migrate commands get executed one after the other.
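Since this thread is about cloning with a target node, a minimal sketch of that idea with qm clone could look like the following. The VMIDs 100/101, the new IDs 200/201, the storage name and the target node name are only placeholders for your setup:

# run the two full clones back to back, so only one of them is allocating
# and copying data on the NFS storage at any time
qm clone 100 200 --full --storage datastore-auto2 --target pve2 && \
qm clone 101 201 --full --storage datastore-auto2 --target pve2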

Could you please try to replicate this on the command line and post the exact command, as well as your /etc/pve/storage.cfg? So far I have not been able to reproduce the problem.
 
fabian

Your NFS storage is probably overloaded by the first clone and can't handle allocating the image file for the second one before hitting the (60s) lock timeout.
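To check whether the allocation alone is what runs past the 60s limit, you could time a manual image allocation on the NFS mount while a large clone is already copying. This is only a rough sketch; the mount path, the VMID directory 999, the file name and the 20G size are placeholders for one of your NFS storages:

# create a throwaway VMID directory on the NFS mount (999 is a placeholder)
mkdir -p /mnt/pve/datastore-auto2/images/999
# time how long allocating a ~20G qcow2 image takes while the other clone is running
time qemu-img create -f qcow2 /mnt/pve/datastore-auto2/images/999/test-alloc.qcow2 20G
# clean up the test image and directory afterwards
rm -r /mnt/pve/datastore-auto2/images/999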
 
azhv

Hello, and thanks for the replies.
@fabian - Is there any way to avoid this, for example by increasing the timeout or tuning the NFS? I tried to fiddle with the NFS settings but still hit the issue. And if the 60s timer were changed, would it affect only cloning or all operations?
@Dominic - Actually, I hit the problem only when cloning VMs simultaneously. If I wait for the first operation to finish and then start the other, it's fine.

My /etc/pve/storage.cfg:
dir: local
        path /var/lib/vz
        content snippets,images,rootdir,iso,vztmpl,backup
        maxfiles 1
        shared 0

nfs: templates
        export /media/templates
        path /mnt/pve/templates
        server 10.0.10.40
        content iso
        maxfiles 0

nfs: datastore-auto1
        export /media/datastore-auto1
        path /mnt/pve/datastore-auto1
        server 10.0.10.30
        content iso,images

nfs: datastore-auto2
        export /media/datastore-auto2
        path /mnt/pve/datastore-auto2
        server 10.0.10.32
        content images,iso
        maxfiles 0
Thank you!
 
fabian

No, the timeout is not configurable (it's for a cluster-wide shared lock).
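So the workaround remains doing the clones sequentially. If you want a crude queue from the shell, something like the following sketch works; the VMID pairs, the storage name and the target node are again placeholders:

# each input line is "source-vmid new-vmid"; the loop runs the clones one at a time
while read -r src dst; do
    qm clone "$src" "$dst" --full --storage datastore-auto2 --target pve2
done <<'EOF'
100 200
101 201
EOF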
 
