Lock Timeout error in Proxmox 5.0

Chalky

New Member
Whenever I try to create a VM on an SMB share that I've mounted and marked as Shared in Proxmox, I get the following error:

Code:
Jul 20 14:18:53 vms603 pvedaemon[3575]: VM 102 creating disks failed
Jul 20 14:18:53 vms603 pvedaemon[3575]: create failed - unable to create image: got lock timeout - aborting command
Jul 20 14:18:53 vms603 pvedaemon[1545]: <root@pam> end task UPID:vms603:00000DF7:000133FC:5970AD81:qmcreate:102:root@pam: create failed - unable to create image: got lock timeout - aborting command

The files do get created, but Proxmox gives up before they have finished being written. If I turn off the 'Shared' flag it works, but then I can't do any live migrations, which defeats the purpose of the whole exercise.
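In case it matters, the share is presented to Proxmox as a plain directory storage with the Shared flag set, i.e. something along these lines in /etc/pve/storage.cfg (the storage ID and path match the paths further down, but treat them as placeholders):

Code:
dir: cluster6-vol1
        path /mnt/cluster6-vol1
        content images
        shared 1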

I see this crop up in the forums from time to time on older versions, but no-one ever seems to go far enough to get an actual resolution to this.
 
I have tried creating several qcow2 files manually using the qemu-img command with all of the different preallocation switches, and I don't get any errors. I don't know what command Proxmox runs to create the image; there are several threads asking this, and Proxmox staff have replied with "look in the GUI logs", but it isn't in the GUI logs!

Code:
root@mgt9:/mnt/cluster6-vol1/images/112# qemu-img create -f qcow2 -o preallocation=metadata test-metadata.qcow2 32G
Formatting 'test-metadata.qcow2', fmt=qcow2 size=34359738368 encryption=off cluster_size=65536 preallocation=metadata lazy_refcounts=off refcount_bits=16
root@mgt9:/mnt/cluster6-vol1/images/112# qemu-img create -f qcow2 -o preallocation=off test-off.qcow2 32G
Formatting 'test-off.qcow2', fmt=qcow2 size=34359738368 encryption=off cluster_size=65536 preallocation=off lazy_refcounts=off refcount_bits=16
root@mgt9:/mnt/cluster6-vol1/images/112# qemu-img create -f qcow2 -o preallocation=falloc test-falloc.qcow2 32G
Formatting 'test-falloc.qcow2', fmt=qcow2 size=34359738368 encryption=off cluster_size=65536 preallocation=falloc lazy_refcounts=off refcount_bits=16
root@mgt9:/mnt/cluster6-vol1/images/112# qemu-img create -f qcow2 -o preallocation=full test-full.qcow2 32G
Formatting 'test-full.qcow2', fmt=qcow2 size=34359738368 encryption=off cluster_size=65536 preallocation=full lazy_refcounts=off refcount_bits=16
 
How exactly are you mounting the SMB share and how are you presenting it to Proxmox? CIFS uses Windows locking mechanisms when it can. If you don't do anything to prevent it, it will use opportunistic locks, which will do client-side caching of files. You may want to look up "veto oplock files" in the smb.conf documentation to see if the locks "qm" is using conflict with opportunistic locks.
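If oplocks do turn out to be the problem, the Samba-side workaround would look something like this in smb.conf (the share name and path below are placeholders, and the veto patterns are slash-delimited):

Code:
[vm-images]
    path = /srv/vm-images
    # refuse oplocks (and therefore client-side caching) on disk images
    veto oplock files = /*.qcow2/*.raw/*.vmdk/
    # or disable opportunistic locking for the whole share
    oplocks = no
    level2 oplocks = no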

When you're in a cluster, there are cluster locks for everything that happens when you use qm. Everything is basically abstracted away from qemu-img because they want everything in the cluster to stay in sync. I've read through the code and it works like this:

- You create a VM in the UI
- The UI calls qm create (as part of the API)
- Under the hood that calls Qemu::create_vm, which in turn calls Qemu::create_disks
- Part of creating a disk in that section involves determining which storage type you're using, whether you're in a cluster, and so on
- Assuming you're using a directory store (the only way I can think you'd do it), it will call vdisk_alloc(), which then attempts to allocate the disk via whatever storage plugin you're using (you can exercise that same path from the CLI; see the example after this list)
- vdisk_alloc() then calls into the cluster API to try to get a lock on the disk. If it can't come up with a quorum for the lock, it gives up
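If you want to poke at that same vdisk_alloc() path outside the GUI, pvesm should go through it from the command line. As far as I can tell this takes the same storage locking the GUI create does (the storage name, VMID and filename below are just examples matching the paths in this thread):

Code:
# allocate a 32G qcow2 through the storage layer from the CLI
pvesm alloc cluster6-vol1 112 vm-112-disk-1.qcow2 32G --format qcow2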

In this particular case, my guess is that because you're marking it as shared, but a plain directory path isn't actually a cluster-aware shared resource, the other hosts in the cluster have no idea the resource is shared and can't work out how to lock the disk image. They eventually give up and the lock fails. This takes about 120 seconds according to the code (60 seconds for the allocation, and another 60 for the abort if no quorum lock can be established).

The fun part is that, if you have the ability, you can write your own storage plugin to handle CIFS connections. But as it stands right now, the cluster locking mechanism isn't designed to hold cluster locks on resources like this. Perhaps in the future?
 
close, but no cigar ;) the "cluster lock" is actually handled by our cluster file system, and is the same for all shared storages (and works just fine for shared dir storages in principle, as it is not related to the storage plugin at all). this lock is not intended for long-running operations - the lock is automatically cleared after 120 seconds, so we only give commands executed under it 60 seconds to finish (and another 60 seconds to abort again).

@OP: my guess is allocating a disk of your chosen size takes longer than 60 seconds, which should never happen. maybe there are some CIFS settings you could tune to get shorter allocation times?
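One way to check that would be to time the allocation directly on the mount and compare different cifs cache modes. The commands below are only examples to adapt (server, share and credentials are placeholders):

Code:
# time a 32G qcow2 allocation straight on the CIFS mount
time qemu-img create -f qcow2 -o preallocation=metadata /mnt/cluster6-vol1/images/112/timing-test.qcow2 32G

# example: remount with client-side caching disabled and compare
mount -t cifs //fileserver/vol1 /mnt/cluster6-vol1 -o username=USER,cache=none

If the first command takes anywhere near 60 seconds, that would explain the lock timeout on its own.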
 
Thanks for the education. It was a pretty cursory examination of the Perl code. :)

It looked like the locking mechanism got passed off to the storage plugin. I'm sure I just misread the code.

Incidentally, that's some of the better Perl code I've seen in a long time. Kudos for that.
 
