'Can't lock file' timeout error

primarscources

New Member
Jan 2, 2024
I am new to Proxmox. My configuration is four nodes, each of which has an SSD containing the latest version of Proxmox, two large SATA HDDs, and a Mellanox NIC. The nodes are linked through a 2.5G network switch using the motherboard LAN ports. I also have a second (faster) switch, to which the Mellanox cards are connected. This fast switch is intended to handle Ceph traffic; however, I am having trouble adding any kind of storage to a VM.
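
To give a sense of what I'm aiming for: my understanding is that the two-switch setup should eventually be reflected in ceph.conf roughly like the sketch below. The 10.10.10.0/24 subnet is just a placeholder for the Mellanox network; my actual config (which doesn't look like this yet) is pasted at the bottom of this post.
Code:
# Hypothetical example of splitting the two networks -- not my current config
# public_network  -> client/monitor traffic over the 2.5G switch (motherboard NICs)
# cluster_network -> OSD replication/heartbeat traffic over the fast switch (Mellanox NICs)
public_network  = 192.168.0.0/24
cluster_network = 10.10.10.0/24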

I added each of the 8 HDDs to Ceph as OSDs, created a monitor and a manager on each node, then ran this command, intended to create an erasure-coded pool with two drives' worth of parity data:
pveceph pool create storage_pool --erasure-coding k=6,m=2
After the command ran, the Ceph PGs summary shows 33 active+clean and 128 creating+incomplete. The status is now HEALTH_WARN, with the summary "Reduced data availability: 128 pgs inactive, 128 pgs incomplete".
I can't tell if this means it worked, it failed, or merely that nothing is stored on it yet. The OSDs summary shows 8 in, 0 out, 8 Up, 0 Down, which sounds encouraging.
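
In case it helps, these are the commands I was planning to run next to dig into the stuck PGs; I pulled them from the Ceph docs, so please correct me if there's a better way to gather this information:
Code:
# Show which PGs are stuck and the reason reported for each
ceph health detail
ceph pg dump_stuck inactive

# Inspect the erasure-coded pool and the CRUSH rule it was created with
ceph osd pool ls detail
ceph osd crush rule dump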

I then set up a Debian virtual machine intended to operate as a fileserver. Under the VM's Hardware tab, any attempt to add a virtual disk (residing either on the Ceph storage pool or on the same SSD that Proxmox lives on) results in the following error:
can't lock file '/var/lock/qemu-server/lock-<VM_ID>.conf' - got timeout (500)
I also tried adding a Mount Point to a Debian LXC container and got a similar timeout error. So I've struck out on both VMs and LXCs, which may suggest the problem lies elsewhere.
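
For what it's worth, this is what I plan to try next to rule out a stale lock (the VM/CT IDs below are placeholders for my actual ones):
Code:
# Look for leftover lock files for the guest configs
ls -l /var/lock/qemu-server/
ls -l /run/lock/lxc/          # I believe this is where the LXC config locks live

# Clear the config lock on a specific guest
qm unlock 100
pct unlock 101

# Check whether a long-running worker/task is still holding things up
ps aux | grep -E 'qm|vzdump|pvedaemon'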

I can't tell if this is a Ceph misconfiguration, a general Proxmox misconfiguration, a misconfiguration of the VM/LXC, or something else. Is there a recovery protocol, or should I simply delete everything and start over?

Thanks in advance!


My Ceph configuration is:
Code:
[global]
    auth_client_required = cephx
    auth_cluster_required = cephx
    auth_service_required = cephx
    cluster_network = 192.168.0.10/24
    fsid = XXXXXXXX
    mon_allow_pool_delete = true
    mon_host = 192.168.0.10 192.168.0.11 192.168.0.12 192.168.0.13
    ms_bind_ipv4 = true
    ms_bind_ipv6 = false
    osd_pool_default_min_size = 2
    osd_pool_default_size = 3
    public_network = 192.168.0.10/24

[client]
    keyring = /etc/pve/priv/$cluster.$name.keyring

[client.crash]
    keyring = /etc/pve/ceph/$cluster.$name.keyring

[mon.node1]
    public_addr = 192.168.0.10

[mon.node2]
    public_addr = 192.168.0.11

[mon.node3]
    public_addr = 192.168.0.12

[mon.node4]
    public_addr = 192.168.0.13
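
If it helps, I can also post the output of the following so you can see which addresses the monitors and OSDs have actually bound to:
Code:
# Monitor and OSD addresses as Ceph sees them
ceph mon dump
ceph osd dump | head -n 20

# Per-OSD public (front) and cluster (back) addresses
ceph osd metadata 0 | grep -E 'front_addr|back_addr'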
 
