I am new to Proxmox. My configuration is four nodes, each with an SSD containing the latest version of Proxmox, two large SATA HDDs, and a Mellanox NIC. The nodes are linked through a 2.5G switch via the motherboard LAN ports, and the Mellanox cards are connected to a second, faster switch that is intended to carry the Ceph traffic. However, I am having trouble adding any kind of storage to a VM.
I added each of the 8 HDDs to Ceph as OSDs, created a monitor and a manager on each node, and then ran this command, intended to create an erasure-coded pool with two drives' worth of parity:
pveceph pool create storage_pool --erasure-coding k=6,m=2
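In case it's useful, here is how I've been trying to inspect what that command actually created. (I'm not sure what pveceph names the erasure-code profile, so the last line uses a placeholder.)
Code:
ceph osd pool ls detail
ceph osd erasure-code-profile ls
ceph osd erasure-code-profile get <profile_name>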
After the command ran, the Ceph PGs summary shows 33 active+clean and 128 creating+incomplete. The status is now HEALTH_WARN, with the summary: Reduced data availability: 128 pgs inactive, 128 pgs incomplete.
I can't tell if this means it worked, it failed, or merely that nothing is stored on it yet. The OSDs summary shows 8 in, 0 out, 8 Up, 0 Down, which sounds encouraging.
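If the raw output would help, I can post the results of these; this is how I've been checking the PG and OSD state so far:
Code:
ceph -s
ceph pg stat
ceph osd tree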
I then set up a Debian virtual machine intended to operate as a fileserver. Under the VM's Hardware tab, any attempt to add a virtual disk (whether on the Ceph storage pool or on the same SSD that Proxmox itself lives on) results in the following error:
can't lock file '/var/lock/qemu-server/lock-<VM_ID>.conf' - got timeout (500)
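If it matters, I assume the CLI equivalent of what the GUI attempts is roughly the following (the bus/slot and the 32 GB size are just examples, and I'm assuming the Proxmox storage entry is also named storage_pool):
Code:
qm set <VM_ID> --scsi1 storage_pool:32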
I also tried adding a Mount Point to a Debian LXC container and got a similar timeout error. So I've struck out on both VMs and LXCs, which may suggest the problem isn't specific to either kind of guest.
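For the container, I believe the equivalent command is something like this (again, the volume size and mount path are just examples):
Code:
pct set <CT_ID> -mp0 storage_pool:8,mp=/mnt/data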
I can't tell if this is a Ceph misconfiguration, a general Proxmox misconfiguration, a misconfiguration of the VM/LXC, or something else. Is there a recovery protocol, or should I simply delete everything and start over?
Thanks in advance!
My Ceph configuration is:
Code:
[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
cluster_network = 192.168.0.10/24
fsid = XXXXXXXX
mon_allow_pool_delete = true
mon_host = 192.168.0.10 192.168.0.11 192.168.0.12 192.168.0.13
ms_bind_ipv4 = true
ms_bind_ipv6 = false
osd_pool_default_min_size = 2
osd_pool_default_size = 3
public_network = 192.168.0.10/24
[client]
keyring = /etc/pve/priv/$cluster.$name.keyring
[client.crash]
keyring = /etc/pve/ceph/$cluster.$name.keyring
[mon.node1]
public_addr = 192.168.0.10
[mon.node2]
public_addr = 192.168.0.11
[mon.node3]
public_addr = 192.168.0.12
[mon.node4]
public_addr = 192.168.0.13