I am new to Proxmox. My configuration is four nodes, each with an SSD containing the latest version of Proxmox, two large SATA HDDs, and a Mellanox NIC. The nodes are linked through a 2.5G switch via the motherboard LAN ports, and the Mellanox cards are connected to a second, faster switch that is intended to carry the Ceph traffic. However, I am having trouble adding any kind of storage to a VM.
I added each of the 8 HDDs to Ceph as OSDs, created a monitor and a manager on each node, and then ran this command, intended to create an erasure-coded pool with two drives' worth of parity:
pveceph pool create storage_pool --erasure-coding k=6,m=2
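In case it's useful, here is how I've been trying to inspect what that command actually created. (I'm not sure what pveceph names the erasure-code profile, so the last line uses a placeholder.)
Code:
ceph osd pool ls detail
ceph osd erasure-code-profile ls
ceph osd erasure-code-profile get <profile_name>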
After the command ran, the Ceph PGs summary shows 33 active+clean and 128 creating+incomplete. The status is now HEALTH_WARN, with the summary: Reduced data availability: 128 pgs inactive, 128 pgs incomplete.
I can't tell if this means it worked, it failed, or merely that nothing is stored on it yet. The OSDs summary shows 8 in, 0 out, 8 Up, 0 Down, which sounds encouraging.
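If the raw output would help, I can post the results of these; this is how I've been checking the PG and OSD state so far:
Code:
ceph -s
ceph pg stat
ceph osd tree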
I then set up a Debian virtual machine intended to operate as a fileserver. Under the VM's Hardware tab, any attempt to add a virtual disk (whether on the Ceph storage pool or on the same SSD that Proxmox itself lives on) results in the following error:
can't lock file '/var/lock/qemu-server/lock-<VM_ID>.conf' - got timeout (500)
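If it matters, I assume the CLI equivalent of what the GUI attempts is roughly the following (the bus/slot and the 32 GB size are just examples, and I'm assuming the Proxmox storage entry is also named storage_pool):
Code:
qm set <VM_ID> --scsi1 storage_pool:32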
I also tried adding a Mount Point to a Debian LXC container and got a similar timeout error. So I've struck out on both VMs and LXCs, which may suggest the problem isn't specific to either kind of guest.
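For the container, I believe the equivalent command is something like this (again, the volume size and mount path are just examples):
Code:
pct set <CT_ID> -mp0 storage_pool:8,mp=/mnt/data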
I can't tell if this is a Ceph misconfiguration, a general Proxmox misconfiguration, a misconfiguration of the VM/LXC, or something else. Is there a recovery protocol, or should I simply delete everything and start over?
Thanks in advance!
My Ceph configuration is:
Code:
[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
cluster_network = 192.168.0.10/24
fsid = XXXXXXXX
mon_allow_pool_delete = true
mon_host = 192.168.0.10 192.168.0.11 192.168.0.12 192.168.0.13
ms_bind_ipv4 = true
ms_bind_ipv6 = false
osd_pool_default_min_size = 2
osd_pool_default_size = 3
public_network = 192.168.0.10/24
[client]
keyring = /etc/pve/priv/$cluster.$name.keyring
[client.crash]
keyring = /etc/pve/ceph/$cluster.$name.keyring
[mon.node1]
public_addr = 192.168.0.10
[mon.node2]
public_addr = 192.168.0.11
[mon.node3]
public_addr = 192.168.0.12
[mon.node4]
public_addr = 192.168.0.13