I have a Proxmox 8.4 cluster with two nodes and a qdevice, on which I recently installed Ceph Squid 19.2.1, plus an additional host to maintain quorum for Ceph. Each node has one SATA SSD, so there are two OSDs (osd.18 and osd.19), and I created a pool called poolssd that uses both. Ever since Ceph was installed and configured, I get the warning below and cannot create any virtual machine on that pool:
HEALTH_WARN: Reduced data availability: 33 pgs inactive, 33 pgs peering
pg 1.0 is stuck peering since forever, current state peering, last acting [19,18]
pg 4.0 is stuck peering for 26h, current state peering, last acting [18,19]
pg 4.1 is stuck peering for 26h, current state creating+peering, last acting [19,18]
pg 4.2 is stuck peering for 26h, current state peering, last acting [18,19]
pg 4.3 is stuck peering for 26h, current state peering, last acting [18,19]
pg 4.4 is stuck peering for 26h, current state peering, last acting [18,19]
pg 4.5 is stuck peering for 26h, current state creating+peering, last acting [19,18]
pg 4.6 is stuck peering for 26h, current state peering, last acting [18,19]
pg 4.7 is stuck peering for 26h, current state peering, last acting [18,19]
pg 4.8 is stuck peering for 26h, current state creating+peering, last acting [19,18]
pg 4.9 is stuck peering for 26h, current state creating+peering, last acting [19,18]
pg 4.a is stuck peering for 26h, current state creating+peering, last acting [19,18]
pg 4.b is stuck peering for 26h, current state peering, last acting [18,19]
pg 4.c is stuck peering for 26h, current state creating+peering, last acting [19,18]
pg 4.d is stuck peering for 26h, current state creating+peering, last acting [19,18]
pg 4.e is stuck peering for 26h, current state creating+peering, last acting [19,18]
pg 4.f is stuck peering for 26h, current state peering, last acting [18,19]
pg 4.10 is stuck peering for 26h, current state peering, last acting [18,19]
pg 4.11 is stuck peering for 26h, current state peering, last acting [18,19]
pg 4.12 is stuck peering for 26h, current state peering, last acting [18,19]
pg 4.13 is stuck peering for 26h, current state creating+peering, last acting [19,18]
pg 4.14 is stuck peering for 26h, current state peering, last acting [18,19]
pg 4.15 is stuck peering for 26h, current state creating+peering, last acting [19,18]
pg 4.16 is stuck peering for 26h, current state peering, last acting [18,19]
pg 4.17 is stuck peering for 26h, current state peering, last acting [18,19]
pg 4.18 is stuck peering for 26h, current state creating+peering, last acting [19,18]
pg 4.19 is stuck peering for 26h, current state peering, last acting [18,19]
pg 4.1a is stuck peering for 26h, current state creating+peering, last acting [19,18]
pg 4.1b is stuck peering for 26h, current state creating+peering, last acting [19,18]
pg 4.1c is stuck peering for 26h, current state peering, last acting [18,19]
pg 4.1d is stuck peering for 26h, current state peering, last acting [18,19]
pg 4.1e is stuck peering for 26h, current state peering, last acting [18,19]
pg 4.1f is stuck peering for 26h, current state creating+peering, last acting [19,18]
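In case it helps, these are the standard Ceph commands I can run to gather more detail on the stuck PGs (I can post the output of any of them if needed):

ceph health detail
ceph osd tree                  # confirm both OSDs are up and in
ceph osd pool ls detail        # confirm poolssd really has size 2 / min_size 1
ceph pg dump_stuck inactive    # list all inactive PGs
ceph pg 4.0 query              # detailed peering state of one stuck PG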
I have three monitors configured: the two on the Proxmox nodes (mon.pve1 and mon.pve2) and a third on the quorum host (mon.ceph-mon3). I also get the following messages:
HEALTH_WARN: 2 daemons have recently crashed
mon.pve1 crashed on host pve1 at 2025-07-03T05:24:48.235164Z
mon.pve1 crashed on host pve1 at 2025-07-03T05:45:50.830345Z
HEALTH_WARN: 14 slow ops, oldest one blocked for 8610 sec, daemons [osd.18,osd.19,mon.pve1] have slow ops.
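For the monitor crashes, I assume the way to get more detail is the crash module and the journal (the crash ID below is just a placeholder for whatever ceph crash ls returns):

ceph crash ls                                        # list recent crash reports
ceph crash info <crash-id>                           # backtrace for a specific crash
journalctl -u ceph-mon@pve1 --since "2025-07-03"     # monitor log around the crash times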
I have a dedicated network for the private Ceph cluster traffic and another for the public network, as shown in the ceph.conf configuration file below:
[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
cluster_network = 192.168.70.0/24
fsid = eb409a91-affd-487a-a02c-4df2e46e0a2e
mon_allow_pool_delete = true
mon_initial_members = pve1-pub pve2-pub ceph-mon3-pub
mon_host = 192.168.60.11 192.168.60.12 192.168.60.130
ms_bind_ipv4 = true
ms_bind_ipv6 = false
osd_pool_default_min_size = 1
osd_pool_default_size = 2
public_network = 192.168.60.0/24
[client]
keyring = /etc/pve/priv/$cluster.$name.keyring
[client.crash]
keyring = /etc/pve/ceph/$cluster.$name.keyring
[mon.pve1]
host = pve1
public_addr = 192.168.60.11
cluster_addr = 192.168.70.11
[mon.pve2]
host = pve2
public_addr = 192.168.60.12
cluster_addr = 192.168.70.12
[mon.ceph-mon3]
host = ceph-mon3
public_addr = 192.168.60.130
cluster_addr = 192.168.70.130
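Since the OSDs peer with each other over the cluster network (192.168.70.0/24), I understand the usual suspects for PGs stuck peering are firewall rules or an MTU mismatch on that link. These are the checks I plan to run from pve1 (the 8972-byte ping assumes jumbo frames / MTU 9000 on the Ceph NICs; with a standard 1500 MTU the payload size would be 1472):

ping -c 3 -M do -s 8972 192.168.70.12    # do-not-fragment ping to pve2 over the cluster network
ss -tlnp | grep ceph-osd                 # confirm the OSDs are listening on both networks
ceph osd dump | grep "^osd"              # public/cluster addresses each OSD registered with
ceph config get osd.18 cluster_network   # confirm the OSD picked up the cluster_network setting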
Both Proxmox nodes have Proxmox Enterprise subscriptions, so they are fully up to date on the stable repository.
I had previously set up this same configuration in a test environment using virtual machines as nodes, and everything worked correctly there. I then replicated that setup on the HPE physical servers to build the production environment, but I cannot get it to work.
Can anyone give me a clue?
Thank you very much.