Proxmox Ceph Configuration Issue – MON Timeout on Second and Third Node

Oct 25, 2025
Hello,

I’m currently trying to set up a Ceph cluster on three HP DL380 G11 servers.
  • OS drives: 2 × 480 GB SSD (RAID) per node
  • Ceph/VM drives: 4 × 800 GB 24 Gb/s SAS SSDs per node (JBOD)
  • Proxmox version: 9.0.3
  • Management network: 10.10.0.x (1 Gbps LAN)
  • Ceph Public network: 10.11.0.x (100 Gbps via Intel E810-C NICs)
  • Ceph Cluster network: 10.12.0.x (100 Gbps via Intel E810-C NICs)
  • The two Ceph networks are configured on bond0 (two ports of the Intel E810-C NIC) using Linux VLANs; a sketch of the interface definitions follows below this list.
  • I can successfully ping all three hosts on all three networks.
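For reference, a minimal sketch of how the bond and its VLAN interfaces are defined in /etc/network/interfaces (the NIC names enp65s0f0/enp65s0f1, the LACP bond mode, and the VLAN IDs 11/12 are illustrative assumptions, not confirmed values):

auto bond0
iface bond0 inet manual
        bond-slaves enp65s0f0 enp65s0f1
        bond-mode 802.3ad
        bond-miimon 100
        # the two 100 Gbps E810-C ports; port names assumed

auto bond0.11
iface bond0.11 inet static
        address 10.11.0.1/24
        # Ceph public network (VLAN ID 11 assumed)

auto bond0.12
iface bond0.12 inet static
        address 10.12.0.1/24
        # Ceph cluster network (VLAN ID 12 assumed)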

  1. I created the Proxmox cluster and joined all three nodes (pve01, pve02, pve03) successfully.
  2. I installed Ceph on the first node (pve01) — installation works without issues.
  3. When I install Ceph on the second node (pve02) and try to add a MON, I get a timeout during the MON creation.
  4. The exact same issue occurs on the third node (pve03).
  5. Both the web GUI and the CLI return a timeout when trying to create a MON on pve02 and pve03 (the exact commands are sketched below this list).
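For completeness, this is roughly how the CLI attempt and the first checks look on pve02 (pveceph mon create is the stock Proxmox command; 3300 and 6789 are the default Ceph messenger ports):

# on pve02 (same picture on pve03)
pveceph mon create                           # returns a timeout
journalctl -u ceph-mon@pve02 --since -10min  # did the mon daemon even start?
ss -tlnp | grep -E '3300|6789'               # is anything listening on the mon ports?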

Configuration file on pve01 (which is also visible on pve02 and pve03):

[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
cluster_network = 10.12.0.0/24
fsid = xxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
mon_allow_pool_delete = true
mon_host = 10.11.0.1
ms_bind_ipv4 = true
ms_bind_ipv6 = false
osd_pool_default_min_size = 2
osd_pool_default_size = 3
public_network = 10.11.0.0/24
[client]
keyring = /etc/pve/priv/$cluster.$name.keyring
[client.crash]
keyring = /etc/pve/ceph/$cluster.$name.keyring
[mon.pve01]
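Since a successful ping does not prove the MON ports are reachable, a quick TCP check from pve02 and pve03 against the existing MON on pve01 can rule out a blocked port (a netcat sketch; 3300 is the msgr2 port, 6789 is msgr1):

# run on pve02 and pve03
nc -zv 10.11.0.1 3300
nc -zv 10.11.0.1 6789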
 
Hello,
if your firewall is enabled at the Datacenter level, you will need to create a rule for the Ceph traffic.
The simplest version would look like this:

(screenshot: Datacenter-level firewall rule accepting Ceph traffic)
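If you prefer the config file over the GUI, the equivalent Datacenter-level rule would look roughly like this in /etc/pve/firewall/cluster.fw (a sketch; the built-in Ceph macro covers the MON/OSD/MDS ports, and the source networks are taken from your post):

[RULES]
IN Ceph(ACCEPT) -source 10.11.0.0/24
IN Ceph(ACCEPT) -source 10.12.0.0/24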
 
Thank you for your reply. I have disabled the firewalls on all hosts, but it still didn’t work. I strongly suspect that the issue is related to the 100 Gbps connection, more specifically to the VLAN networks I selected for the public and cluster traffic. I just ran a test on the same servers over the regular LAN network, and everything worked fine there. (I can still ping all three hosts from each other over the VLANs for both the public and cluster networks.)
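One thing a plain ping does not rule out is an MTU mismatch on the bond/VLAN path: small ICMP packets pass even when full-size TCP segments are silently dropped, which matches this symptom pattern. A test with the don’t-fragment bit set would look like this (assuming jumbo frames with MTU 9000 were intended on the 100 Gbps links; with the default MTU 1500, use -s 1472 instead):

# from pve01 towards pve02 (addresses assumed to follow the .1/.2/.3 scheme)
ping -M do -s 8972 -c 3 10.11.0.2   # Ceph public network
ping -M do -s 8972 -c 3 10.12.0.2   # Ceph cluster network
ip link show bond0                  # compare MTU on every node (and the switch)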