Hi,
I've used Proxmox for many years but recently decided to set up a lab environment with three nodes to test out Ceph. The nodes are simple machines with four NICs each: 1x 1 Gbps for management and corosync, 1x 2.5 Gbps for the VM bridge (VLAN aware), and 2x 2.5 Gbps connected directly to the other nodes according to the full mesh guide https://pve.proxmox.com/wiki/Full_Mesh_Network_for_Ceph_Server#Routed_Setup_(with_Fallback)
Because of this network layout, I want to separate public and cluster traffic: the public network on its own subnet over the VM bridge, and the cluster network over the full mesh.
The nodes have dedicated NVMe drives for use with Ceph.
Management network - Routed network 10.83.100.0/24
Ceph public network - Non routed vlan 10.83.150.0/24
Ceph cluster network - Only available on nodes using frr/ISIS 172.16.1.0/24
I've configured one VLAN interface per node on 10.83.150.0/24 (.21, .22, .23) and one loopback IP per node on the cluster network 172.16.1.0/24 (.21, .22, .23), according to the guide. I also changed the MTU to 9000 on the cluster network.
All nodes can ping each other's IPs on both of these networks, so far so good.
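For reference, this is roughly how I tested reachability from node 1. I haven't explicitly verified that jumbo frames make it across the mesh yet; I assume a do-not-fragment ping with an 8972-byte payload (9000 minus the 20-byte IP and 8-byte ICMP headers) is the way to confirm that:
Code:
# Ceph public VLAN
ping -c 3 10.83.150.22
ping -c 3 10.83.150.23

# full mesh cluster network, do-not-fragment with a jumbo-sized payload
# (8972 = 9000 MTU - 20 byte IP header - 8 byte ICMP header)
ping -c 3 -M do -s 8972 172.16.1.22
ping -c 3 -M do -s 8972 172.16.1.23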
I installed Ceph 19 (Squid), because I didn't think much of it and figured the latest version would be preferable for a new setup (was this a mistake?).
I can create the monitors on each node, as well as the managers, and everything seems OK. Then I add the OSDs (one disk per node, so one OSD per node): the first one shows up as in and up, but when I add the second one, both fail and go down. I can't make sense of the logs or of what the problem is, and the AIs I've asked all tell me to add explicit IPs for the monitors and OSDs, which changed nothing in the behaviour.
I get the feeling I'm missing something obvious, but I can't figure it out.
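In case it's relevant, I assume these are the right places to look for which addresses the OSDs registered with the monitors and how their heartbeats are doing (the second command run on the node hosting osd.1):
Code:
# addresses each OSD registered with the monitors
ceph osd dump | grep "^osd\."

# heartbeat peers and ping times as seen by the running OSD
ceph daemon osd.1 dump_osd_network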
This is my ceph.conf
Code:
[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
cluster_network = 172.16.1.0/24
fsid = 79d31fdf-6c05-4458-b6cc-11f79a833d49
mon_allow_pool_delete = true
mon_host = 10.83.150.21 10.83.150.22 10.83.150.23
ms_bind_ipv4 = true
ms_bind_ipv6 = false
osd_pool_default_min_size = 2
osd_pool_default_size = 3
public_network = 10.83.150.0/24
[client]
keyring = /etc/pve/priv/$cluster.$name.keyring
[client.crash]
keyring = /etc/pve/ceph/$cluster.$name.keyring
[mon.fv-prx-01]
public_addr = 10.83.150.21
[mon.fv-prx-02]
public_addr = 10.83.150.22
[mon.fv-prx-03]
public_addr = 10.83.150.23
[osd.0]
cluster_addr = 172.16.1.21
public_addr = 10.83.150.21
[osd.1]
cluster_addr = 172.16.1.22
public_addr = 10.83.150.22
[osd.2]
cluster_addr = 172.16.1.23
public_addr = 10.83.150.23
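If it matters, I assume I can confirm what the daemons actually bind to with something like this on each node:
Code:
# sockets the Ceph daemons are listening on
ss -tlnp | grep ceph

# effective address settings of a running OSD (here osd.1 on node 2)
ceph daemon osd.1 config get public_addr
ceph daemon osd.1 config get cluster_addr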
This is the interfaces file of the first node
Code:
auto lo
iface lo inet loopback

auto enx00e04c68048b
iface enx00e04c68048b inet manual

auto enp2s0
iface enp2s0 inet static
        mtu 9000

auto eno1
iface eno1 inet static
        mtu 9000

iface wlp4s0 inet manual

auto enx7cc2c63c7c1b
iface enx7cc2c63c7c1b inet manual

auto vmbr0
iface vmbr0 inet static
        address 10.83.100.21/24
        gateway 10.83.100.1
        bridge-ports enx7cc2c63c7c1b
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094
#MGMT / Cluster

auto vmbr1
iface vmbr1 inet manual
        bridge-ports enx00e04c68048b
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094
#Guest bridge

auto ceph
iface ceph inet static
        address 10.83.150.21/24
        mtu 1500
        vlan-id 150
        vlan-raw-device vmbr1
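For completeness, this is how I'd check where the cluster IP actually lives on the node and how traffic to the other nodes' cluster IPs is routed (assuming ip route get is the right tool here):
Code:
# brief overview of all interface addresses on node 1
ip -br addr

# interface/nexthop used to reach the other nodes' cluster IPs
ip route get 172.16.1.22
ip route get 172.16.1.23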
And this is the output of vtysh "show openfabric route", also on the first node
Code:
Area 1:
IS-IS paths to level-2 routers that speak IP
Vertex               Type         Metric Next-Hop             Interface Parent
------------------------------------------------------------------------------
fv-prx-01
172.16.1.0/24        IP internal  0                                     fv-prx-01(4)
fv-prx-02            TE-IS        10     fv-prx-02            enp2s0    fv-prx-01(4)
fv-prx-03            TE-IS        10     fv-prx-03            eno1      fv-prx-01(4)
172.16.1.0/24        IP TE        20     fv-prx-02            enp2s0    fv-prx-02(4)
                                         fv-prx-03            eno1      fv-prx-03(4)

IS-IS L2 IPv4 routing table:

Prefix          Metric  Interface  Nexthop       Label(s)
----------------------------------------------------------
172.16.1.0/24   20      enp2s0     172.16.1.22   -
                        eno1       172.16.1.23   -
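I can also post the openfabric adjacency status if that helps; I assume this is the command:
Code:
vtysh -c "show openfabric neighbor"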
These are the OSD logs from node 2 when adding the second OSD. It seems to loop and retry multiple times before failing
Code:
2025-06-24T15:55:20.817+0200 713c700036c0 1 osd.1 215 state: booting -> active
2025-06-24T15:55:20.817+0200 713c667f06c0 1 osd.1 pg_epoch: 215 pg[1.0( empty local-lis/les=0/0 n=0 ec=190/190 lis/c=0/0 les/c/f=0/0/0 sis=215) [1,0] r=0 lpr=215 pi=[190,215)/0 crt=0'0 mlcod 0'0 unknown NOTIFY mbc={}] PeeringState::start_peering_interval up [] -> [1,0], acting [] -> [1,0], acting_primary ? -> 1, up_primary ? -> 1, role -1 -> 0, features acting 4540701547738038271 upacting 4540701547738038271
2025-06-24T15:55:20.817+0200 713c667f06c0 1 osd.1 pg_epoch: 215 pg[1.0( empty local-lis/les=0/0 n=0 ec=190/190 lis/c=0/0 les/c/f=0/0/0 sis=215) [1,0] r=0 lpr=215 pi=[190,215)/0 crt=0'0 mlcod 0'0 unknown mbc={}] state<Start>: transitioning to Primary
2025-06-24T15:55:21.828+0200 713c700036c0 0 log_channel(cluster) log [WRN] : Monitor daemon marked osd.1 down, but it is still running
2025-06-24T15:55:21.828+0200 713c700036c0 0 log_channel(cluster) log [DBG] : map e216 wrongly marked me down at e216
2025-06-24T15:55:21.828+0200 713c700036c0 1 osd.1 216 start_waiting_for_healthy
2025-06-24T15:55:21.828+0200 713c700036c0 1 osd.1 216 start_boot
2025-06-24T15:55:21.828+0200 713c667f06c0 1 osd.1 pg_epoch: 216 pg[1.0( empty local-lis/les=0/0 n=0 ec=190/190 lis/c=0/0 les/c/f=0/0/0 sis=215) [1,0] r=0 lpr=215 pi=[190,215)/0 crt=0'0 mlcod 0'0 creating+peering mbc={}] state<Started/Primary/Peering>: Peering, affected_by_map, going to Reset
2025-06-24T15:55:21.829+0200 713c667f06c0 1 osd.1 pg_epoch: 216 pg[1.0( empty local-lis/les=0/0 n=0 ec=190/190 lis/c=0/0 les/c/f=0/0/0 sis=216) [] r=-1 lpr=216 pi=[190,216)/0 crt=0'0 mlcod 0'0 unknown mbc={}] PeeringState::start_peering_interval up [1,0] -> [], acting [1,0] -> [], acting_primary 1 -> -1, up_primary 1 -> -1, role 0 -> -1, features acting 4540701547738038271 upacting 4540701547738038271
2025-06-24T15:55:21.829+0200 713c667f06c0 1 osd.1 pg_epoch: 216 pg[1.0( empty local-lis/les=0/0 n=0 ec=190/190 lis/c=0/0 les/c/f=0/0/0 sis=216) [] r=-1 lpr=216 pi=[190,216)/0 crt=0'0 mlcod 0'0 unknown NOTIFY mbc={}] state<Start>: transitioning to Stray
2025-06-24T15:55:21.829+0200 713c7802d6c0 1 osd.1 216 set_numa_affinity storage numa node 0
2025-06-24T15:55:21.829+0200 713c7802d6c0 -1 osd.1 216 set_numa_affinity unable to identify public interface '' numa node: (2) No such file or directory
2025-06-24T15:55:21.829+0200 713c7802d6c0 1 osd.1 216 set_numa_affinity not setting numa affinity
2025-06-24T15:55:21.829+0200 713c7802d6c0 1 bluestore(/var/lib/ceph/osd/ceph-1) collect_metadata devices span numa nodes 0
2025-06-24T15:55:21.830+0200 713c7802d6c0 1 bluestore(/var/lib/ceph/osd/ceph-1) collect_metadata devices span numa nodes 0
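Happy to post more detailed logs if needed. I assume bumping the debug levels on the affected OSD would give more detail about why the monitors mark it down (to be reverted afterwards):
Code:
# more verbose messenger/OSD logging for osd.1
ceph config set osd.1 debug_ms 1
ceph config set osd.1 debug_osd 10

# follow the OSD log on node 2 while re-adding the OSD
tail -f /var/log/ceph/ceph-osd.1.log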