Ceph on 3-node full mesh: cannot add OSDs

subframe

New Member
Jun 25, 2025
Hi,

I've used Proxmox for many years, but recently decided to set up a lab environment with three nodes to test out Ceph. I have three simple nodes with four NICs each: 1x 1 Gbps for management and Corosync, 1x 2.5 Gbps for the VM bridge (VLAN aware), and 2x 2.5 Gbps connected directly to the other nodes according to the full mesh guide https://pve.proxmox.com/wiki/Full_Mesh_Network_for_Ceph_Server#Routed_Setup_(with_Fallback)
Due to the network setup, I want to separate public and cluster traffic: public on its own subnet going over the VM bridge, and cluster through the full mesh network.
The nodes have dedicated NVMe drives for use with Ceph.

Management network - Routed network 10.83.100.0/24
Ceph public network - Non routed vlan 10.83.150.0/24
Ceph cluster network - Only available between the nodes via FRR/OpenFabric (IS-IS) 172.16.1.0/24

I've configured the nodes with one VLAN interface each on 10.83.150.0/24 (.21, .22, .23) and one loopback IP each on the cluster network 172.16.1.0/24 (.21, .22, .23), according to the guide. I also changed the MTU to 9000 on the cluster network.
All nodes can ping each other's IPs on both of these networks, so far so good.
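(For reference, a non-fragmenting ping with a jumbo-sized payload is a quick way to confirm the 9000 MTU really works end to end on the mesh; 8972 bytes = 9000 minus the 20-byte IP and 8-byte ICMP headers. Just a sketch using standard iputils ping from node 1.)
Code:
# from fv-prx-01 towards the other two mesh loopbacks
ping -M do -s 8972 -c 3 172.16.1.22
ping -M do -s 8972 -c 3 172.16.1.23
# -M do forbids fragmentation, so an MTU mismatch shows up as an error
# instead of being hidden by silent fragmentation
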
I installed Ceph 19 because I didn't think much of it and assumed the latest version would be preferable for a new setup (was this a mistake?).

I can create the monitors on each node, as well as the managers, and everything seems OK. When I add the OSDs (one disk per node, so one OSD per node), the first one shows up as in and up, but when I add the second one, they both fail and go down. I can't make sense of the logs or what the problem is, and the AIs I've asked all tell me to add explicit IPs for the monitors and OSDs, which changed nothing in the behaviour.

I get the feeling I'm missing something obvious, but I can't figure it out.
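
(The in/up/down states I mention are what the standard status commands report; for completeness, this is what I'm looking at, nothing custom.)
Code:
ceph -s              # overall health and how many OSDs are up/in
ceph osd tree        # per-OSD up/down state and host placement
ceph health detail   # the concrete reason behind any HEALTH_WARN/ERR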

This is my ceph.conf
Code:
[global]
    auth_client_required = cephx
    auth_cluster_required = cephx
    auth_service_required = cephx
    cluster_network = 172.16.1.0/24
    fsid = 79d31fdf-6c05-4458-b6cc-11f79a833d49
    mon_allow_pool_delete = true
    mon_host = 10.83.150.21 10.83.150.22 10.83.150.23
    ms_bind_ipv4 = true
    ms_bind_ipv6 = false
    osd_pool_default_min_size = 2
    osd_pool_default_size = 3
    public_network = 10.83.150.0/24

[client]
    keyring = /etc/pve/priv/$cluster.$name.keyring

[client.crash]
    keyring = /etc/pve/ceph/$cluster.$name.keyring

[mon.fv-prx-01]
    public_addr = 10.83.150.21

[mon.fv-prx-02]
    public_addr = 10.83.150.22

[mon.fv-prx-03]
    public_addr = 10.83.150.23

[osd.0]
    cluster_addr = 172.16.1.21
    public_addr = 10.83.150.21

[osd.1]
    cluster_addr = 172.16.1.22
    public_addr = 10.83.150.22

[osd.2]
    cluster_addr = 172.16.1.23
    public_addr = 10.83.150.23
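
(To double-check what a running OSD actually picked up from this file, its effective settings can be queried through the mon/mgr; a sketch, assuming osd.1 is currently running.)
Code:
ceph config show osd.1 | grep -E 'public_network|cluster_network|public_addr|cluster_addr'
# should report 10.83.150.0/24 / 172.16.1.0/24 plus the per-daemon addresses above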

This is the interfaces file of the first node
Code:
auto lo
iface lo inet loopback

auto enx00e04c68048b
iface enx00e04c68048b inet manual

auto enp2s0
iface enp2s0 inet static
        mtu 9000

auto eno1
iface eno1 inet static
        mtu 9000

iface wlp4s0 inet manual

auto enx7cc2c63c7c1b
iface enx7cc2c63c7c1b inet manual

auto vmbr0
iface vmbr0 inet static
        address 10.83.100.21/24
        gateway 10.83.100.1
        bridge-ports enx7cc2c63c7c1b
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094
#MGMT / Cluster

auto vmbr1
iface vmbr1 inet manual
        bridge-ports enx00e04c68048b
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094
#Guest bridge

auto ceph
iface ceph inet static
        address 10.83.150.21/24
        mtu 1500
        vlan-id 150
        vlan-raw-device vmbr1
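
(The 172.16.1.21/32 loopback address is configured separately from this file. To confirm what is actually live on the node, addresses and MTUs can be checked like this; a sketch with the interface names from above.)
Code:
ip -br addr show                                        # which IPs are actually assigned, and where
cat /sys/class/net/enp2s0/mtu /sys/class/net/eno1/mtu   # both should print 9000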

And this is the output of the vtysh command show openfabric route, also on the first node
Code:
Area 1:
IS-IS paths to level-2 routers that speak IP
 Vertex         Type         Metric  Next-Hop   Interface  Parent      
 ------------------------------------------------------------------------
 fv-prx-01                                                              
 172.16.1.0/24  IP internal  0                             fv-prx-01(4)
 fv-prx-02      TE-IS        10      fv-prx-02  enp2s0     fv-prx-01(4)
 fv-prx-03      TE-IS        10      fv-prx-03  eno1       fv-prx-01(4)
 172.16.1.0/24  IP TE        20      fv-prx-02  enp2s0     fv-prx-02(4)
                                     fv-prx-03  eno1       fv-prx-03(4)


IS-IS L2 IPv4 routing table:

 Prefix         Metric  Interface  Nexthop      Label(s)
 ---------------------------------------------------------
 172.16.1.0/24  20      enp2s0     172.16.1.22  -        
                        eno1       172.16.1.23  -
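
(The table above is FRR's view; what matters for Ceph traffic is what actually ends up in the kernel routing table. A quick cross-check, sketch only.)
Code:
ip route show 172.16.1.0/24                # should list nexthops via enp2s0 and eno1
vtysh -c "show ip route 172.16.1.0/24"     # FRR's RIB entry for the same prefix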

The OSD log from node 2 when adding the second OSD. It seems to loop and retry multiple times before failing
Code:
2025-06-24T15:55:20.817+0200 713c700036c0  1 osd.1 215 state: booting -> active
2025-06-24T15:55:20.817+0200 713c667f06c0  1 osd.1 pg_epoch: 215 pg[1.0( empty local-lis/les=0/0 n=0 ec=190/190 lis/c=0/0 les/c/f=0/0/0 sis=215) [1,0] r=0 lpr=215 pi=[190,215)/0 crt=0'0 mlcod 0'0 unknown NOTIFY mbc={}] PeeringState::start_peering_interval up [] -> [1,0], acting [] -> [1,0], acting_primary ? -> 1, up_primary ? -> 1, role -1 -> 0, features acting 4540701547738038271 upacting 4540701547738038271
2025-06-24T15:55:20.817+0200 713c667f06c0  1 osd.1 pg_epoch: 215 pg[1.0( empty local-lis/les=0/0 n=0 ec=190/190 lis/c=0/0 les/c/f=0/0/0 sis=215) [1,0] r=0 lpr=215 pi=[190,215)/0 crt=0'0 mlcod 0'0 unknown mbc={}] state<Start>: transitioning to Primary
2025-06-24T15:55:21.828+0200 713c700036c0  0 log_channel(cluster) log [WRN] : Monitor daemon marked osd.1 down, but it is still running
2025-06-24T15:55:21.828+0200 713c700036c0  0 log_channel(cluster) log [DBG] : map e216 wrongly marked me down at e216
2025-06-24T15:55:21.828+0200 713c700036c0  1 osd.1 216 start_waiting_for_healthy
2025-06-24T15:55:21.828+0200 713c700036c0  1 osd.1 216 start_boot
2025-06-24T15:55:21.828+0200 713c667f06c0  1 osd.1 pg_epoch: 216 pg[1.0( empty local-lis/les=0/0 n=0 ec=190/190 lis/c=0/0 les/c/f=0/0/0 sis=215) [1,0] r=0 lpr=215 pi=[190,215)/0 crt=0'0 mlcod 0'0 creating+peering mbc={}] state<Started/Primary/Peering>: Peering, affected_by_map, going to Reset
2025-06-24T15:55:21.829+0200 713c667f06c0  1 osd.1 pg_epoch: 216 pg[1.0( empty local-lis/les=0/0 n=0 ec=190/190 lis/c=0/0 les/c/f=0/0/0 sis=216) [] r=-1 lpr=216 pi=[190,216)/0 crt=0'0 mlcod 0'0 unknown mbc={}] PeeringState::start_peering_interval up [1,0] -> [], acting [1,0] -> [], acting_primary 1 -> -1, up_primary 1 -> -1, role 0 -> -1, features acting 4540701547738038271 upacting 4540701547738038271
2025-06-24T15:55:21.829+0200 713c667f06c0  1 osd.1 pg_epoch: 216 pg[1.0( empty local-lis/les=0/0 n=0 ec=190/190 lis/c=0/0 les/c/f=0/0/0 sis=216) [] r=-1 lpr=216 pi=[190,216)/0 crt=0'0 mlcod 0'0 unknown NOTIFY mbc={}] state<Start>: transitioning to Stray
2025-06-24T15:55:21.829+0200 713c7802d6c0  1 osd.1 216 set_numa_affinity storage numa node 0
2025-06-24T15:55:21.829+0200 713c7802d6c0 -1 osd.1 216 set_numa_affinity unable to identify public interface '' numa node: (2) No such file or directory
2025-06-24T15:55:21.829+0200 713c7802d6c0  1 osd.1 216 set_numa_affinity not setting numa affinity
2025-06-24T15:55:21.829+0200 713c7802d6c0  1 bluestore(/var/lib/ceph/osd/ceph-1) collect_metadata devices span numa nodes 0
2025-06-24T15:55:21.830+0200 713c7802d6c0  1 bluestore(/var/lib/ceph/osd/ceph-1) collect_metadata devices span numa nodes 0
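
(The "wrongly marked me down" lines are the monitors acting on failed OSD heartbeats, which run over both the public (front) and cluster (back) networks. To see which addresses each OSD actually registered with the monitors, a sketch.)
Code:
ceph osd dump | grep '^osd\.'
# per OSD: up/down state plus the public (front) and cluster (back) addresses;
# the back addresses should be the 172.16.1.x loopbacks, reachable from every node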
 
Which version specifically? If you created the OSDs before 19.2.1-pve3 there is a known issue and you should recreate them. But none of the ones we added and then recreated had any issue doing so, nor did they show the crashes you describe.

How are you adding the new OSD, via the GUI?

OSDs don't have/need an IP address. There aren't osd entries in ceph.conf by default.
 
This is a freshly installed cluster with nothing running on it yet, so everything was created with Ceph 19. Yes, I've added everything through the UI except for the init, where I specified the networks as written in the guide.
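
(The init step from the guide, with the two networks split out, was roughly the following; a sketch with my subnets.)
Code:
pveceph init --network 10.83.150.0/24 --cluster-network 172.16.1.0/24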

Code:
# ceph --version
ceph version 19.2.1 (c783d93f19f71de89042abf6023076899b42259d) squid (stable)

# pveversion  
pve-manager/8.4.1/2a5fa54a8503f96d (running kernel: 6.8.12-11-pve)
 