Missing documentation on how to add a second ring (cluster network) to Ceph in Proxmox, and inability to get it working.

Hi,
I run a 3-node cluster with Ceph in my homelab: 3 MONs, 3 MGRs, 3 MDSs. It has been running for about 3 years now.
Yesterday I installed three fresh nodes and migrated the cluster from the old nodes to the new ones, which are identical (GMKTEC M5 Plus, 24 GB RAM each, beautiful devices, they work like a charm so far).
Every node has 2 NICs (I'm also using VLANs, but that doesn't matter here).
The first subnet is the public one, 192.168.66.0/24; the second, the cluster one, is 192.168.90.0/24, created exclusively for corosync and Ceph traffic.
Until now everything worked as it should using only the public subnet, but I decided to add a second ring on the cluster subnet to offload some traffic from the single NIC and start using the second NIC for cluster traffic. And that's when the trouble started...

The /etc/hosts file looks the same on every node.
The nodes can ping each other on both subnets and I can SSH between them with no problem. I've successfully added the second ring (192.168.90.0/24) to corosync and can see traffic (with tcpdump) on the cluster subnet on ports 6800 and 53394, and it also shows up in the Proxmox web GUI.
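For reference, this is roughly the capture I used to confirm that (the interface name is just an example; substitute your cluster-facing NIC):

Code:
# watch the cluster subnet on the cluster-facing NIC
# (enp2s0 is an example interface name)
tcpdump -ni enp2s0 net 192.168.90.0/24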


The problem:

For some reason, I'm unable to find official documentation on how to add a second ring to Proxmox's Ceph, and I'm failing to do it by trial and error, editing /etc/pve/ceph.conf in various ways with no success.
I did find documentation for upstream ("non-Proxmox") Ceph (URL), but it states that I should add cluster_addr to each [osd.x] section in /etc/ceph/ceph.conf (if I'm not mistaken); those sections are missing in Proxmox, which is normal AFAIK.
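For clarity, this is what I understand the upstream docs to suggest, i.e. something along these lines per OSD (the addresses are just examples, and the Proxmox-managed config has no such sections):

Code:
# upstream-style per-OSD entries as I understand the Ceph docs
# (example addresses; /etc/pve/ceph.conf has no [osd.x] sections)
[osd.0]
        host = p1
        public_addr = 192.168.66.16
        cluster_addr = 192.168.90.16

[osd.1]
        host = p2
        public_addr = 192.168.66.17
        cluster_addr = 192.168.90.17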

In Proxmox we only have /etc/pve/ceph.conf, which doesn't contain any [osd.x] sections.
So I added cluster_network to the [global] section and cluster_addr to each [mon.x]. I first tried without adding cluster_addr to [mon.x], also with no success.
Every config change was followed by at least a restart of the OSDs, or by a full reboot, but Ceph won't use the cluster subnet no matter what.
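For what it's worth, by "restart of the OSDs" I mean roughly the following (the OSD IDs are examples; I also restarted from the web GUI):

Code:
# restart individual OSDs on a node after editing /etc/pve/ceph.conf
# (OSD IDs are examples - use the IDs that live on that node)
systemctl restart ceph-osd@0.service ceph-osd@1.service
# or restart all Ceph daemons on the node
systemctl restart ceph.target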
If I disconnect the public NIC by pulling the RJ45 cable, the web GUI still shows the node as online (green icon next to its name), so corosync knows what's going on (I also confirmed this in journalctl, where the second ring's address is being used). But in the Ceph tab of the web GUI I just see a failed host and its OSD down (2 nodes working, 1 failed), so Ceph is clearly not communicating over its cluster ring.
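This is roughly how I confirmed that corosync was fine on the second ring while the public NIC was unplugged (standard commands as far as I know):

Code:
# per-link status of both corosync rings
corosync-cfgtool -s
# corosync logs - this is where I saw the 192.168.90.x addresses in use
journalctl -u corosync -f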

My configs look like this:
Code:
/etc/hosts:
127.0.0.1       localhost

192.168.66.10   xxx
192.168.66.16   p1
192.168.66.17   p2
192.168.66.18   p3

192.168.90.16   p1-cluster
192.168.90.17   p2-cluster
192.168.90.18   p3-cluster
(please ignore 192.168.66.10 - it's usually powered off and has its quorum vote set to 0; it's not part of the problem in any way)

Code:
/etc/pve/corosync.conf:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: outrunator
    nodeid: 2
    quorum_votes: 0
    ring0_addr: 192.168.66.10
    ring1_addr: 192.168.90.10
  }
  node {
    name: p1
    nodeid: 4
    quorum_votes: 1
    ring0_addr: 192.168.66.16
    ring1_addr: 192.168.90.16
  }
  node {
    name: p2
    nodeid: 6
    quorum_votes: 1
    ring0_addr: 192.168.66.17
    ring1_addr: 192.168.90.17
  }
  node {
    name: p3
    nodeid: 7
    quorum_votes: 1
    ring0_addr: 192.168.66.18
    ring1_addr: 192.168.90.18
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: cluster1
  config_version: 34
  interface {
    linknumber: 0
  }
  interface {
    linknumber: 1
  }
  ip_version: ipv4
  link_mode: passive
  secauth: off
  token: 10000
  version: 2
}

Code:
/etc/pve/ceph.conf:
[global]
        auth_client_required = cephx
        auth_cluster_required = cephx
        auth_service_required = cephx
        cluster_network = 192.168.90.0/24
        fsid = xxxyyyzzz
        mon_allow_pool_delete = true
        mon_host = 192.168.66.18 192.168.66.16 192.168.66.17
        mon_max_pg_per_osd = 300
        ms_bind_ipv4 = true
        ms_bind_ipv6 = false
        osd_pool_default_min_size = 2
        osd_pool_default_size = 3
        public_network = 192.168.66.0/24

[client]
        keyring = /etc/pve/priv/$cluster.$name.keyring

[client.crash]
        keyring = /etc/pve/ceph/$cluster.$name.keyring

[mds]
        keyring = /var/lib/ceph/mds/ceph-$id/keyring

[mds.outrunator]
        host = outrunator
        mds_standby_for_name = pve

[mds.p1]
        host = p1
        mds_standby_for_name = pve

[mds.p2]
        host = p2
        mds_standby_for_name = pve

[mds.p3]
        host = p3
        mds_standby_for_name = pve

[mon.p1]
        public_addr = 192.168.66.16
        cluster_addr = 192.168.90.16

[mon.p2]
        public_addr = 192.168.66.17
        cluster_addr = 192.168.90.17

[mon.p3]
        public_addr = 192.168.66.18
        cluster_addr = 192.168.90.18

Code:
ceph mon dump:
epoch 32
fsid xxxyyyzzz
last_changed 2025-06-05T17:21:42.655722+0200
created 2022-06-16T11:48:31.723602+0200
min_mon_release 19 (squid)
election_strategy: 1
0: [v2:192.168.66.18:3300/0,v1:192.168.66.18:6789/0] mon.p3
1: [v2:192.168.66.16:3300/0,v1:192.168.66.16:6789/0] mon.p1
2: [v2:192.168.66.17:3300/0,v1:192.168.66.17:6789/0] mon.p2
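(The mon dump only lists the public addresses. If it helps, I can also post what the OSDs report for their runtime config and bound addresses - as far as I know these are the commands to check; osd.0 is just an example ID:)

Code:
# network settings as seen by a running OSD (osd.0 is an example)
ceph config show osd.0 | grep -E 'public_network|cluster_network'
# addresses the OSD actually bound to
ceph osd metadata 0 | grep -E 'front_addr|back_addr'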
Code:
pvecm status
Cluster information
-------------------
Name:             cluster1
Config Version:   34
Transport:        knet
Secure auth:      off

Quorum information
------------------
Date:             Thu Jun  5 21:01:55 2025
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000007
Ring ID:          4.cd2b
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000004          1 192.168.66.16
0x00000006          1 192.168.66.17
0x00000007          1 192.168.66.18 (local)
Code:
pveversion -v
proxmox-ve: 8.4.0 (running kernel: 6.8.12-11-pve)
pve-manager: 8.4.1 (running version: 8.4.1/2a5fa54a8503f96d)
proxmox-kernel-helper: 8.1.1
proxmox-kernel-6.8.12-11-pve-signed: 6.8.12-11
proxmox-kernel-6.8: 6.8.12-11
amd64-microcode: 3.20240820.1~deb12u1
ceph: 19.2.1-pve3
ceph-fuse: 19.2.1-pve3
corosync: 3.1.9-pve1
criu: 3.17.1-2+deb12u1
glusterfs-client: 10.3-5
ifupdown: residual config
ifupdown2: 3.2.0-1+pmx11
libjs-extjs: 7.0.0-5
libknet1: 1.30-pve2
libproxmox-acme-perl: 1.6.0
libproxmox-backup-qemu0: 1.5.1
libproxmox-rs-perl: 0.3.5
libpve-access-control: 8.2.2
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.1.0
libpve-cluster-perl: 8.1.0
libpve-common-perl: 8.3.1
libpve-guest-common-perl: 5.2.2
libpve-http-server-perl: 5.2.2
libpve-network-perl: 0.11.2
libpve-rs-perl: 0.9.4
libpve-storage-perl: 8.3.6
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.6.0-2
proxmox-backup-client: 3.4.1-1
proxmox-backup-file-restore: 3.4.1-1
proxmox-firewall: 0.7.1
proxmox-kernel-helper: 8.1.1
proxmox-mail-forward: 0.3.2
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.7
proxmox-widget-toolkit: 4.3.11
pve-cluster: 8.1.0
pve-container: 5.2.6
pve-docs: 8.4.0
pve-edk2-firmware: not correctly installed
pve-esxi-import-tools: 0.7.4
pve-firewall: 5.1.1
pve-firmware: 3.15-4
pve-ha-manager: 4.0.7
pve-i18n: 3.4.4
pve-qemu-kvm: 9.2.0-5
pve-xtermjs: 5.5.0-2
qemu-server: 8.3.12
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.7-pve2
Code:
ceph -v
ceph version 19.2.1 (c783d93f19f71de89042abf6023076899b42259d) squid (stable)

Could you help me out with this? I'm surely not the only person on the Internet struggling with it.

Thanks.
P.S. I love Proxmox!
 