Missing documentation about creating a 2nd ring for Ceph, and the impossibility of creating a 2nd ring in Ceph.

logiczny

Member
Mar 10, 2021
Hi,
I run a 3-node cluster with Ceph in my homelab: 3 MONs, 3 MGRs, 3 MDSs. It has been running for about 3 years now.
Yesterday I installed three fresh nodes and migrated my cluster from the old nodes to the three new ones, which are identical (GMKTEC M5 Plus, 24 GB RAM each; beautiful devices, they work like a charm so far).
Every node has 2 NICs (I'm also using VLANs, but that doesn't matter for this case).
The 1st subnet is the public one, 192.168.66.0/24, and the second one, the cluster subnet, is 192.168.90.0/24, created exclusively for corosync and Ceph traffic.
Until now everything worked as it should using only the public subnet, but I decided to add a second ring on the cluster subnet to offload some traffic from the single NIC and start using the second NIC as a cluster one. And that's when the hell started...

The /etc/hosts file looks the same on every node.
Each node can ping the others on both subnets and I can SSH between them with no problem. I've successfully added a 2nd ring (192.168.90.0/24) to corosync and can see (with tcpdump) traffic on the cluster subnet on ports 6800 and 53394, and I can see that in the Proxmox web GUI (screenshot in the spoiler below).
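For reference, this is roughly the check I used (the interface name below is just an example for my setup):
Code:
# watch the dedicated cluster subnet on the second NIC (interface name is an example)
tcpdump -ni enp2s0 net 192.168.90.0/24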


The problem:

But for some reason I'm unable to find official Proxmox documentation on how to add a 2nd ring to Ceph, and I'm failing to do it by trial and error, editing /etc/pve/ceph.conf in various ways with no success.
I was able to find the official documentation for the "non-Proxmox" upstream Ceph (URL), but it states that I should add cluster_addr to each [osd.x] section in the /etc/ceph/ceph.conf file (if I'm not mistaken), which is missing in Proxmox, but that is normal AFAIK.

In Proxmox we only have /etc/pve/ceph.conf, which does not contain any [osd.x] sections.
So I added a cluster_network to the [global] section and a cluster_addr to each [mon.x]. I first tried without adding cluster_addr to [mon.x], also with no success.
Every config I tried was followed by at least a restart of the OSDs, or by a full reboot, but Ceph won't use the cluster subnet no matter what.
If I disconnect the public NIC by pulling out the RJ45, the web GUI still shows my node as online (green icon next to the name), so corosync knows what's going on (I also confirmed this in journalctl, where the second ring's address is being used). But in the Ceph tab (web GUI) I just see a failed host and an OSD down (2 nodes working, 1 failed), so it's clearly not communicating properly via Ceph's cluster ring.

My configs look like this:
Code:
/etc/hosts:
127.0.0.1       localhost

192.168.66.10   xxx
192.168.66.16   p1
192.168.66.17   p2
192.168.66.18   p3

192.168.90.16   p1-cluster
192.168.90.17   p2-cluster
192.168.90.18   p3-cluster
(please ignore 192.168.66.10 - it's usually off and has its quorum vote set to 0; it's not a part of the problem in any way)

Code:
/etc/pve/corosync.conf:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: outrunator
    nodeid: 2
    quorum_votes: 0
    ring0_addr: 192.168.66.10
    ring1_addr: 192.168.90.10
  }
  node {
    name: p1
    nodeid: 4
    quorum_votes: 1
    ring0_addr: 192.168.66.16
    ring1_addr: 192.168.90.16
  }
  node {
    name: p2
    nodeid: 6
    quorum_votes: 1
    ring0_addr: 192.168.66.17
    ring1_addr: 192.168.90.17
  }
  node {
    name: p3
    nodeid: 7
    quorum_votes: 1
    ring0_addr: 192.168.66.18
    ring1_addr: 192.168.90.18
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: cluster1
  config_version: 34
  interface {
    linknumber: 0
  }
  interface {
    linknumber: 1
  }
  ip_version: ipv4
  link_mode: passive
  secauth: off
  token: 10000
  version: 2
}

Code:
/etc/pve/ceph.conf:
[global]
        auth_client_required = cephx
        auth_cluster_required = cephx
        auth_service_required = cephx
        cluster_network = 192.168.90.0/24
        fsid = xxxyyyzzz
        mon_allow_pool_delete = true
        mon_host = 192.168.66.18 192.168.66.16 192.168.66.17
        mon_max_pg_per_osd = 300
        ms_bind_ipv4 = true
        ms_bind_ipv6 = false
        osd_pool_default_min_size = 2
        osd_pool_default_size = 3
        public_network = 192.168.66.0/24

[client]
        keyring = /etc/pve/priv/$cluster.$name.keyring

[client.crash]
        keyring = /etc/pve/ceph/$cluster.$name.keyring

[mds]
        keyring = /var/lib/ceph/mds/ceph-$id/keyring

[mds.outrunator]
        host = outrunator
        mds_standby_for_name = pve

[mds.p1]
        host = p1
        mds_standby_for_name = pve

[mds.p2]
        host = p2
        mds_standby_for_name = pve

[mds.p3]
        host = p3
        mds_standby_for_name = pve

[mon.p1]
        public_addr = 192.168.66.16
        cluster_addr = 192.168.90.16

[mon.p2]
        public_addr = 192.168.66.17
        cluster_addr = 192.168.90.17

[mon.p3]
        public_addr = 192.168.66.18
        cluster_addr = 192.168.90.18

Code:
ceph mon dump:
epoch 32
fsid xxxyyyzzz
last_changed 2025-06-05T17:21:42.655722+0200
created 2022-06-16T11:48:31.723602+0200
min_mon_release 19 (squid)
election_strategy: 1
0: [v2:192.168.66.18:3300/0,v1:192.168.66.18:6789/0] mon.p3
1: [v2:192.168.66.16:3300/0,v1:192.168.66.16:6789/0] mon.p1
2: [v2:192.168.66.17:3300/0,v1:192.168.66.17:6789/0] mon.p2
Code:
pvecm status
Cluster information
-------------------
Name:             cluster1
Config Version:   34
Transport:        knet
Secure auth:      off

Quorum information
------------------
Date:             Thu Jun  5 21:01:55 2025
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000007
Ring ID:          4.cd2b
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000004          1 192.168.66.16
0x00000006          1 192.168.66.17
0x00000007          1 192.168.66.18 (local)
Code:
pveversion -v
proxmox-ve: 8.4.0 (running kernel: 6.8.12-11-pve)
pve-manager: 8.4.1 (running version: 8.4.1/2a5fa54a8503f96d)
proxmox-kernel-helper: 8.1.1
proxmox-kernel-6.8.12-11-pve-signed: 6.8.12-11
proxmox-kernel-6.8: 6.8.12-11
amd64-microcode: 3.20240820.1~deb12u1
ceph: 19.2.1-pve3
ceph-fuse: 19.2.1-pve3
corosync: 3.1.9-pve1
criu: 3.17.1-2+deb12u1
glusterfs-client: 10.3-5
ifupdown: residual config
ifupdown2: 3.2.0-1+pmx11
libjs-extjs: 7.0.0-5
libknet1: 1.30-pve2
libproxmox-acme-perl: 1.6.0
libproxmox-backup-qemu0: 1.5.1
libproxmox-rs-perl: 0.3.5
libpve-access-control: 8.2.2
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.1.0
libpve-cluster-perl: 8.1.0
libpve-common-perl: 8.3.1
libpve-guest-common-perl: 5.2.2
libpve-http-server-perl: 5.2.2
libpve-network-perl: 0.11.2
libpve-rs-perl: 0.9.4
libpve-storage-perl: 8.3.6
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.6.0-2
proxmox-backup-client: 3.4.1-1
proxmox-backup-file-restore: 3.4.1-1
proxmox-firewall: 0.7.1
proxmox-kernel-helper: 8.1.1
proxmox-mail-forward: 0.3.2
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.7
proxmox-widget-toolkit: 4.3.11
pve-cluster: 8.1.0
pve-container: 5.2.6
pve-docs: 8.4.0
pve-edk2-firmware: not correctly installed
pve-esxi-import-tools: 0.7.4
pve-firewall: 5.1.1
pve-firmware: 3.15-4
pve-ha-manager: 4.0.7
pve-i18n: 3.4.4
pve-qemu-kvm: 9.2.0-5
pve-xtermjs: 5.5.0-2
qemu-server: 8.3.12
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.7-pve2
Code:
ceph -v
ceph version 19.2.1 (c783d93f19f71de89042abf6023076899b42259d) squid (stable)

Would you guys help me out with this? I'm surely not the only person on the Internet struggling with it.

Thanks.
P.S. I love Proxmox!
 
Hello


But for some reason I'm unable to find official Proxmox documentation on how to add a 2nd ring to Ceph, and I'm failing to do it by trial and error, editing /etc/pve/ceph.conf in various ways with no success.

This is not possible. If you want redundancy, you need to implement it on the network layer, e.g. by using a Linux bond instead of a single NIC to define the Ceph networks. Only Corosync supports adding multiple links/rings.
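As a rough sketch of what that could look like in /etc/network/interfaces (interface names and addresses are placeholders based on the configs above; adapt them to your hardware):
Code:
auto bond0
iface bond0 inet manual
        bond-slaves enp1s0 enp2s0
        bond-miimon 100
        bond-mode 802.3ad
        bond-xmit-hash-policy layer3+4

auto vmbr0
iface vmbr0 inet static
        address 192.168.66.16/24
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0
Ceph's public_network (and cluster_network, if you keep one) would then simply run over bond0, or over a VLAN on top of it.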

Regarding the public and cluster Ceph networks, please take a look at Ceph's documentation at [1]. The public network is the network where Ceph daemons talk to clients (e.g. VMs), and the cluster network is used for internal OSD communication. You can verify this with `ss -tulpn`, which will tell you on which address:port each program is listening. I would suggest not setting per-OSD network settings manually, and only setting the global cluster_network and public_network.
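For example:
Code:
# check which addresses the Ceph daemons are actually bound to
ss -tulpn | grep -E 'ceph-mon|ceph-osd'
# the MONs should listen on the public_network (ports 3300/6789),
# the OSDs on both the public (front) and the cluster (back) addresses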

I would recommend not changing the Ceph network settings blindly, as monitors and OSDs keep these values in their persistent state, and they won't change just because the config changed.
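If in doubt, you can inspect which front (public) and back (cluster) addresses an OSD has actually registered, e.g. for OSD 0 (field names may vary slightly between releases):
Code:
ceph osd metadata 0 | grep -E 'front_addr|back_addr'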


[1] https://docs.ceph.com/en/latest/rados/configuration/network-config-ref/
 
Thanks for the response. I finally settled on a bond (802.3ad, LACP) on every node. It's not the best solution, but it's enough for now.
BTW, if I'm using a bond, is there any benefit to having the cluster and public networks on separate VLANs? Or should I just remove the cluster VLAN and leave only the public one?
 
I finally settled on a bond (802.3ad, LACP) on every node. It's not the best solution, but it's enough for now.

Why would you say it is not the best solution? Note that the Ceph docs [1] suggest using a transmit hash policy of either layer2+3 or layer3+4.
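You can verify which policy a bond is currently using, assuming it is called bond0:
Code:
# expect something like "Transmit Hash Policy: layer3+4 (1)"
grep -i 'hash policy' /proc/net/bonding/bond0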

BTW, if I'm using a bond, is there any benefit to having the cluster and public networks on separate VLANs? Or should I just remove the cluster VLAN and leave only the public one?

We generally recommend using the same network (if the NIC has enough bandwidth for Ceph) due to the simplicity of the setup. If you need to separate the traffic due to some constraint (e.g. if the public network can be reached from the outside), then using different VLANs is OK too.
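If you do keep separate VLANs, they can simply sit on top of the bond, e.g. (VLAN ID and address are just examples based on your setup):
Code:
auto bond0.90
iface bond0.90 inet static
        address 192.168.90.16/24
        # dedicated Ceph cluster_network VLAN on top of the bond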

[1] https://docs.ceph.com/en/latest/rados/configuration/network-config-ref/