Hi,
I run a 3-node cluster with Ceph in my homelab: 3x MON, 3x MGR, 3x MDS. I've been running it for about 3 years now.
Yesterday I installed 3 fresh nodes and migrated my cluster from the old nodes to the new ones, which are identical (GMKTEC M5 Plus, 24 GB RAM each; beautiful devices, they work like a charm so far).
Every node has 2 NICs (I'm also using VLANs, but that doesn't matter for this case).
The 1st subnet is the public one, 192.168.66.0/24, and the 2nd is the cluster one, 192.168.90.0/24, created exclusively for corosync and Ceph traffic.
Until now everything worked as it should using only the public subnet, but I decided to add a second ring on the cluster subnet to offload some traffic from the single NIC and start using the second NIC as the cluster one. And then the hell started...
The /etc/hosts file on every node looks the same. Each node can ping the others on both subnets and I can ssh between them with no problem. I've successfully added the 2nd ring (192.168.90.0/24) to corosync and can see (with tcpdump) traffic on the cluster subnet on ports 6800 and 53394 - and I can see that in the Proxmox webgui too (screenshot in a spoiler below).
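In case it matters, I'm assuming corosync-cfgtool is the right way to double-check from the CLI that both links are actually up:

Code:
# status of the knet links on the local node
corosync-cfgtool -s
# per-node view of the links (if this corosync version supports -n)
corosync-cfgtool -n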
The problem:
But, for some reason, I'm unable to find official documentation for Proxmox's Ceph on how to add a 2nd ring, and I'm failing to do so with a trial-and-error approach, editing /etc/pve/ceph.conf in various ways with no success.
I was able to find the official documentation for "the non-Proxmox, official" Ceph (URL), but it states that I should add cluster_addr to each [osd.x] section in the /etc/ceph/ceph.conf file (if I'm not mistaken); those sections are missing in Proxmox, but AFAIK this is normal. In Proxmox we only have /etc/pve/ceph.conf, which does not contain any [osd.x] sections.
So I added cluster_network to the [global] section and cluster_addr to each [mon.x]. I first tried without adding cluster_addr to [mon.x], with no success. Every config I tried was followed by at least a restart of the OSDs, or by a full reboot, but Ceph won't use the cluster subnet no matter what.
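To be explicit about the restarts: I mean something along these lines (osd.0 is just an example ID), and I'm assuming ceph config show is the right way to confirm whether a running OSD actually picked up the new setting:

Code:
# restart a single OSD daemon (repeat for every OSD ID on every node)
systemctl restart ceph-osd@0.service
# show that OSD's running config; cluster_network should show up here if it was applied
ceph config show osd.0 | grep -E 'cluster_network|public_network'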
If I disconnect the public NIC by pulling the RJ45 out, in the webgui I can still see my node as online (green icon next to its name), so corosync knows what's going on (I also confirmed this in journalctl, where I can see the second ring's address being used), but in the Ceph tab (webgui) I just see a failed host and an OSD down (2 nodes working, 1 failed) - so it's clearly not communicating properly over Ceph's cluster ring.
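For the record, I believe the CLI equivalents of that webgui "failed host / OSD down" view are just these:

Code:
# overall cluster health, including which mons/osds are down
ceph -s
# per-host / per-OSD up/down and in/out status
ceph osd tree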
My configs look like this (please ignore 192.168.66.10 / node "outrunator": it's usually off, has its quorum vote set to 0, and is not part of the problem in any way):

Code:
/etc/hosts:
127.0.0.1 localhost
192.168.66.10 xxx
192.168.66.16 p1
192.168.66.17 p2
192.168.66.18 p3
192.168.90.16 p1-cluster
192.168.90.17 p2-cluster
192.168.90.18 p3-cluster
Code:
/etc/pve/corosync.conf:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: outrunator
    nodeid: 2
    quorum_votes: 0
    ring0_addr: 192.168.66.10
    ring1_addr: 192.168.90.10
  }
  node {
    name: p1
    nodeid: 4
    quorum_votes: 1
    ring0_addr: 192.168.66.16
    ring1_addr: 192.168.90.16
  }
  node {
    name: p2
    nodeid: 6
    quorum_votes: 1
    ring0_addr: 192.168.66.17
    ring1_addr: 192.168.90.17
  }
  node {
    name: p3
    nodeid: 7
    quorum_votes: 1
    ring0_addr: 192.168.66.18
    ring1_addr: 192.168.90.18
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: cluster1
  config_version: 34
  interface {
    linknumber: 0
  }
  interface {
    linknumber: 1
  }
  ip_version: ipv4
  link_mode: passive
  secauth: off
  token: 10000
  version: 2
}
Code:
/etc/pve/ceph.conf:
[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
cluster_network = 192.168.90.0/24
fsid = xxxyyyzzz
mon_allow_pool_delete = true
mon_host = 192.168.66.18 192.168.66.16 192.168.66.17
mon_max_pg_per_osd = 300
ms_bind_ipv4 = true
ms_bind_ipv6 = false
osd_pool_default_min_size = 2
osd_pool_default_size = 3
public_network = 192.168.66.0/24
[client]
keyring = /etc/pve/priv/$cluster.$name.keyring
[client.crash]
keyring = /etc/pve/ceph/$cluster.$name.keyring
[mds]
keyring = /var/lib/ceph/mds/ceph-$id/keyring
[mds.outrunator]
host = outrunator
mds_standby_for_name = pve
[mds.p1]
host = p1
mds_standby_for_name = pve
[mds.p2]
host = p2
mds_standby_for_name = pve
[mds.p3]
host = p3
mds_standby_for_name = pve
[mon.p1]
public_addr = 192.168.66.16
cluster_addr = 192.168.90.16
[mon.p2]
public_addr = 192.168.66.17
cluster_addr = 192.168.90.17
[mon.p3]
public_addr = 192.168.66.18
cluster_addr = 192.168.90.18
Code:
ceph mon dump:
epoch 32
fsid xxxyyyzzz
last_changed 2025-06-05T17:21:42.655722+0200
created 2022-06-16T11:48:31.723602+0200
min_mon_release 19 (squid)
election_strategy: 1
0: [v2:192.168.66.18:3300/0,v1:192.168.66.18:6789/0] mon.p3
1: [v2:192.168.66.16:3300/0,v1:192.168.66.16:6789/0] mon.p1
2: [v2:192.168.66.17:3300/0,v1:192.168.66.17:6789/0] mon.p2
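If I'm reading the osdmap output right, the analogous check for the OSDs would be ceph osd dump: each osd.N line should list its public address first, followed by its cluster (and heartbeat) addresses, so I'd expect to see 192.168.90.x there once the cluster network is actually in use:

Code:
# each osd.N line shows the public addrvec first, then the cluster addrvec
ceph osd dump | grep '^osd\.'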
Code:
pvecm status
Cluster information
-------------------
Name: cluster1
Config Version: 34
Transport: knet
Secure auth: off
Quorum information
------------------
Date: Thu Jun 5 21:01:55 2025
Quorum provider: corosync_votequorum
Nodes: 3
Node ID: 0x00000007
Ring ID: 4.cd2b
Quorate: Yes
Votequorum information
----------------------
Expected votes: 3
Highest expected: 3
Total votes: 3
Quorum: 2
Flags: Quorate
Membership information
----------------------
Nodeid Votes Name
0x00000004 1 192.168.66.16
0x00000006 1 192.168.66.17
0x00000007 1 192.168.66.18 (local)
Code:
pveversion -v
proxmox-ve: 8.4.0 (running kernel: 6.8.12-11-pve)
pve-manager: 8.4.1 (running version: 8.4.1/2a5fa54a8503f96d)
proxmox-kernel-helper: 8.1.1
proxmox-kernel-6.8.12-11-pve-signed: 6.8.12-11
proxmox-kernel-6.8: 6.8.12-11
amd64-microcode: 3.20240820.1~deb12u1
ceph: 19.2.1-pve3
ceph-fuse: 19.2.1-pve3
corosync: 3.1.9-pve1
criu: 3.17.1-2+deb12u1
glusterfs-client: 10.3-5
ifupdown: residual config
ifupdown2: 3.2.0-1+pmx11
libjs-extjs: 7.0.0-5
libknet1: 1.30-pve2
libproxmox-acme-perl: 1.6.0
libproxmox-backup-qemu0: 1.5.1
libproxmox-rs-perl: 0.3.5
libpve-access-control: 8.2.2
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.1.0
libpve-cluster-perl: 8.1.0
libpve-common-perl: 8.3.1
libpve-guest-common-perl: 5.2.2
libpve-http-server-perl: 5.2.2
libpve-network-perl: 0.11.2
libpve-rs-perl: 0.9.4
libpve-storage-perl: 8.3.6
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.6.0-2
proxmox-backup-client: 3.4.1-1
proxmox-backup-file-restore: 3.4.1-1
proxmox-firewall: 0.7.1
proxmox-kernel-helper: 8.1.1
proxmox-mail-forward: 0.3.2
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.7
proxmox-widget-toolkit: 4.3.11
pve-cluster: 8.1.0
pve-container: 5.2.6
pve-docs: 8.4.0
pve-edk2-firmware: not correctly installed
pve-esxi-import-tools: 0.7.4
pve-firewall: 5.1.1
pve-firmware: 3.15-4
pve-ha-manager: 4.0.7
pve-i18n: 3.4.4
pve-qemu-kvm: 9.2.0-5
pve-xtermjs: 5.5.0-2
qemu-server: 8.3.12
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.7-pve2
Code:
ceph -v
ceph version 19.2.1 (c783d93f19f71de89042abf6023076899b42259d) squid (stable)
Would you guys help me out with this? I'm surely not the only person on the Internet struggling with this.
Thanks.
P.S.
I love proxmox!