Hello everyone,
I'm reaching out to the community for a design review of my version 2 network setup for a new 3-node Proxmox/Ceph hyper-converged cluster. The goal is to build a stable and high-performance configuration, overcoming the issues I faced with my first implementation.
Context and Lessons Learned from v1
The first version of the cluster ran for about two months. Its Ceph replication network was also a full mesh over the 25GbE interfaces, but with a different approach (rough sketch below):
- Each node had the same IP address (e.g., 10.10.10.1/32) configured on both of its 25GbE interfaces.
- Traffic was steered with static routes to send each peer's packets out the correct link.
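A minimal sketch of what that v1 approach looked like on pve1, reconstructed from memory; the interface names match v2, and the peers' /32 addresses (10.10.10.2, 10.10.10.3) are illustrative:
Code:
# v1 (abandoned): one shared /32 on both 25GbE ports, static per-peer routes
auto ens9f0np0
iface ens9f0np0 inet static
    address 10.10.10.1/32
    mtu 9000
    # assumed: pve2 was reachable as 10.10.10.2/32 over this port
    up ip route add 10.10.10.2/32 dev ens9f0np0

auto ens9f1np1
iface ens9f1np1 inet static
    address 10.10.10.1/32
    mtu 9000
    # assumed: pve3 was reachable as 10.10.10.3/32 over this port
    up ip route add 10.10.10.3/32 dev ens9f1np1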
Goal for v2: Based on that experience, I am redesigning the network from scratch to maximize simplicity and reliability. I want to avoid complex configurations and, if possible, additional software such as dynamic routing protocols (e.g., FRR). Redundancy for the individual 25GbE links is not a priority; stability and performance are.
Hardware and Network Architecture v2
Nodes: pve1, pve2, pve3
Network schema idea:
- **Ceph Public** network on `vmbr0` → `172.16.10.0/24`
- **Management + VM** network on `vmbr1` → `192.168.170.0/24` (VLAN 174)
- **Ceph Cluster (replication)** network on the 25GbE interfaces:
  - `10.10.10.0/30`, `10.10.11.0/30`, `10.10.12.0/30`
- MTU 9000 on all interfaces

The links for the Ceph replication network are direct connections between the 25GbE ports of the nodes.
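Concretely, these are the three point-to-point links implied by the configs below:
Code:
10.10.10.0/30:  pve1 ens9f0np0 (10.10.10.1)  <-->  pve2 ens9f0np0 (10.10.10.2)
10.10.11.0/30:  pve1 ens9f1np1 (10.10.11.1)  <-->  pve3 ens9f1np1 (10.10.11.2)
10.10.12.0/30:  pve2 ens9f1np1 (10.10.12.1)  <-->  pve3 ens9f0np0 (10.10.12.2)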
Planned Network Configuration (/etc/network/interfaces)
Below is the complete network configuration I plan to apply to each node.
/etc/network/interfaces file on pve1:
Code:
auto lo
iface lo inet loopback

# 10GbE - Ceph Public Network
iface ens3f0 inet manual
    mtu 9000

auto vmbr0
iface vmbr0 inet static
    address 172.16.10.1/24
    bridge-ports ens3f0
    bridge-stp off
    bridge-fd 0
    mtu 9000

# 10GbE - Management & VM Network
iface ens3f1 inet manual

auto vmbr1
iface vmbr1 inet static
    address 192.168.170.250/24
    gateway 192.168.170.254
    bridge-ports ens3f1
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes

# 25GbE - Ceph Cluster Network (Mesh)
# Link to pve2 (direct)
auto ens9f0np0
iface ens9f0np0 inet static
    address 10.10.10.1/30
    mtu 9000

# Link to pve3 (direct)
auto ens9f1np1
iface ens9f1np1 inet static
    address 10.10.11.1/30
    mtu 9000
/etc/network/interfaces file on pve2:
Code:
auto lo
iface lo inet loopback

# 10GbE - Ceph Public Network
iface ens2f0 inet manual
    mtu 9000

auto vmbr0
iface vmbr0 inet static
    address 172.16.10.2/24
    bridge-ports ens2f0
    bridge-stp off
    bridge-fd 0
    mtu 9000

# 10GbE - Management & VM Network
iface ens2f1 inet manual

auto vmbr1
iface vmbr1 inet static
    address 192.168.170.251/24
    gateway 192.168.170.254
    bridge-ports ens2f1
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes

# 25GbE - Ceph Cluster Network (Mesh)
# Link to pve1 (direct)
auto ens9f0np0
iface ens9f0np0 inet static
    address 10.10.10.2/30
    mtu 9000

# Link to pve3 (direct)
auto ens9f1np1
iface ens9f1np1 inet static
    address 10.10.12.1/30
    mtu 9000
/etc/network/interfaces file on pve3:
Code:
auto lo
iface lo inet loopback

# 10GbE - Ceph Public Network
iface ens2f0 inet manual
    mtu 9000

auto vmbr0
iface vmbr0 inet static
    address 172.16.10.3/24
    bridge-ports ens2f0
    bridge-stp off
    bridge-fd 0
    mtu 9000

# 10GbE - Management & VM Network
iface ens2f1 inet manual

auto vmbr1
iface vmbr1 inet static
    address 192.168.170.252/24
    gateway 192.168.170.254
    bridge-ports ens2f1
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes

# 25GbE - Ceph Cluster Network (Mesh)
# Link to pve2 (direct)
auto ens9f0np0
iface ens9f0np0 inet static
    address 10.10.12.2/30
    mtu 9000

# Link to pve1 (direct)
auto ens9f1np1
iface ens9f1np1 inet static
    address 10.10.11.2/30
    mtu 9000
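After applying the files above, I plan to sanity-check each direct link with don't-fragment pings sized for MTU 9000 (8972 bytes of ICMP payload + 28 bytes of ICMP/IP headers = 9000). These are standard ifupdown2/iputils commands, shown here from pve1:
Code:
# apply the new configuration (ifupdown2 is the Proxmox default)
ifreload -a

# verify jumbo frames work end-to-end on both direct links
ping -M do -s 8972 -c 3 10.10.10.2   # pve2 via ens9f0np0
ping -M do -s 8972 -c 3 10.10.11.2   # pve3 via ens9f1np1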
Ceph Configuration (ceph.conf)
Consequently, the ceph.conf file would be configured as follows:
Code:
[global]
...
public_network = 172.16.10.0/24
cluster_network = 10.10.10.0/30,10.10.11.0/30,10.10.12.0/30
...
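Related to question 4 below: once the cluster is up, I intend to double-check what the daemons actually parsed rather than trusting the file, using the standard Ceph CLI (substituting an OSD that actually runs on the node):
Code:
# what a running OSD uses right now (admin socket, on the node hosting osd.0)
ceph daemon osd.0 config get cluster_network

# what the config database reports for that daemon
ceph config show osd.0 cluster_network

# the file Proxmox manages (/etc/ceph/ceph.conf is a symlink to it)
cat /etc/pve/ceph.conf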
Questions for the Community
1. v2 Design Validity: Given the failure of v1, is this new approach with a separate /30 subnet per link considered more stable and a best practice for a full-mesh topology? Is it the right path for the simplicity I'm aiming for?
2. Ceph's Native Routing Handling: With this setup, will Ceph always use the direct link for OSD-to-OSD communication, or could the Linux kernel still run into routing issues between the two physical interfaces on the cluster_network? (See the check I plan to run, sketched after this list.)
3. Simple Alternatives: Are there alternatives to this design that keep the same simplicity (no additional routing software) but are equally or more performant/stable?
4. Proxmox GUI vs. Ceph Syntax: I've noticed a potential discrepancy. The official Ceph documentation states that multiple subnets in cluster_network should be separated by a comma (,), whereas the Proxmox GUI (Datacenter -> Ceph -> Configuration) seems to use a space as the separator for multiple values. Which syntax is actually written to ceph.conf and interpreted by Ceph when using the Proxmox GUI? This might have been the root cause of my previous parsing issues.
5. Practical Experiences: Has anyone implemented a similar setup (with separate /30 subnets) and can confirm its stability, especially under heavy I/O load?
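For reference on question 2, this is how I plan to verify which interface the kernel actually selects for each peer (plain iproute2, run on pve1):
Code:
ip route get 10.10.10.2              # expected: dev ens9f0np0 (direct link to pve2)
ip route get 10.10.11.2              # expected: dev ens9f1np1 (direct link to pve3)
ip -4 route show | grep "10\.10\.1"  # the connected /30 routes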