Dear Proxmox staff and forum members,
Being new to PVE and to high-availability systems in general, I'd like to discuss a 3-node cluster setup that provides network redundancy by means of Open vSwitch with RSTP and 2 switches,
and I beg your pardon for possible beginner's mistakes.
Each node is configured with 3 OVS bridges: vmbr0 for all Ceph networking, vmbr1 for PVE inter-node communication and management, and vmbr2 for VLAN-tagged guest traffic.
Each of these OVS bridges consists of 2 Ethernet interfaces, one connected to switch1 (nominal switch) and the other connected to switch2 (backup switch).
The RSTP-enabled switches allow "grouping" of ports by untagged VLANs ("port-based VLAN"), labelled "u-VLAN" in the attached network sketch.
I'm aware that RSTP is not VLAN-aware, but MSTP cannot be used, as Open vSwitch does not support MSTP.
Network redundancy works for all of the nodes' vmbrX bridges when one switch fails entirely: when I pull one switch's power supply, the port states on every node's interface pair can be seen changing from forwarding to discarding and vice versa. So far, so good.
If only a single switch port fails (or a NIC on a node, or a link in general), the same state changes can be observed, but only on the affected node's interfaces. As a result, only that node's interface swaps over to the backup switch, while the other nodes remain connected to the nominal switch.
Thus, in practice, that node is no longer connected to the rest of the cluster.
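To make the failure case concrete, here is a minimal Python sketch of the situation as I understand it. It is a simplified model, not a simulation of RSTP: it assumes that each node forwards on exactly one uplink at a time and that nodes attached to the same switch can reach each other. The node names, the `reachable_groups` function and the `inter_switch_link` parameter are made up for illustration.

```python
from collections import defaultdict


def reachable_groups(active_uplink, inter_switch_link=False):
    """Group nodes that can reach each other at layer 2.

    active_uplink maps each node to the switch its forwarding port uses.
    Assumption: without a link between the switches, only nodes on the
    same switch can talk to each other.
    """
    groups = defaultdict(set)
    for node, switch in active_uplink.items():
        # With an inter-switch link, both switches act as one segment.
        key = "fabric" if inter_switch_link else switch
        groups[key].add(node)
    return sorted(sorted(group) for group in groups.values())


# All nodes forward via the nominal switch.
healthy = {"node1": "switch1", "node2": "switch1", "node3": "switch1"}

# node1 loses its link to switch1 and swaps over to switch2.
after_link_failure = {**healthy, "node1": "switch2"}

print(reachable_groups(healthy))
print(reachable_groups(after_link_failure))
print(reachable_groups(after_link_failure, inter_switch_link=True))
```

In this model the single link failure splits the cluster into `[['node1'], ['node2', 'node3']]`, while the same failure with a link between the switches keeps all three nodes in one group. The same reasoning would apply to active-backup bonds, since they also use only one uplink per node at a time.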
Does anybody have a suggestion as to how I could address this problem?
Or is my approach to redundancy hopeless or problematic?
Could my goal of redundancy, including for single-link defects, be achieved by using bonds (in active-backup mode) without a direct connection between the hardware switches (stacking)?
Thank you!