Network crash during PVE cluster backups onto PBS

jaykavathe

Feb 25, 2021
I'm trying to figure out why the backup process is crashing my network, and what a better long-term strategy would be.

My setup is a 3-node Ceph HA cluster (2x 1G and 2x 10G ports per node):

node 1: 10.10.40.11

node 2: 10.10.40.12

node 3: 10.10.40.13

Only the three nodes above form the HA cluster. Each has a 4-port NIC: 2 ports are used for the IPv6 ring, 1 is for management/uplink/internet, and 1 is connected to the backup switch.

PBS: 10.10.40.14, added as storage for the cluster using its backup-network IP, 192.168.50.14.

The backup network is physically connected to a basic unmanaged Gigabit switch with no gateway, with one connection from each node plus one from PBS. The backup network is 192.168.50.0/24 (hosts .11, .12, .13 and .14). I believe backup traffic is correctly routed through the backup network only.

Code:
#ip route show
default via 10.10.40.1 dev vmbr0 proto kernel onlink
10.10.40.0/24 dev vmbr0 proto kernel scope link src 10.10.40.11
192.168.50.0/24 dev vmbr1 proto kernel scope link src 192.168.50.11
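To make the routing argument concrete: Linux sends a packet out via the most specific (longest-prefix) route that matches its destination. The minimal Python sketch below models only the three routes from the output above (it ignores policy routing, extra routing tables and metrics) and shows which interface each PBS address would use. On the node itself, `ip route get 192.168.50.14` answers the same question directly.

```python
import ipaddress

# Routes copied from the `ip route show` output above (main table only).
ROUTES = [
    ("0.0.0.0/0", "vmbr0"),        # default via 10.10.40.1
    ("10.10.40.0/24", "vmbr0"),    # management / uplink network
    ("192.168.50.0/24", "vmbr1"),  # backup network
]


def egress_interface(dest: str) -> str:
    """Return the interface of the longest-prefix route that matches dest."""
    addr = ipaddress.ip_address(dest)
    matches = [
        (ipaddress.ip_network(prefix), dev)
        for prefix, dev in ROUTES
        if addr in ipaddress.ip_network(prefix)
    ]
    # The most specific route (largest prefix length) wins.
    _, dev = max(matches, key=lambda m: m[0].prefixlen)
    return dev


if __name__ == "__main__":
    for target in ("192.168.50.14", "10.10.40.14"):
        print(target, "->", egress_interface(target))
```

With this table, 192.168.50.14 resolves to vmbr1 (backup switch), while 10.10.40.14 resolves to vmbr0. So if the PBS storage entry ever pointed at 10.10.40.14 instead of 192.168.50.14, backup traffic would leave through the uplink toward the Cisco switch.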

Yet running backups crashes the network, freezing the Cisco switch and the OPNsense firewall. A reboot fixes the issue. Why could this be happening? I don't understand why the Cisco needs a reboot but my cheap Netgear backup switch doesn't. It feels as if the Netgear switch is too dumb to freeze and just ignores the data.

Despite the separate physical backup switch, it seems that backup traffic is somehow going through the Cisco switch. I haven't set up VLAN rules yet, but I would like to understand why this is happening.
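One way to test whether backup traffic really goes through the Cisco switch is to compare per-interface byte counters on a node before and after part of a backup run. The Python sketch below parses the Linux `/proc/net/dev` format and reports bytes transmitted per interface over an interval. The interface names vmbr0 and vmbr1 come from the routing table above; the 60-second window is an arbitrary assumption, and the physical NIC names (not given in this thread) could be checked the same way.

```python
import time


def read_tx_bytes(text: str) -> dict[str, int]:
    """Parse /proc/net/dev content into {interface: transmitted bytes}."""
    counters = {}
    # The first two lines of /proc/net/dev are column headers.
    for line in text.splitlines()[2:]:
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        # Fields 0-7 are receive statistics; field 8 is transmitted bytes.
        counters[name.strip()] = int(fields[8])
    return counters


def sample(path: str = "/proc/net/dev") -> dict[str, int]:
    """Take one snapshot of the transmit counters."""
    with open(path) as f:
        return read_tx_bytes(f.read())


def tx_delta(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Bytes transmitted per interface between two snapshots."""
    return {name: after[name] - before.get(name, 0) for name in after}


if __name__ == "__main__":
    start = sample()
    time.sleep(60)  # assumed window: run this while a backup is in progress
    end = sample()
    delta = tx_delta(start, end)
    for name in ("vmbr0", "vmbr1"):
        if name in delta:
            print(f"{name}: {delta[name] / 1e6:.1f} MB sent")
```

If vmbr0 shows a large jump during the backup while vmbr1 stays low, the backup is not using the backup network.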

What is typically good practice for this kind of setup? I will be adding a few more nodes (not HA, but big data servers that will push backups to the same PBS). Should I just get a decent switch for the backup network? That's what I'm planning anyway.
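Whatever switch is chosen, the PBS here has a single 1 Gbit/s connection into the backup network, so nodes backing up at the same time share that one link. A back-of-the-envelope Python sketch, using hypothetical node counts and data sizes (none are given in this thread) and an assumed 90% link efficiency, estimates the best-case time to move a full backup over one shared link:

```python
def backup_hours(total_gb: float, link_gbit: float = 1.0, efficiency: float = 0.9) -> float:
    """Best-case hours to move total_gb gigabytes over one shared link.

    efficiency is an assumed fraction of line rate actually achieved
    (protocol overhead, encryption, chunking); tune it to observed throughput.
    """
    gbit_to_send = total_gb * 8
    seconds = gbit_to_send / (link_gbit * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical: 3 existing nodes plus 3 big-data servers, 500 GB each.
    nodes, per_node_gb = 6, 500
    total = nodes * per_node_gb
    for link in (1.0, 10.0):
        print(f"{link:>4} Gbit/s: {backup_hours(total, link):.1f} h for {total} GB")
```

PBS deduplicates data and incremental runs often transfer much less than the full size, so treat this as an upper bound for a full backup. The estimate scales linearly with total data, so each large server added behind the same 1 Gbit/s PBS link lengthens the backup window proportionally.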
