Working SDN completely broke after applying a change

blackline

Dec 20, 2023
Hi, we have two PVE clusters in production:
  1. mother cluster with 3 nodes
  2. worker cluster with 3 nodes
They are more or less identical and have similar SDN configurations.

Today we made an addition to the SDN configuration on both clusters. The mother cluster was fine, but on the worker cluster all VLAN-based communication (the VLANs defined in SDN) stopped working! The only thing that helped was to:
  1. create a new VLAN interface on each node
  2. create a new vmbr on top of the new VLAN
  3. reconfigure 200 VMs to use the new vmbr
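For anyone who has to do the same, the three manual steps above can be roughly sketched in Python. This is not what we actually ran, just an illustration under stated assumptions: the parent interface `bond0`, VLAN ID `123`, bridge name `vmbr123`, VNet name `vnet1` and VMID `101` are all hypothetical placeholders. The script only prints an `/etc/network/interfaces` fragment (VLAN sub-interface plus a Linux bridge using it as its port) and `qm set` commands that swap the `bridge=` option of a VM NIC while keeping its model/MAC and other options; it changes nothing by itself.

```python
import shlex

# Hypothetical example values (assumptions, not taken from our clusters).
PARENT_IF = "bond0"
VLAN_ID = 123
NEW_BRIDGE = "vmbr123"


def vlan_bridge_stanza(parent: str, vlan_id: int, bridge: str) -> str:
    """Steps 1-2: interfaces fragment with a VLAN sub-interface
    and a Linux bridge that uses it as its only port."""
    vlan_if = f"{parent}.{vlan_id}"
    return (
        f"auto {vlan_if}\n"
        f"iface {vlan_if} inet manual\n"
        "\n"
        f"auto {bridge}\n"
        f"iface {bridge} inet manual\n"
        f"    bridge-ports {vlan_if}\n"
        "    bridge-stp off\n"
        "    bridge-fd 0\n"
    )


def repoint_nic(net_value: str, new_bridge: str) -> str:
    """Step 3: replace only the bridge= option of a VM netN value,
    keeping model/MAC and every other option unchanged."""
    parts = []
    for item in net_value.split(","):
        key, _, _ = item.partition("=")
        parts.append(f"bridge={new_bridge}" if key == "bridge" else item)
    return ",".join(parts)


def qm_set_command(vmid: int, nic: str, net_value: str, new_bridge: str) -> list[str]:
    """Build (but do not run) a `qm set` call for one VM NIC."""
    return ["qm", "set", str(vmid), f"--{nic}", repoint_nic(net_value, new_bridge)]


if __name__ == "__main__":
    print(vlan_bridge_stanza(PARENT_IF, VLAN_ID, NEW_BRIDGE))
    # Hypothetical VM 101 whose net0 currently points at SDN VNet "vnet1".
    current = "virtio=BC:24:11:00:00:01,bridge=vnet1,firewall=1"
    print(shlex.join(qm_set_command(101, "net0", current, NEW_BRIDGE)))
```

The bridge has to exist under the same name on every node before the VMs are moved to it, otherwise migrating a VM to a node without that bridge will fail.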
We are not going to use SDN going forward; this is a serious bug and we can't trust SDN in its current state.

If there is anything we can do to help debug this, I am all ears.
Is anyone else using SDN in production?
 