OpenFabric: Could not find two T0 routers log spam after migration? (And VM access question)

telvenes

Hello everyone,

I successfully migrated my cluster last week from a manually configured BGP full mesh (using FRR) to the new Proxmox 9 SDN openfabric full mesh.

Initial Setup Note: My first problem was that the /etc/frr/daemons file was not updated correctly by the Proxmox SDN installation, which caused fabricd to fail. I solved this by completely purging frr (apt purge frr), deleting all files under /etc/frr, and then reinstalling frr. After that, the SDN apply process worked correctly.
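For anyone hitting the same thing, the recovery steps above could be scripted roughly like this. It is only a sketch: the `run` helper, the `reset_frr` function, and the `APPLY` variable are names made up for illustration, and only `apt purge frr` is quoted from the post. By default it just prints what it would do. Be aware that it wipes everything under /etc/frr, including any hand-written FRR config.

```shell
#!/bin/sh
# Sketch of the purge-and-reinstall recovery described above.
# Prints the commands by default; set APPLY=1 to really run them (as root).
# WARNING: deletes every file under /etc/frr, including manual config.

# Hypothetical helper: run the command only when APPLY=1, otherwise print it.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

reset_frr() {
    run apt purge -y frr
    run sh -c 'rm -rf /etc/frr/*'
    run apt install -y frr
}

reset_frr
```

After re-applying the SDN configuration, `grep '^fabricd=' /etc/frr/daemons` is a quick way to see whether fabricd ended up enabled.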

Problem 1: Log Spam

After the setup, my cluster is working well, and all nodes are peering correctly. However, the system logs on all nodes are being spammed with the following message every few minutes:
Code:
Nov 10 10:31:07 pve04 fabricd[1962]: [QBAZ6-3YZR3] OpenFabric: Could not find two T0 routers
Nov 10 10:36:03 pve04 fabricd[1962]: [QBAZ6-3YZR3] OpenFabric: Could not find two T0 routers
Nov 10 10:37:57 pve04 fabricd[1962]: [QBAZ6-3YZR3] OpenFabric: Could not find two T0 routers
Question 1: Everything seems to be working perfectly. What does this warning mean? Is it a problem in a full mesh topology, or can it be safely ignored?
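To put a number on "every few minutes", the messages can be tallied per hour. A minimal sketch, assuming syslog-style lines like the ones above on stdin; `count_t0_warnings` is just an illustrative name:

```shell
#!/bin/sh
# Count "Could not find two T0 routers" messages per hour.
# Expects lines shaped like: "Mon DD HH:MM:SS host fabricd[pid]: ..."
count_t0_warnings() {
    grep -F 'Could not find two T0 routers' |
        awk '{ split($3, t, ":"); n[$1 " " $2 " " t[1] ":00"]++ }
             END { for (k in n) print k, n[k] }' |
        sort
}

# Example using the three lines from the post:
count_t0_warnings <<'EOF'
Nov 10 10:31:07 pve04 fabricd[1962]: [QBAZ6-3YZR3] OpenFabric: Could not find two T0 routers
Nov 10 10:36:03 pve04 fabricd[1962]: [QBAZ6-3YZR3] OpenFabric: Could not find two T0 routers
Nov 10 10:37:57 pve04 fabricd[1962]: [QBAZ6-3YZR3] OpenFabric: Could not find two T0 routers
EOF
```

On a live node the input would probably come from something like `journalctl -t fabricd --since today | count_t0_warnings`, assuming the journal tags the messages with the fabricd identifier as the log lines suggest.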

Problem 2: VM Networking

Question 2: Now that the SDN openfabric is configured via the GUI, is it simpler to connect a VM directly to this fabric?

I am running K3s nodes inside VMs, and they use ceph-csi for storage. I would love to give these K3s nodes direct access to the openfabric network (which is my Ceph network) to improve storage performance.

What is the recommended way to "bridge" a VM's vNIC to the openfabric zone?

Thank you for your help!
 
I noticed that I get about double the amount of spam for this. I have both Full Mesh Ceph and Full Mesh Cluster networks in my case, so it seems to generate this event once for each fabric you create. See:

vtysh -c "show openfabric topology"

Code:
Area ceph:
IS-IS paths to level-2 routers that speak IP
 Vertex         Type         Metric  Next-Hop  Interface  Parent
 ------------------------------------------------------------------
 node-a
 10.15.15.1/32  IP internal  0                            node-a(4)
 node-b         TE-IS        10      node-b    mlx0       node-a(4)
 node-c         TE-IS        10      node-c    mlx1       node-a(4)
 10.15.15.2/32  IP TE        20      node-b    mlx0       node-b(4)
 10.15.15.3/32  IP TE        20      node-c    mlx1       node-c(4)

Code:
Area cluster:
IS-IS paths to level-2 routers that speak IP
 Vertex         Type         Metric  Next-Hop  Interface  Parent
 ------------------------------------------------------------------
 node-a
 10.14.14.1/32  IP internal  0                            node-a(4)
 node-b         TE-IS        10      node-b    cls0       node-a(4)
 node-c         TE-IS        10      node-c    cls1       node-a(4)
 10.14.14.2/32  IP TE        20      node-b    cls0       node-b(4)
 10.14.14.3/32  IP TE        20      node-c    cls1       node-c(4)
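If the message really is emitted once per fabric, the number of "Area" sections in that output should match the multiplier you see in the logs. A small sketch for listing them, assuming the output format shown above; `list_fabrics` is a made-up name:

```shell
#!/bin/sh
# List the fabric (area) names found in "show openfabric topology" output on stdin.
list_fabrics() {
    awk '$1 == "Area" { sub(/:$/, "", $2); print $2 }' | sort -u
}

# Example using the section headers from the output above:
list_fabrics <<'EOF'
Area ceph:
IS-IS paths to level-2 routers that speak IP
Area cluster:
IS-IS paths to level-2 routers that speak IP
EOF
```

On a node this would be fed with `vtysh -c "show openfabric topology" | list_fabrics`; piping that into `wc -l` gives the fabric count.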

So far I've been running this cluster in production at a factory and haven't had any cause for concern, but I could find very little actual troubleshooting information about this message online.

I can't answer your question about bridging a NIC to the openfabric network, since I don't want my VMs on the Ceph or Cluster networks in my setup, sorry.