Do you have the task log for the SRV Reload Networking tasks on hosts where you know the issue appeared after reloading?
I don't have the full log, but I did note that the only output that seemed important was that all or most hosts were reporting the following. The problem seemed to happen as each host refreshed.
Code:
vrf_BSDnet : warning: vrf_BSDnet: post-up cmd 'ip route del vrf vrf_BSDnet unreachable default metric 4278198272' failed: returned 2 (RTNETLINK answers: No such process)
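If I'm reading that warning correctly, it just means the unreachable default route isn't present at the moment the post-up runs, so there's nothing to delete. A quick way to confirm that on an affected host (just a sketch, using the VRF name from the warning):
Code:
# list any "unreachable" routes in the VRF's table; an empty result means the
# post-up 'ip route del ... unreachable default' has nothing to delete, which is
# exactly the "RTNETLINK answers: No such process" failure
ip route show vrf vrf_BSDnet type unreachable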
From the VM, I pinged the default gateway. Communications resumed until 10 seconds after stopping the ping. During this test, there was no /32 route for the VM in either the kernel or the BGP routing tables. All of our other VMs have /32 routes in both places.
No /32 routes during the ping either? The issue usually is that silent VMs do not get entries in ip neigh, or the entries expire after a while. Do the other hosts continuously send traffic and therefore not run into this issue, perhaps? It might be that FRR momentarily loses the routes when the configuration is applied and FRR gets restarted. If the VMs do not send any traffic afterwards, it is possible that the EVPN routes never get created again.
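To narrow that down, it might be worth comparing what zebra has learned locally against the kernel's view on the VM's host, something like the following (a sketch; replace <vm-mac> with the VM's MAC address):
Code:
# MAC/IP entries zebra has learned per VNI (these are what get exported as EVPN type-2 routes)
vtysh -c "show evpn arp-cache vni all"
vtysh -c "show evpn mac vni all"
# the kernel's view of the same MAC via the bridge forwarding database
bridge fdb show | grep -i "<vm-mac>"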
When I was preparing this reply I discovered I'd made an error with this prior test. I'd pinged the subnet IP rather than the gateway. Pinging the default gateway from the broken guest does not impact the problem. I did find that pinging any unused IP on the subnet does allow communications until about 10 seconds after the pings stop.
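In case it's useful, this is roughly how that 10-second window could be timed (a sketch only; the MAC and IP are the broken VM's, which show up in the tables further down):
Code:
# one line per second: the kernel neighbour entry for the VM and a count of
# EVPN routes carrying its MAC; run while the ping is active and after it stops
while sleep 1; do
  echo "$(date +%T) $(ip neigh show to 10.7.1.106) $(vtysh -c 'show bgp l2vpn evpn' | grep -c 'bc:24:11:68:3a:36')"
done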
I've done a bunch of digging along a couple of lines of thought, gathered a lot of information, and come to some conclusions/assumptions. I'm not sure of the best way to present it all, so I'll start with the conclusions/assumptions, go on to what I did and found, then end with some potentially relevant configuration information. I hope it isn't too confusing or information overload.
Conclusion/Assumption: Upon an SDN refresh, whatever process is responsible for synchronizing the kernel ARP table into the EVPN starts ignoring VMs that existed at the time of the refresh. The problem is not always triggered when refreshing the SDN configuration. VMs created on or migrated to the host after the refresh are not affected.
Reason: The output from "ip neighbor" shows the VM's IP address. The output from "show bgp l2vpn evpn" does not have a matching entry. I duplicated the affected VM and started it up on the same host. It works properly.
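For completeness, these are the places I compared to reach that conclusion; the arp-cache command is an extra check on zebra's locally learned entries, and I'm using "vni all" because I'm not certain which VNI zebra files them under (sketch):
Code:
# kernel neighbour table on the VM's host
ip neigh show dev buildup | grep 10.7.1.106
# zebra's locally learned EVPN MAC/IP cache (the source of the type-2 routes)
vtysh -c "show evpn arp-cache vni all" | grep 10.7.1.106
# the BGP EVPN table itself
vtysh -c "show bgp l2vpn evpn" | grep 10.7.1.106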
Troubleshooting track #1:
I previously mentioned that the VM with communications issues was receiving and replying to ARP requests from the gateway at a rate of 1/s or faster. I was able to determine the reason for this.
We have "advertise subnets" enabled in our configuration. This was done to ease troubleshooting, so pings and traceroutes for unused IPs in allocated ranges reach the PVE environment. It also allows the normal ARP process to find quiet VMs.
In our configuration, all nodes but one are configured as exit and entry nodes.
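For reference, I believe the routes created by "advertise subnets" are the type-5 (prefix) entries visible in the tables below; they can be listed on their own with something like this (assuming the FRR build supports the route-type filter):
Code:
# show only the type-5 (prefix) EVPN routes, i.e. the advertised subnets
vtysh -c "show bgp l2vpn evpn route type prefix"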
When pinging our broken VM from a workstation external to the PVE infrastructure, we found that the following happened:
Ping received by pmhost-dsc-8
pmhost-dsc-8 doesn't have an ARP entry for the IP, so it generates an ARP request sourced from the subnet gateway IP
the ARP request is broadcast to all SDN zone members
pmhost-cc-1 receives the ARP request and forwards it to the VM
the VM replies to the ARP request
pmhost-cc-1 receives the ARP reply and processes it by updating the entry in the kernel
the ARP reply is not forwarded to the originator
pmhost-dsc-8 never gets the ARP reply and therefore responds to the external IP with "destination host unreachable"
As long as traffic is received by any host for the broken VM, this process repeats, causing ARP requests to be created once per second.
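Something like the following should show that exchange if anyone wants to reproduce it (a sketch; "buildup" is the vnet bridge and 10.7.1.106 is the broken VM, both listed further down):
Code:
# on pmhost-dsc-8 (where the ping enters): the gateway-sourced ARP requests going out
tcpdump -eni buildup arp and host 10.7.1.106
# on pmhost-cc-1 (the VM's host): the request arriving and the VM's reply coming back
tcpdump -eni buildup arp and host 10.7.1.106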
Troubleshooting track #2:
I looked at what the ARP and routing tables show for the broken VM when it's doing nothing and when it's pinging an unknown IP on the local subnet.
In the output from these commands, I filtered down to the following IPs:
10.7.1.96/27: The subnet the VM is connected to
10.7.1.97: The gateway
10.7.1.103: A clone of the broken VM, on the same host, created after the problem started. This VM works correctly.
10.7.1.106: The broken VM
10.7.1.120: Another VM on the same subnet, but different host. It works correctly.
From the host running the broken VM. I'm attempting to ping the VM from outside of PVE. The VM is receiving and replying to ARP requests at least once per second.
Code:
# ip route|grep 10.7.1
10.7.1.96/27 nhid 9480 dev buildup proto bgp metric 20
10.7.1.120 nhid 20098 via 10.3.150.106 dev vrfbr_BSDnet proto bgp metric 20 onlink
# ip nei|grep 10.7.1
10.7.1.120 dev buildup lladdr bc:24:11:97:a6:d9 extern_learn NOARP proto zebra
10.7.1.106 dev buildup lladdr bc:24:11:68:3a:36 REACHABLE
10.7.1.103 dev buildup lladdr bc:24:11:7b:db:e9 REACHABLE
# vtysh -c "show ip route" |fgrep 10.7.1
B>* 10.7.1.96/27 [20/0] is directly connected, buildup (vrf vrf_BSDnet), weight 1, 5d02h08m
B>* 10.7.1.120/32 [200/0] via 10.3.150.106, vrfbr_BSDnet (vrf vrf_BSDnet) onlink, weight 1, 05:05:36
# vtysh -c "show bgp l2vpn evpn" |egrep '68:3a:36|7b:db:e9|10\.7\.1\.'
*>i[2]:[0]:[48]:[bc:24:11:97:a6:d9]:[32]:[10.7.1.120]
*> [2]:[0]:[48]:[bc:24:11:7b:db:e9]
*> [2]:[0]:[48]:[bc:24:11:7b:db:e9]:[32]:[10.7.1.103]
*>i[5]:[0]:[27]:[10.7.1.96]
From the host where the pings are entering the SDN from the external network. The VM is otherwise not doing anything.
Code:
# ip route|grep 10.7.1
10.7.1.96/27 nhid 440 dev buildup proto bgp metric 20
10.7.1.103 nhid 34689 via 10.6.150.101 dev vrfbr_BSDnet proto bgp metric 20 onlink
10.7.1.120 nhid 943 via 10.3.150.106 dev vrfbr_BSDnet proto bgp metric 20 onlink
# ip nei|grep 10.7.1
10.7.1.120 dev buildup lladdr bc:24:11:97:a6:d9 extern_learn NOARP proto zebra
10.7.1.103 dev buildup lladdr bc:24:11:7b:db:e9 extern_learn NOARP proto zebra
10.7.1.106 dev buildup INCOMPLETE
# vtysh -c "show ip route" |fgrep 10.7.1
B>* 10.7.1.96/27 [20/0] is directly connected, buildup (vrf vrf_BSDnet), weight 1, 4d03h39m
B>* 10.7.1.103/32 [200/0] via 10.6.150.101, vrfbr_BSDnet (vrf vrf_BSDnet) onlink, weight 1, 00:13:30
B>* 10.7.1.120/32 [200/0] via 10.3.150.106, vrfbr_BSDnet (vrf vrf_BSDnet) onlink, weight 1, 05:10:10
# vtysh -c "show bgp l2vpn evpn" |egrep '68:3a:36|7b:db:e9|10\.7\.1\.'
*>i[2]:[0]:[48]:[bc:24:11:97:a6:d9]:[32]:[10.7.1.120]
*>i[2]:[0]:[48]:[bc:24:11:7b:db:e9]
*>i[2]:[0]:[48]:[bc:24:11:7b:db:e9]:[32]:[10.7.1.103]
*>i[5]:[0]:[27]:[10.7.1.96]
From the host running the broken VM, while the VM is pinging an unknown IP on the local subnet.
Code:
# ip route|grep 10.7.1
10.7.1.96/27 nhid 9480 dev buildup proto bgp metric 20
10.7.1.120 nhid 20098 via 10.3.150.106 dev vrfbr_BSDnet proto bgp metric 20 onlink
# ip nei|grep 10.7.1
10.7.1.120 dev buildup lladdr bc:24:11:97:a6:d9 extern_learn NOARP proto zebra
10.7.1.106 dev buildup lladdr bc:24:11:68:3a:36 REACHABLE
10.7.1.103 dev buildup lladdr bc:24:11:7b:db:e9 REACHABLE
# vtysh -c "show ip route" |fgrep 10.7.1
B>* 10.7.1.96/27 [20/0] is directly connected, buildup (vrf vrf_BSDnet), weight 1, 4d07h28m
B>* 10.7.1.103/32 [200/0] via 10.6.150.101, vrfbr_BSDnet (vrf vrf_BSDnet) onlink, weight 1, 04:01:59
B>* 10.7.1.120/32 [200/0] via 10.3.150.106, vrfbr_BSDnet (vrf vrf_BSDnet) onlink, weight 1, 08:58:39
# vtysh -c "show bgp l2vpn evpn" |egrep '68:3a:36|7b:db:e9|10\.7\.1\.'
*>i[2]:[0]:[48]:[bc:24:11:97:a6:d9]:[32]:[10.7.1.120]
*> [2]:[0]:[48]:[bc:24:11:7b:db:e9]
*> [2]:[0]:[48]:[bc:24:11:7b:db:e9]:[32]:[10.7.1.103]
*>i[5]:[0]:[27]:[10.7.1.96]
From the host where the pings are entering the SDN from the external network, while the VM is pinging an unknown IP on the local subnet.
Code:
# ip route|grep 10.7.1
10.7.1.96/27 nhid 440 dev buildup proto bgp metric 20
10.7.1.103 nhid 34689 via 10.6.150.101 dev vrfbr_BSDnet proto bgp metric 20 onlink
10.7.1.120 nhid 943 via 10.3.150.106 dev vrfbr_BSDnet proto bgp metric 20 onlink
# ip nei|grep 10.7.1
10.7.1.120 dev buildup lladdr bc:24:11:97:a6:d9 extern_learn NOARP proto zebra
10.7.1.103 dev buildup lladdr bc:24:11:7b:db:e9 extern_learn NOARP proto zebra
10.7.1.106 dev buildup FAILED
### The entry for 10.7.1.106 cycles between DELAY, PROBE, FAILED, and INCOMPLETE
# vtysh -c "show ip route" |fgrep 10.7.1
B>* 10.7.1.96/27 [20/0] is directly connected, buildup (vrf vrf_BSDnet), weight 1, 4d07h34m
B>* 10.7.1.103/32 [200/0] via 10.6.150.101, vrfbr_BSDnet (vrf vrf_BSDnet) onlink, weight 1, 04:08:23
B>* 10.7.1.120/32 [200/0] via 10.3.150.106, vrfbr_BSDnet (vrf vrf_BSDnet) onlink, weight 1, 09:05:03
# vtysh -c "show bgp l2vpn evpn" |egrep '68:3a:36|7b:db:e9|10\.7\.1\.'
*>i[2]:[0]:[48]:[bc:24:11:97:a6:d9]:[32]:[10.7.1.120]
*>i[2]:[0]:[48]:[bc:24:11:7b:db:e9]
*>i[2]:[0]:[48]:[bc:24:11:7b:db:e9]:[32]:[10.7.1.103]
*>i[5]:[0]:[27]:[10.7.1.96]
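As a side note, the state cycling mentioned in that last capture can be watched live with something like this (sketch):
Code:
# print neighbour table changes as they happen; the entry for the broken VM
# cycles through DELAY, PROBE, FAILED, and INCOMPLETE
ip monitor neigh | grep --line-buffered 10.7.1.106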
Additional configuration information:
controllers.cfg: the EVPN controller and a representative BGP uplink
Code:
evpn: BSDnet
asn 65200
peers 10.3.150.104,10.3.150.105,10.3.150.106,10.3.150.107,10.3.150.108,10.3.150.109,10.6.150.101,10.6.150.104,10.6.150.105,10.6.150.106,10.6.150.107,10.6.150.108,10.6.150.109,10.254.30.230
bgp: bgppmhost-cc-1
asn 65200
node pmhost-cc-1
peers 10.6.150.10,10.6.150.11
bgp-multipath-as-path-relax 0
ebgp 1
zones.cfg
Code:
evpn: BSDnet
controller BSDnet
vrf-vxlan 1000000
advertise-subnets 1
exitnodes pmhost-dsc-6,pmhost-dsc-4,pmhost-cc-5,pmhost-cc-4,pmhost-dsc-9,pmhost-cc-9,pmhost-cc-6,pmhost-cc-1,pmhost-dsc-5,pmhost-dsc-8,pmhost-dsc-7,pmhost-cc-8,pmhost-cc-7
ipam pve
mac BC:24:11:A8:25:90
mtu 9148
nodes pmhost-dsc-8,pmhost-dsc-7,pmhost-cc-8,pmhost-cc-7,pmhost-cc-9,pmhost-cc-6,pmhost-cc-1,pmhost-dsc-5,pmhost-cc-4,pmhost-dsc-9,pmhost-dsc-6,pmhost-dsc-4,pmhost-cc-5,pmhost-witness
The vnets.cfg definition for the vnet used by the broken VM
Code:
vnet: buildup
zone BSDnet
alias buildup
tag 1001096
The subnets.cfg definition for the subnet used by the broken VM
Code:
subnet: BSDnet-10.7.1.96-27
vnet buildup
gateway 10.7.1.97
Thanks,
Erik