EVPN SDN: errors and dropped packets on the vrf_evpn interface

hawat

Dec 11, 2017
I'm using Virtual Environment 9.1.6 with an EVPN network at Hetzner. I've noticed that the error and dropped-packet counters on the vrf_evpn interface keep growing, but the network is working fine, or at least it seems to be.
There are no errors on any other interface. I reduced the MTU of the VM network to 1350, following Hetzner's vSwitch recommendation (a maximum of 1400, minus the encapsulation overhead).
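The 1350 figure can be checked with a little arithmetic. This sketch assumes the EVPN zone encapsulates traffic in VXLAN over an IPv4 underlay and that the 1400-byte vSwitch limit applies to the outer IP packet; an IPv6 underlay would cost 20 bytes more. The function name `vxlan_inner_mtu` is hypothetical.

```bash
#!/usr/bin/env bash
# Compute the largest inner (VM) MTU that fits into a given underlay MTU
# when traffic is encapsulated in VXLAN over IPv4 (assumption, see above).
vxlan_inner_mtu() {
  local underlay_mtu=$1
  local outer_ipv4=20   # outer IPv4 header, no options
  local outer_udp=8     # outer UDP header
  local vxlan_hdr=8     # VXLAN header
  local inner_eth=14    # inner (untagged) Ethernet header carried inside VXLAN
  echo $(( underlay_mtu - outer_ipv4 - outer_udp - vxlan_hdr - inner_eth ))
}

vxlan_inner_mtu 1400   # Hetzner vSwitch maximum; prints 1350
```

The 50 bytes of overhead include the inner Ethernet header because the VM's MTU excludes its own L2 header, while the encapsulated frame carries it.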

Bash:
ip -s link show dev vrf_evpn
72: vrf_evpn: <NOARP,MASTER,UP,LOWER_UP> mtu 65575 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether aa:d1:34:82:55:4c brd ff:ff:ff:ff:ff:ff
    RX:    bytes   packets errors dropped  missed   mcast           
     42710078845 135885679      0       0       0       0
    TX:    bytes   packets errors dropped carrier collsns           
    104476795504 104912493 724156  724156       0       0
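To put the TX counters above in proportion, here is a small sketch that reads `ip -s link show` output on stdin and prints dropped packets as a percentage of transmitted packets. It assumes the two-line `TX:` header/values layout shown above (the iproute2 output format may differ between versions); the function name `tx_drop_pct` is hypothetical.

```bash
#!/usr/bin/env bash
# Print TX dropped packets as a percentage of TX packets.
# Expects a "TX:" header line followed by a line with
# bytes, packets, errors, dropped, ... (as in the output above).
tx_drop_pct() {
  awk '
    /^ *TX:/ { getline; packets = $2; dropped = $4; found = 1 }
    END {
      if (!found || packets == 0) { print "no TX counters found"; exit 1 }
      printf "%.0f dropped / %.0f packets = %.2f%%\n", dropped, packets, 100 * dropped / packets
    }'
}

# Counters as posted above for vrf_evpn:
tx_drop_pct <<'EOF'
    RX:    bytes   packets errors dropped  missed   mcast
     42710078845 135885679      0       0       0       0
    TX:    bytes   packets errors dropped carrier collsns
    104476795504 104912493 724156  724156       0       0
EOF
```

For the posted numbers this works out to roughly 0.69% of transmitted packets.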

Is it possible that this many errors and dropped packets on vrf_evpn are not a problem, or do I need to fix something? Please advise which direction to look in.
 
Hmm, this is weird. Could you show me the output of:
Code:
ip route show vrf vrf_evpn
ip neigh show vrf vrf_evpn
vtysh -c "show ip route vrf vrf_evpn"
vtysh -c "show evpn mac vni all"
vtysh -c "show evpn arp-cache vni all"

Could this just be ARP requests hitting the VRF and being dropped because EVPN hasn't inserted the route yet?
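If the output needs to be collected more than once (for example before and after the counters jump), the commands above can be wrapped in a small script that labels each section with the command that produced it. A minimal sketch; the function name `collect_evpn_diag` is hypothetical, and it simply skips a command whose binary (`ip` or `vtysh`) is not installed.

```bash
#!/usr/bin/env bash
# Run each diagnostic command and label its output, so several
# snapshots can be compared later.
collect_evpn_diag() {
  local cmds=(
    'ip -s link show dev vrf_evpn'
    'ip route show vrf vrf_evpn'
    'ip neigh show vrf vrf_evpn'
    'vtysh -c "show ip route vrf vrf_evpn"'
    'vtysh -c "show evpn mac vni all"'
    'vtysh -c "show evpn arp-cache vni all"'
  )
  local cmd
  echo "=== $(date -u +%Y-%m-%dT%H:%M:%SZ) ==="
  for cmd in "${cmds[@]}"; do
    echo "### $cmd"
    # The first word is the binary; skip the command if it is not installed.
    if command -v "${cmd%% *}" >/dev/null 2>&1; then
      # eval keeps the quoted vtysh argument as a single word.
      eval "$cmd" 2>&1 || echo "(command failed)"
    else
      echo "(${cmd%% *} not found, skipped)"
    fi
  done
}

# Example: collect_evpn_diag > "evpn-diag-$(hostname).txt"
```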
 
I don't have enough EVPN knowledge to figure out the problem. Here is the data you requested.

Code:
ip route show vrf vrf_evpn
10.20.20.8/29 dev mesh proto kernel scope link src 10.20.20.9
10.20.20.10 nhid 33 via 192.0.2.2 dev vrfbr_evpn proto bgp metric 20 onlink
10.190.0.0/20 dev vnet190 proto kernel scope link src 10.190.0.1
10.190.0.11 nhid 33 via 192.0.2.2 dev vrfbr_evpn proto bgp metric 20 onlink
10.190.0.20 nhid 33 via 192.0.2.2 dev vrfbr_evpn proto bgp metric 20 onlink
10.190.0.21 nhid 33 via 192.0.2.2 dev vrfbr_evpn proto bgp metric 20 onlink
10.190.0.52 nhid 33 via 192.0.2.2 dev vrfbr_evpn proto bgp metric 20 onlink
10.190.0.53 nhid 33 via 192.0.2.2 dev vrfbr_evpn proto bgp metric 20 onlink
10.190.0.59 nhid 33 via 192.0.2.2 dev vrfbr_evpn proto bgp metric 20 onlink
10.190.0.60 nhid 33 via 192.0.2.2 dev vrfbr_evpn proto bgp metric 20 onlink
10.190.0.61 nhid 33 via 192.0.2.2 dev vrfbr_evpn proto bgp metric 20 onlink
10.191.0.0/20 nhid 33 via 192.0.2.2 dev vrfbr_evpn proto bgp metric 20 onlink
192.168.10.0/24 nhid 33 via 192.0.2.2 dev vrfbr_evpn proto bgp metric 20 onlink

Code:
ip neigh show vrf vrf_evpn
10.20.20.10 dev mesh lladdr bc:24:11:57:5d:90 extern_learn NOARP proto zebra
10.190.0.12 dev vnet190 lladdr bc:24:11:41:bf:dc STALE
10.190.0.50 dev vnet190 lladdr bc:24:11:49:f3:ed REACHABLE
10.190.0.54 dev vnet190 lladdr bc:24:11:df:45:e6 DELAY
10.190.0.52 dev vnet190 lladdr bc:24:11:c3:cd:3f extern_learn NOARP proto zebra
10.190.0.58 dev vnet190 FAILED
192.0.2.2 dev vrfbr_evpn lladdr 46:24:70:b4:9a:59 extern_learn NOARP proto zebra
10.190.0.56 dev vnet190 lladdr bc:24:11:85:c7:c0 REACHABLE
10.190.0.62 dev vnet190 lladdr bc:24:11:ae:7b:86 REACHABLE
10.190.0.60 dev vnet190 lladdr bc:24:11:39:d8:b2 extern_learn NOARP proto zebra
10.190.0.21 dev vnet190 lladdr bc:24:11:86:e5:06 extern_learn NOARP proto zebra
10.190.1.201 dev vnet190 lladdr bc:24:11:79:94:10 DELAY
10.190.0.11 dev vnet190 lladdr bc:24:11:8e:e0:9b extern_learn NOARP proto zebra
10.20.20.11 dev mesh lladdr bc:24:11:8f:30:af REACHABLE
10.190.0.51 dev vnet190 lladdr bc:24:11:17:1a:bc REACHABLE
10.190.0.55 dev vnet190 lladdr bc:24:11:71:5f:c2 STALE
10.190.0.53 dev vnet190 lladdr bc:24:11:db:e2:c4 extern_learn NOARP proto zebra
10.190.0.59 dev vnet190 lladdr bc:24:11:74:8d:74 extern_learn NOARP proto zebra
10.190.0.61 dev vnet190 lladdr bc:24:11:5b:54:cf extern_learn NOARP proto zebra
10.190.0.22 dev vnet190 lladdr bc:24:11:ff:54:1f STALE
10.190.0.20 dev vnet190 lladdr bc:24:11:86:e5:06 extern_learn NOARP proto zebra
fe80::be24:11ff:feae:7b86 dev vnet190 lladdr bc:24:11:ae:7b:86 STALE
fe80::be24:11ff:fe37:1204 dev vnet190 lladdr bc:24:11:37:12:04 STALE
fe80::be24:11ff:fe05:d068 dev vnet190 lladdr bc:24:11:05:d0:68 STALE
fe80::be24:11ff:fe79:9410 dev vnet190 lladdr bc:24:11:79:94:10 STALE
fe80::be24:11ff:fe39:d8b2 dev vnet190 lladdr bc:24:11:39:d8:b2 extern_learn NOARP proto zebra
fe80::be24:11ff:fe67:bc07 dev vnet190 lladdr bc:24:11:67:bc:07 STALE
fe80::be24:11ff:fe74:8d74 dev vnet190 lladdr bc:24:11:74:8d:74 extern_learn NOARP proto zebra
fe80::be24:11ff:fefb:871b dev vnet190 lladdr bc:24:11:fb:87:1b STALE
fe80::be24:11ff:fe85:c7c0 dev vnet190 lladdr bc:24:11:85:c7:c0 STALE
fe80::be24:11ff:fe8e:e09b dev vnet190 lladdr bc:24:11:8e:e0:9b extern_learn NOARP proto zebra
fe80::be24:11ff:fe49:f3ed dev vnet190 lladdr bc:24:11:49:f3:ed STALE
fe80::be24:11ff:fe71:5fc2 dev vnet190 lladdr bc:24:11:71:5f:c2 STALE
fe80::be24:11ff:fe17:1abc dev vnet190 lladdr bc:24:11:17:1a:bc STALE
fe80::be24:11ff:feff:541f dev vnet190 lladdr bc:24:11:ff:54:1f STALE
fe80::be24:11ff:fec3:cd3f dev vnet190 lladdr bc:24:11:c3:cd:3f extern_learn NOARP proto zebra
fe80::be24:11ff:fe86:e506 dev vnet190 lladdr bc:24:11:86:e5:06 extern_learn NOARP proto zebra
fe80::be24:11ff:fe41:bfdc dev vnet190 lladdr bc:24:11:41:bf:dc STALE
fe80::be24:11ff:fe5b:54cf dev vnet190 lladdr bc:24:11:5b:54:cf extern_learn NOARP proto zebra
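To follow up on the ARP hypothesis, it can help to pull out the neighbor entries that never resolved. A sketch that filters `ip neigh` output on stdin for entries whose state is FAILED or INCOMPLETE; the function name `unresolved_neighbors` is hypothetical. An unresolved entry only shows that ARP/ND resolution failed for that address; it does not by itself establish that it accounts for the vrf_evpn counters.

```bash
#!/usr/bin/env bash
# Print neighbor entries whose state (last field) is FAILED or INCOMPLETE,
# i.e. addresses for which ARP/ND resolution did not succeed.
unresolved_neighbors() {
  awk '$NF == "FAILED" || $NF == "INCOMPLETE"'
}

# On a live host: ip neigh show vrf vrf_evpn | unresolved_neighbors
# With two lines from the output above:
unresolved_neighbors <<'EOF'
10.190.0.56 dev vnet190 lladdr bc:24:11:85:c7:c0 REACHABLE
10.190.0.58 dev vnet190 FAILED
EOF
```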


Code:
vtysh -c "show ip route vrf vrf_evpn"
Codes: K - kernel route, C - connected, L - local, S - static,
       R - RIP, O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
       f - OpenFabric, t - Table-Direct,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

IPv4 unicast VRF vrf_evpn:
C>* 10.20.20.8/29 is directly connected, mesh, weight 1, 1d02h04m
L>* 10.20.20.9/32 is directly connected, mesh, weight 1, 1d02h04m
B>* 10.20.20.10/32 [200/0] via 192.0.2.2, vrfbr_evpn onlink, weight 1, 1d02h03m
C>* 10.190.0.0/20 is directly connected, vnet190, weight 1, 1d02h04m
L>* 10.190.0.1/32 is directly connected, vnet190, weight 1, 1d02h04m
B>* 10.190.0.11/32 [200/0] via 192.0.2.2, vrfbr_evpn onlink, weight 1, 1d02h03m
B>* 10.190.0.20/32 [200/0] via 192.0.2.2, vrfbr_evpn onlink, weight 1, 1d02h03m
B>* 10.190.0.21/32 [200/0] via 192.0.2.2, vrfbr_evpn onlink, weight 1, 1d02h03m
B>* 10.190.0.52/32 [200/0] via 192.0.2.2, vrfbr_evpn onlink, weight 1, 00:04:24
B>* 10.190.0.53/32 [200/0] via 192.0.2.2, vrfbr_evpn onlink, weight 1, 1d02h03m
B>* 10.190.0.59/32 [200/0] via 192.0.2.2, vrfbr_evpn onlink, weight 1, 1d02h03m
B>* 10.190.0.60/32 [200/0] via 192.0.2.2, vrfbr_evpn onlink, weight 1, 1d02h03m
B>* 10.190.0.61/32 [200/0] via 192.0.2.2, vrfbr_evpn onlink, weight 1, 1d02h03m
B>* 10.191.0.0/20 [200/0] via 192.0.2.2, vrfbr_evpn onlink, weight 1, 1d02h03m
B>* 192.168.10.0/24 [200/0] via 192.0.2.2, vrfbr_evpn onlink, weight 1, 1d02h03m


Code:
vtysh -c "show evpn mac vni all"

VNI 11000 #MACs (local and remote) 16

Flags: N=sync-neighs, I=local-inactive, P=peer-active, X=peer-proxy
MAC               Type   Flags Intf/Remote ES/VTEP            VLAN  Seq #'s
bc:24:11:8e:e0:9b remote       192.0.2.2                            0/0
bc:24:11:41:bf:dc local        tap116i0                             0/0
bc:24:11:c3:cd:3f remote       192.0.2.2                            0/0
bc:24:11:79:94:10 local        tap201i0                             2/0
bc:24:11:71:5f:c2 local        tap107i0                             0/0
bc:24:11:86:e5:06 remote       192.0.2.2                            0/0
bc:24:11:ae:7b:86 local        tap118i0                             0/0
bc:24:11:17:1a:bc local        fwpr101p0                            0/0
bc:24:11:74:8d:74 remote       192.0.2.2                            0/0
bc:24:11:39:d8:b2 remote       192.0.2.2                            0/0
bc:24:11:5b:54:cf remote       192.0.2.2                            0/0
bc:24:11:db:e2:c4 remote       192.0.2.2                            0/0
bc:24:11:df:45:e6 local        veth104i0                            0/0
bc:24:11:49:f3:ed local        tap100i0                             0/0
bc:24:11:85:c7:c0 local        tap108i0                             0/0
bc:24:11:ff:54:1f local        tap115i0                             0/0

VNI 11002 #MACs (local and remote) 2

Flags: N=sync-neighs, I=local-inactive, P=peer-active, X=peer-proxy
MAC               Type   Flags Intf/Remote ES/VTEP            VLAN  Seq #'s
bc:24:11:57:5d:90 remote       192.0.2.2                            0/0
bc:24:11:8f:30:af local        veth110i0                            0/0


Code:
vtysh -c "show evpn arp-cache vni all"

VNI 11000 #ARP (IPv4 and IPv6, local and remote) 33

Flags: I=local-inactive, P=peer-active, X=peer-proxy
Neighbor                  Type   Flags State    MAC               Remote ES/VTEP                 Seq #'s
10.190.0.51               local        active   bc:24:11:17:1a:bc                                0/0
10.190.0.11               remote       active   bc:24:11:8e:e0:9b 192.0.2.2                      0/0
10.190.0.60               remote       active   bc:24:11:39:d8:b2 192.0.2.2                      0/0
10.190.0.62               local        active   bc:24:11:ae:7b:86                                0/0
10.190.0.21               remote       active   bc:24:11:86:e5:06 192.0.2.2                      0/0
fe80::be24:11ff:fe71:5fc2 local        active   bc:24:11:71:5f:c2                                0/0
10.190.0.12               local        active   bc:24:11:41:bf:dc                                0/0
fe80::be24:11ff:fe74:8d74 remote       active   bc:24:11:74:8d:74 192.0.2.2                      0/0
fe80::be24:11ff:fe37:1204 local        inactive bc:24:11:37:12:04                                0/0
10.190.0.22               local        active   bc:24:11:ff:54:1f                                0/0
10.190.0.50               local        active   bc:24:11:49:f3:ed                                0/0
fe80::be24:11ff:fe85:c7c0 local        active   bc:24:11:85:c7:c0                                0/0
fe80::be24:11ff:fe79:9410 local        active   bc:24:11:79:94:10                                2/0
10.190.0.53               remote       active   bc:24:11:db:e2:c4 192.0.2.2                      0/0
fe80::be24:11ff:fe39:d8b2 remote       active   bc:24:11:39:d8:b2 192.0.2.2                      0/0
10.190.0.55               local        active   bc:24:11:71:5f:c2                                0/0
fe80::be24:11ff:feff:541f local        active   bc:24:11:ff:54:1f                                0/0
fe80::be24:11ff:fe67:bc07 local        inactive bc:24:11:67:bc:07                                0/0
fe80::be24:11ff:fe05:d068 local        inactive bc:24:11:05:d0:68                                0/0
10.190.0.54               local        active   bc:24:11:df:45:e6                                0/0
fe80::be24:11ff:fe86:e506 remote       active   bc:24:11:86:e5:06 192.0.2.2                      0/0
10.190.0.59               remote       active   bc:24:11:74:8d:74 192.0.2.2                      0/0
fe80::be24:11ff:fefb:871b local        inactive bc:24:11:fb:87:1b                                0/0
fe80::be24:11ff:fe49:f3ed local        active   bc:24:11:49:f3:ed                                0/0
10.190.0.61               remote       active   bc:24:11:5b:54:cf 192.0.2.2                      0/0
fe80::be24:11ff:fe17:1abc local        active   bc:24:11:17:1a:bc                                0/0
fe80::be24:11ff:fe41:bfdc local        active   bc:24:11:41:bf:dc                                0/0
10.190.0.20               remote       active   bc:24:11:86:e5:06 192.0.2.2                      0/0
10.190.0.56               local        active   bc:24:11:85:c7:c0                                0/0
fe80::be24:11ff:fe8e:e09b remote       active   bc:24:11:8e:e0:9b 192.0.2.2                      0/0
10.190.1.201              local        active   bc:24:11:79:94:10                                2/0
fe80::be24:11ff:fe5b:54cf remote       active   bc:24:11:5b:54:cf 192.0.2.2                      0/0
fe80::be24:11ff:feae:7b86 local        active   bc:24:11:ae:7b:86                                0/0

VNI 11002 #ARP (IPv4 and IPv6, local and remote) 2

Flags: I=local-inactive, P=peer-active, X=peer-proxy
Neighbor        Type   Flags State    MAC               Remote ES/VTEP                 Seq #'s
10.20.20.11     local        active   bc:24:11:8f:30:af                                0/0
10.20.20.10     remote       active   bc:24:11:57:5d:90 192.0.2.2                      0/0
 
Everything seems to be working (the output you provided looks good), and the drop rate is actually quite low, so IMO this is fine. If you really want to know what's wrong and you see drops regularly, you could try to find the drop reason with either dropwatch or bpftrace, e.g.: bpftrace -e 'tracepoint:skb:kfree_skb {printf("%s: %d\n", comm, args->reason)}'
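To see whether drops are actually regular or come in bursts, one option is to sample the interface counters at a fixed interval and print how much they grew. A minimal sketch, assuming the standard sysfs statistics files (`tx_errors` and `tx_dropped` under `/sys/class/net/<dev>/statistics`); the function name `drop_deltas` and its arguments are hypothetical, and the statistics directory is a parameter so it can be pointed at any interface.

```bash
#!/usr/bin/env bash
# Print how much tx_errors and tx_dropped grew during each interval.
# $1: statistics directory, e.g. /sys/class/net/vrf_evpn/statistics
# $2: interval in seconds
# $3: number of samples
drop_deltas() {
  local dir=$1 interval=$2 samples=$3
  local prev_err prev_drop err drop i
  prev_err=$(<"$dir/tx_errors")
  prev_drop=$(<"$dir/tx_dropped")
  for (( i = 0; i < samples; i++ )); do
    sleep "$interval"
    err=$(<"$dir/tx_errors")
    drop=$(<"$dir/tx_dropped")
    echo "tx_errors +$(( err - prev_err )) tx_dropped +$(( drop - prev_drop ))"
    prev_err=$err
    prev_drop=$drop
  done
}

# Example: drop_deltas /sys/class/net/vrf_evpn/statistics 10 6
```

A steady increase per interval points to something systematic, while occasional spikes are easier to correlate with events such as VM migrations or route updates.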
 
I'm attaching the bpftrace output as a file; I hope it helps find the problem. The errors and dropped counters increased while the data was being collected.
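With per-event output from the bpftrace one-liner above (one `comm: reason` line per freed skb), a quick way to see which reasons dominate is to count lines per reason. A sketch, assuming exactly that line format; the function name `count_drop_reasons` and the file name `bpftrace.txt` are hypothetical. The numeric values come from the kernel's `enum skb_drop_reason`, which differs between kernel versions, so they have to be looked up for the running kernel.

```bash
#!/usr/bin/env bash
# Count events per drop reason in lines of the form "<comm>: <reason>".
# Lines that do not end in ": <number>" (e.g. "Attaching 1 probe...") are ignored.
# Matching on the last field keeps comm values that contain ':' intact.
count_drop_reasons() {
  awk '$NF ~ /^[0-9]+$/ && $(NF-1) ~ /:$/ { n[$NF]++ }
       END { for (r in n) print n[r], "reason=" r }' "$@" | sort -rn
}

# Example: count_drop_reasons bpftrace.txt
```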
 
