Hi,
we are currently transforming our infrastructure to IPv6 only. Additionally, we are moving away from LACP bonds to routed host addresses in order to become independent of switch vendors.
Technically this is implemented by adding a /128 address to a loopback, dummy, or bridge interface, and using a routing daemon to announce this address to the connected routers/switches. To prevent clashes with the FRR instance managed by Proxmox and some other services, bird is used as the routing daemon, with OSPF as the routing protocol. Both bird and the switches are configured to exchange only the /128 host route and the default route. This works fine, allowing load balancing and failover.
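The routed host address part can be sketched with iproute2 (the dummy interface name and the /128 are illustrative, matching the addresses shown later in this post, not a verbatim copy of our configuration; bird then picks up the address and redistributes it):

```shell
# create a dummy interface and attach the /128 host address to it
ip link add name dummy0 type dummy
ip link set dummy0 up
ip -6 addr add fdf5:87c7:8336:100::46/128 dev dummy0
# bird announces this /128 via OSPF to both switches; the switches
# only send back a default route, which gives ECMP and failover
```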
The old setup used LACP bonds connected to a modular switch; the bond interface was added to a local bridge, and an IPv4 address was statically configured. An additional IPv6 address was configured via SLAAC. Instances were using both IPv4 and IPv6 addresses; EVPN was used to connect the cluster hosts, and BGP controllers announced the instance addresses to the connected switch.
Proxmox is currently running on three nodes; two have already been migrated to the new setup, while the third one is still running the old setup. Due to problems with FRR and IPv6 VTEPs we have upgraded FRR to version 10.6, which seems to work fine. All instances except two test instances are running on the node with the old setup. This node is also the only node configured as exit node (adding the other two nodes fails, see below).
Problem:
Ingress traffic to instances is not propagated correctly if IPv6 is used. Since the single exit node announces the addresses of the instances, it also receives all the incoming traffic. This works fine for both IPv4 and IPv6 if the target instance is running on the exit node. For instances on other nodes, the traffic has to be propagated via the EVPN overlay. This also works for IPv4 traffic, but fails for IPv6. Traffic between instances on the other nodes (both instances on the same node) also works for both IPv4 and IPv6.
Summary:
- IPv4 traffic works fine, independent of the node the instance runs on
- IPv6 works for instances on the exit node and between instances on the same non-exit node
- IPv6 does not work if traffic has to be forwarded from the exit node to another node
BGP status of the exit node with the old setup:
Code:
proxmox-sr4-2.intra# show bgp summary
IPv4 Unicast Summary:
BGP router identifier 26.61.152.8, local AS number 65202 VRF default vrf-id 0
BGP table version 59
RIB entries 48, using 7296 bytes of memory
Peers 2, using 47 KiB of memory
Peer groups 2, using 128 bytes of memory
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt Desc
192.168.120.1 4 64525 18245 18251 59 0 0 05:03:43 20 24 N/A
fdf5:87c7:8336:3005::1 4 64525 18230 18232 0 0 0 05:03:43 NoNeg NoNeg N/A
Total number of neighbors 2
IPv6 Unicast Summary:
BGP router identifier 26.61.152.8, local AS number 65202 VRF default vrf-id 0
BGP table version 13
RIB entries 15, using 2280 bytes of memory
Peers 2, using 47 KiB of memory
Peer groups 2, using 128 bytes of memory
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt Desc
192.168.120.1 4 64525 18245 18251 0 0 0 05:03:43 NoNeg NoNeg N/A
fdf5:87c7:8336:3005::1 4 64525 18230 18232 13 0 0 05:03:43 7 8 N/A
Total number of neighbors 2
L2VPN EVPN Summary:
BGP router identifier 26.61.152.8, local AS number 65202 VRF default vrf-id 0
BGP table version 0
RIB entries 17, using 2584 bytes of memory
Peers 2, using 47 KiB of memory
Peer groups 2, using 128 bytes of memory
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt Desc
proxmox-sr4-1.intra(fdf5:87c7:8336:100::46) 4 65202 5753 5818 101 0 0 04:22:08 7 65 FRRouting/10.6.0
proxmox-sr4-3.intra(fdf5:87c7:8336:100::48) 4 65202 6113 6138 101 0 0 05:03:15 4 65 FRRouting/10.6.0
Total number of neighbors 2
BGP status of a node with the new setup:
Code:
proxmox-sr4-1.intra# show bgp summary
IPv6 Unicast Summary:
BGP router identifier 26.58.96.180, local AS number 65202 VRF default vrf-id 0
BGP table version 49
RIB entries 35, using 5320 bytes of memory
Peers 2, using 47 KiB of memory
Peer groups 2, using 128 bytes of memory
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt Desc
fdf5:87c7:8336:4001::1 4 64520 5299 5300 49 0 0 04:24:34 3 3 N/A
fdf5:87c7:8336:4001::2 4 64520 5294 5300 49 0 0 04:24:34 0 3 N/A
Total number of neighbors 2
L2VPN EVPN Summary:
BGP router identifier 26.58.96.180, local AS number 65202 VRF default vrf-id 0
BGP table version 0
RIB entries 17, using 2584 bytes of memory
Peers 2, using 47 KiB of memory
Peer groups 2, using 128 bytes of memory
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt Desc
proxmox-sr4-3.intra(fdf5:87c7:8336:100::48) 4 65202 5358 5337 85 0 0 04:24:34 4 7 FRRouting/10.6.0
proxmox-sr4-2.intra(fdf5:87c7:8336:3005::11) 4 65202 5364 5331 85 0 0 04:24:34 65 7 FRRouting/10.6.0
Total number of neighbors 2
IPv6 routing table on the old host:
Code:
root@proxmox-sr4-2:~# ip -6 r
fdf5:87c7:8336:1::20 nhid 27 via fdf5:87c7:8336:3005::1 dev vmbr0 proto bgp metric 20 pref medium
fdf5:87c7:8336:1::21 nhid 27 via fdf5:87c7:8336:3005::1 dev vmbr0 proto bgp metric 20 pref medium
fdf5:87c7:8336:1::22 nhid 27 via fdf5:87c7:8336:3005::1 dev vmbr0 proto bgp metric 20 pref medium
fdf5:87c7:8336:1::/64 nhid 22 dev vrf_bcf proto bgp metric 20 pref medium
fdf5:87c7:8336:2::10 nhid 27 via fdf5:87c7:8336:3005::1 dev vmbr0 proto bgp metric 20 pref medium
fdf5:87c7:8336:2::20 nhid 27 via fdf5:87c7:8336:3005::1 dev vmbr0 proto bgp metric 20 pref medium
fdf5:87c7:8336:2::21 nhid 27 via fdf5:87c7:8336:3005::1 dev vmbr0 proto bgp metric 20 pref medium
fdf5:87c7:8336:2::22 nhid 27 via fdf5:87c7:8336:3005::1 dev vmbr0 proto bgp metric 20 pref medium
fdf5:87c7:8336:3005::/64 dev vmbr0 proto kernel metric 256 pref medium
fe80::/64 dev vmbr0 proto kernel metric 256 pref medium
fe80::/64 dev eno1 proto kernel metric 256 pref medium
default via fdf5:87c7:8336:3005::1 dev vmbr0 proto kernel metric 1024 onlink pref medium
root@proxmox-sr4-2:~# ip -6 r show vrf vrf_bcf
anycast fdf5:87c7:8336:1:: dev intra proto kernel metric 0 pref medium
fdf5:87c7:8336:1::/64 dev intra proto kernel metric 256 pref medium
anycast fe80:: dev intra proto kernel metric 0 pref medium
anycast fe80:: dev vrfbr_bcf proto kernel metric 0 pref medium
fe80::/64 dev intra proto kernel metric 256 pref medium
fe80::/64 dev vrfbr_bcf proto kernel metric 256 pref medium
multicast ff00::/8 dev intra proto kernel metric 256 pref medium
multicast ff00::/8 dev vrfbr_bcf proto kernel metric 256 pref medium
IPv6 routing table on the new setup:
Code:
root@proxmox-sr4-1:~# ip -6 r
fdf5:87c7:8336:1::/64 dev intra proto bird metric 32 pref medium
fdf5:87c7:8336:2::20 nhid 30 via fe80::d2ea:11ff:fe40:ae40 dev ens8f0np0 proto bgp metric 20 pref medium
fdf5:87c7:8336:2::21 nhid 30 via fe80::d2ea:11ff:fe40:ae40 dev ens8f0np0 proto bgp metric 20 pref medium
fdf5:87c7:8336:2::22 nhid 30 via fe80::d2ea:11ff:fe40:ae40 dev ens8f0np0 proto bgp metric 20 pref medium
fdf5:87c7:8336:100::46 dev vmbr0 proto bird metric 32 pref medium
fdf5:87c7:8336:100::46 dev vmbr0 proto kernel metric 256 pref medium
fdf5:87c7:8336:4001::1 via fe80::d2ea:11ff:fe40:ae40 dev ens8f0np0 proto bird src fdf5:87c7:8336:100::46 metric 32 pref medium
fdf5:87c7:8336:4001::2 via fe80::d2ea:11ff:fe3f:7a54 dev ens8f1np1 proto bird src fdf5:87c7:8336:100::46 metric 32 pref medium
fe80::/64 dev ens8f0np0 proto kernel metric 256 pref medium
fe80::/64 dev ens8f1np1 proto kernel metric 256 pref medium
fe80::/64 dev eno1 proto kernel metric 256 pref medium
fe80::/64 dev vmbr0 proto kernel metric 256 pref medium
default proto bird src fdf5:87c7:8336:100::46 metric 32 pref medium
nexthop via fe80::d2ea:11ff:fe3f:7a54 dev ens8f1np1 weight 1
nexthop via fe80::d2ea:11ff:fe40:ae40 dev ens8f0np0 weight 1
root@proxmox-sr4-1:~# ip -6 r show vrf vrf_bcf
anycast fdf5:87c7:8336:1:: dev intra proto kernel metric 0 pref medium
fdf5:87c7:8336:1:be24:11ff:fe03:19d1 nhid 35 via fdf5:87c7:8336:3005::11 dev vrfbr_bcf proto bgp metric 20 onlink pref medium
fdf5:87c7:8336:1:be24:11ff:fe13:3310 nhid 35 via fdf5:87c7:8336:3005::11 dev vrfbr_bcf proto bgp metric 20 onlink pref medium
fdf5:87c7:8336:1:be24:11ff:fe15:4d5e nhid 35 via fdf5:87c7:8336:3005::11 dev vrfbr_bcf proto bgp metric 20 onlink pref medium
fdf5:87c7:8336:1:be24:11ff:fe1b:e199 nhid 35 via fdf5:87c7:8336:3005::11 dev vrfbr_bcf proto bgp metric 20 onlink pref medium
fdf5:87c7:8336:1:be24:11ff:fe2a:1b6f nhid 35 via fdf5:87c7:8336:3005::11 dev vrfbr_bcf proto bgp metric 20 onlink pref medium
fdf5:87c7:8336:1:be24:11ff:fe33:3065 nhid 35 via fdf5:87c7:8336:3005::11 dev vrfbr_bcf proto bgp metric 20 onlink pref medium
fdf5:87c7:8336:1:be24:11ff:fe3a:fffb nhid 35 via fdf5:87c7:8336:3005::11 dev vrfbr_bcf proto bgp metric 20 onlink pref medium
fdf5:87c7:8336:1:be24:11ff:fe81:bc2e nhid 35 via fdf5:87c7:8336:3005::11 dev vrfbr_bcf proto bgp metric 20 onlink pref medium
fdf5:87c7:8336:1:be24:11ff:fe89:314d nhid 35 via fdf5:87c7:8336:3005::11 dev vrfbr_bcf proto bgp metric 20 onlink pref medium
fdf5:87c7:8336:1:be24:11ff:fec4:d86a nhid 35 via fdf5:87c7:8336:3005::11 dev vrfbr_bcf proto bgp metric 20 onlink pref medium
fdf5:87c7:8336:1:be24:11ff:fec7:6231 nhid 35 via fdf5:87c7:8336:3005::11 dev vrfbr_bcf proto bgp metric 20 onlink pref medium
fdf5:87c7:8336:1:be24:11ff:feca:ee66 nhid 35 via fdf5:87c7:8336:3005::11 dev vrfbr_bcf proto bgp metric 20 onlink pref medium
fdf5:87c7:8336:1:be24:11ff:fed4:7aa7 nhid 35 via fdf5:87c7:8336:3005::11 dev vrfbr_bcf proto bgp metric 20 onlink pref medium
fdf5:87c7:8336:1:be24:11ff:fee3:7021 nhid 35 via fdf5:87c7:8336:3005::11 dev vrfbr_bcf proto bgp metric 20 onlink pref medium
fdf5:87c7:8336:1:be24:11ff:fef8:aad8 nhid 35 via fdf5:87c7:8336:3005::11 dev vrfbr_bcf proto bgp metric 20 onlink pref medium
fdf5:87c7:8336:1::/64 dev intra proto kernel metric 256 pref medium
anycast fe80:: dev vrfbr_bcf proto kernel metric 0 pref medium
anycast fe80:: dev intra proto kernel metric 0 pref medium
fe80::/64 dev intra proto kernel metric 256 pref medium
fe80::/64 dev vrfbr_bcf proto kernel metric 256 pref medium
multicast ff00::/8 dev intra proto kernel metric 256 pref medium
multicast ff00::/8 dev vrfbr_bcf proto kernel metric 256 pref medium
default nhid 35 via fdf5:87c7:8336:3005::11 dev vrfbr_bcf proto bgp metric 20 onlink pref medium
I was able to partially trace packets via tcpdump (e.g. pinging a machine on one of the other nodes with address fdf5:87c7:8336:1:be24:11ff:fe2c:d0d):
1. the packet is sent to the exit node
2. the packet is forwarded to the vrf_bcf interface
3. the old host does not have a host route for this instance, so it sends out an ICMPv6 neighbor solicitation on the VXLAN mesh
4. the node with the instance receives the solicitation and sends an answer:
Code:
tcpdump: listening on intra, link-type EN10MB (Ethernet), snapshot length 262144 bytes
15:41:35.888016 IP6 (hlim 255, next-header ICMPv6 (58) payload length: 32) fe80::be24:11ff:fea5:d4ee > ff02::1:ff2c:d0d: [icmp6 sum ok] ICMP6, neighbor solicitation, length 32, who has fdf5:87c7:8336:1:be24:11ff:fe2c:d0d
source link-address option (1), length 8 (1): bc:24:11:a5:d4:ee
0x0000: bc24 11a5 d4ee
15:41:35.888171 IP6 (hlim 255, next-header ICMPv6 (58) payload length: 32) fdf5:87c7:8336:1:be24:11ff:fe2c:d0d > fe80::be24:11ff:fea5:d4ee: [icmp6 sum ok] ICMP6, neighbor advertisement, length 32, tgt is fdf5:87c7:8336:1:be24:11ff:fe2c:d0d, Flags [solicited, override]
destination link-address option (2), length 8 (1): bc:24:11:2c:0d:0d
0x0000: bc24 112c 0d0d
5. the answer is dropped within the VXLAN mesh for an unknown reason; connectivity to the instance is not possible
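To localize the drop, capturing at each hop of the return path helps; a diagnostic sketch (the VXLAN interface name vxlan_5200 is an assumption derived from the VNI in the arp-cache output, and the captures need root; adjust names to the actual setup):

```shell
# on the node hosting the instance: does the neighbor advertisement
# (ICMPv6 type 136) actually enter the VXLAN tunnel?
tcpdump -ni vxlan_5200 'icmp6 and ip6[40] == 136'
# on the exit node: do the encapsulated frames arrive on the underlay?
# (4789 is the standard VXLAN UDP port)
tcpdump -ni vmbr0 'udp port 4789'
```

Note that the ip6[40] byte-offset match only works when the packet carries no extension headers, which is normally the case for neighbor discovery.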
The `show evpn arp-cache vni all` command in vtysh shows all instances on the other hosts, but only their link-local addresses (e.g.
fe80::be24:11ff:fe2c:d0d instead of fdf5:87c7:8336:1:be24:11ff:fe2c:d0d). These are also marked as active:
Code:
root@proxmox-sr4-2:~# vtysh -e 'show evpn arp-cache vni all' | grep remote
VNI 5200 #ARP (IPv4 and IPv6, local and remote) 50
fe80::be24:11ff:feac:d98d remote active bc:24:11:ac:d9:8d fdf5:87c7:8336:100::48 0/1
192.168.121.13 remote active bc:24:11:2c:0d:0d fdf5:87c7:8336:100::46 0/3
fe80::be24:11ff:fed5:1c82 remote active bc:24:11:d5:1c:82 fdf5:87c7:8336:100::46 0/1
fe80::be24:11ff:fe2c:d0d remote active bc:24:11:2c:0d:0d fdf5:87c7:8336:100::46 0/3
Local instances are present with both an LLA and the correct ULA.
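Matching the arp-cache entries to instances is easier when keeping in mind that the LLA and the ULA of an instance share the same EUI-64 interface identifier derived from its MAC. A small sketch using the MAC from the trace above (this is the standard EUI-64 derivation, nothing specific to this setup):

```shell
# derive the EUI-64 interface identifier from a MAC address
mac="bc:24:11:2c:0d:0d"
IFS=: read -r o1 o2 o3 o4 o5 o6 <<< "$mac"
o1=$(printf '%02x' $(( 0x$o1 ^ 0x02 )))   # flip the universal/local bit
iid="${o1}${o2}:${o3}ff:fe${o4}:${o5}${o6}"
echo "fe80::${iid}"                        # link-local (uncompressed form)
echo "fdf5:87c7:8336:1:${iid}"             # ULA with the same suffix
```

The first line of output is the uncompressed form of fe80::be24:11ff:fe2c:d0d from the arp-cache output above, i.e. both entries belong to the same instance.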
frr.conf on the old node: see the old_frr.conf attachment
frr.conf on the other node: see the new_frr.conf attachment
Files in /etc/pve/sdn/: see the sdn-config attachment (subnets and dns omitted)
The second problem is the loss of connectivity when one of the new nodes is configured as an exit node. I assume this problem is related: the node is not able to forward traffic to instances on another node.
Given that IPv4 is working, the overall setup seems to be OK, but some minor glitch is preventing IPv6 from working. I'm also aware that this setup is absolutely non-standard, especially the underlay and the extra routing setup. I assume this might be the cause of the problem, but I have not been able to verify it yet.
If you have any hint as to why the neighbor advertisement sent out on the bridge of the new node is dropped before it reaches the exit node, it would help me debug this problem further.
And sorry for the long text ;-)
Best regards,
Burkhard Linke