Greetings,
I apologize for posting in an old topic, but I would like to share a solution to a similar problem.
We had a similar issue with one of our five-server clusters communicating through a BGP-EVPN fabric, where 20 Gbit/s links were delivering only ~4-5 Gbit/s through a VXLAN tunnel between two servers when measured with iperf3. The reason for this behavior is that each Proxmox node does not learn the MAC addresses of the other nodes, so traffic pushed into the VXLAN tunnel is flooded to every node.
To debug this, you can use the command: bridge fdb | grep [vxlan interface]
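For example, a minimal check (assuming the vxlan666 interface from the configs below):
Code:
# list the forwarding entries programmed on the VXLAN device
bridge fdb show dev vxlan666
# before the fix you will mostly see only the all-zero flood entries
# (00:00:00:00:00:00 dst <remote VTEP>); once advertise-svi-ip is active,
# per-MAC entries for the other nodes' SVI MACs should appear as well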
To fix this issue, add advertise-svi-ip under address-family l2vpn evpn in the BGP configuration (FRR).
Here are example configs:
/etc/network/interfaces:
Code:
auto br_ceph
iface br_ceph inet manual
    address [SVI IP]
    bridge_stp off
    bridge-ports none
    bridge-fd 0

auto vxlan666
iface vxlan666 inet manual
    pre-up ip link add vxlan666 type vxlan id 666 dstport 4789 local [LOOPBACK IP] nolearning
    pre-up ip link set dev vxlan666 master br_ceph
    pre-up ip link set up dev vxlan666
    post-up ip link set mtu 9000 dev vxlan666
frr:
Code:
router bgp 65002
 bgp router-id [LOOPBACK IP]
 bgp graceful-restart-disable
 neighbor LEAF peer-group
 neighbor LEAF remote-as 65001
 neighbor LEAF capability dynamic
 neighbor [ IP] peer-group LEAF
 neighbor [ IP] peer-group LEAF
 !
 address-family ipv4 unicast
  network [LOOPBACK IP]/32
  neighbor LEAF allowas-in
  maximum-paths 8
 exit-address-family
 !
 address-family l2vpn evpn
  neighbor LEAF activate
  neighbor LEAF allowas-in
  advertise-all-vni
  advertise-svi-ip
  advertise ipv4 unicast
 exit-address-family
exit
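A few commands to sanity-check the EVPN side (a sketch, using the interface/VNI names from the configs above):
Code:
# confirm the VXLAN device parameters (VNI 666, local VTEP, nolearning)
ip -d link show vxlan666
# MACs learned for the VNI via EVPN type-2 routes
vtysh -c "show evpn mac vni 666"
# BGP sessions towards the leaves
vtysh -c "show bgp l2vpn evpn summary"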
In case you have multiple links connected to the node, you probably want load balancing:
Code:
sysctl -w net.ipv4.fib_multipath_hash_policy=1
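To make it persistent across reboots, a sysctl drop-in is one option (a sketch; the file name is arbitrary):
Code:
# /etc/sysctl.d/90-multipath.conf
# hash on L4 ports as well as addresses so parallel flows spread across the ECMP paths
net.ipv4.fib_multipath_hash_policy = 1

# apply without rebooting
sysctl --system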
With this setup, we were able to get the full 20 Gbit/s throughput between two nodes through the VXLAN tunnel, using iperf3 with multiple parallel streams.
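A parallel-stream test of that kind can be run like this (hypothetical address; -P sets the number of parallel streams):
Code:
# on the first node
iperf3 -s
# on the second node, 8 parallel TCP streams towards the first node's br_ceph address
iperf3 -c 10.66.0.1 -P 8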
interesting. In your usecase, the SVI for br_ceph is different on each host ?

Yes. Unique IP for each br_ceph interface on each node, used for Ceph in this example. The setup gives read/write speeds of ~2000 MB/s in rados bench, practically full network line rate. Each server contains 4x 6.4 TB NVMe SSDs.
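In other words, every node carries the same bridge/VXLAN definition and only the SVI address changes, e.g. (hypothetical addressing in one shared subnet):
Code:
# node1: iface br_ceph ... address 10.66.0.1/24
# node2: iface br_ceph ... address 10.66.0.2/24
# node3: iface br_ceph ... address 10.66.0.3/24
# same VNI 666 everywhere; the shared /24 is the network Ceph is pointed at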
another tuning possible:
Code:
# sysctl -wq net.ipv4.fib_multipath_hash_fields=0x0037
# sysctl -wq net.ipv4.fib_multipath_hash_policy=3

0x0001 Source IP address
0x0002 Destination IP address
0x0004 IP protocol
0x0008 Flow Label
0x0010 Source port
0x0020 Destination port
0x0040 Inner source IP address
0x0080 Inner destination IP address
0x0100 Inner IP protocol
0x0200 Inner Flow Label
0x0400 Inner source port
0x0800 Inner destination port
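i.e. 0x0037 selects the outer 5-tuple from that table:
Code:
# 0x0037 = 0x0001 + 0x0002 + 0x0004 + 0x0010 + 0x0020
#        = source IP + destination IP + IP protocol + source port + destination port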