[SOLVED] VXLAN ARP timeouts

chrispage1

Sep 1, 2021
Hi,

We've set up a VXLAN SDN with VRF and it's working great. I appreciate all the hard work the Proxmox team has put into the SDN functionality and look forward to seeing it grow!

However, after a while some of our less 'chatty' virtual machines drop out of the MAC-VRF table and, in turn, lose their BGP connection. I'm assuming this is some kind of ARP timeout, because the VM is just sitting idle. Is there a way to prevent this from happening?

Thanks,
Chris.
 
Yes, this is currently an issue with silent hosts. Since the type-2 routes are generated from the PVE host's neighbor table, the VM needs to send traffic regularly so that its entry in that neighbor table doesn't expire. A workaround could be inserting static entries into the neighbor table (e.g. via the VM lifecycle hooks) or regularly initiating connections from inside the guest.
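A minimal sketch of the hookscript workaround, assuming a single guest with a static IP. The IP, MAC and vnet bridge name (`10.0.0.10`, `bc:24:11:00:00:01`, `vnet1`) are hypothetical placeholders, and it assumes FRR picks up a permanent neighbor entry on the vnet bridge the same way it picks up a learned one. Proxmox calls a hookscript with the VM ID and a phase (`pre-start`, `post-start`, `pre-stop`, `post-stop`); this one pins the entry after start and removes it after stop. It does not deal with live-migration ordering, and `DRY_RUN` exists only so the commands can be printed instead of executed.

```shell
#!/bin/sh
# Sketch of a Proxmox VM hookscript that pins a silent guest's neighbor entry
# on the PVE host. ASSUMPTION: all three values below are hypothetical.
GUEST_IP="10.0.0.10"            # hypothetical guest IP
GUEST_MAC="bc:24:11:00:00:01"   # hypothetical guest MAC (from net0:)
VNET="vnet1"                    # hypothetical vnet bridge name

# Print the command instead of running it when DRY_RUN is set.
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Proxmox invokes the hookscript as: <script> <vmid> <phase>
handle_phase() {
    phase="$2"   # $1 is the VM ID (unused in this sketch)
    case "$phase" in
        post-start)
            # A permanent entry does not age out of the neighbor table.
            run ip neigh replace "$GUEST_IP" lladdr "$GUEST_MAC" dev "$VNET" nud permanent
            ;;
        post-stop)
            # Drop the pinned entry once the guest has stopped on this host.
            run ip neigh del "$GUEST_IP" dev "$VNET"
            ;;
    esac
    return 0
}

# Only act when called by Proxmox with <vmid> <phase>.
if [ "$#" -ge 2 ]; then
    handle_phase "$1" "$2"
fi
```

It could then be attached with something like `qm set 100 --hookscript local:snippets/pin-neigh.sh` (VM ID and snippet name are hypothetical; the storage needs the snippets content type enabled).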
 
Thanks Stefan - I guess if I add a single outbound ping every minute, this should prevent it from ever going silent.
 
The 'Advertise Subnet' option in the EVPN zone could also help, since the full subnet then gets advertised in addition to the /32 routes.
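For reference, a sketch of enabling the option from the CLI rather than the GUI. It assumes the EVPN zone property is called `advertise-subnets` and that SDN changes are applied with `pvesh set /cluster/sdn`; the zone name is a placeholder. SDN config changes stay pending until applied, hence the second call.

```shell
#!/bin/sh
# Sketch: enable 'Advertise Subnet' on an EVPN zone and apply SDN config.
# ASSUMPTION: the zone name passed as $1 is a placeholder for your zone.

# Print the command instead of running it when DRY_RUN is set.
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

enable_advertise_subnets() {
    zone="$1"
    # Set the zone option (pending until applied).
    run pvesh set "/cluster/sdn/zones/$zone" --advertise-subnets 1
    # Apply pending SDN changes.
    run pvesh set /cluster/sdn
}

if [ "$#" -ge 1 ]; then
    enable_advertise_subnets "$1"
fi
```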
 
Thanks. I did actually have that option checked, however I don't think our upstreams are importing anything larger than a /32.

For now I'll stick with a nice simple ping. For those who might come across this in the future (perhaps even my future self!), here's what I've done on our Alpine Linux box:

Code:
# crontab -e
# Send a single ICMP echo every minute so the VM never goes silent
* * * * * /bin/ping -c 1 8.8.8.8 > /dev/null 2>&1

Out of interest, are there any proposed fixes for this within Proxmox?
 
How about only pinging the default gateway?

Bash:
* * * * * /bin/ping -c 1 "$(ip route | awk '/^default/ {print $3; exit}')" > /dev/null 2>&1 || echo "GW unreachable at $(date)" >> /var/log/gw_ping.log
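If the one-liner gets unwieldy inside a crontab, the same idea fits in a small script. A sketch, with a hypothetical script path and the same log file as the one-liner; `awk ... exit` keeps only the first default route, so several default routes don't turn into several arguments to `ping`.

```shell
#!/bin/sh
# Sketch of a gateway keepalive script (path/name are hypothetical).
LOG="/var/log/gw_ping.log"

# Print the first default gateway found in `ip route` output on stdin.
default_gw() {
    awk '/^default/ {print $3; exit}'
}

ping_gw() {
    gw=$(ip route | default_gw)
    if [ -z "$gw" ]; then
        echo "No default route at $(date)" >> "$LOG"
        return 1
    fi
    # One echo request per run generates the traffic that keeps the VM from going silent.
    if ! ping -c 1 -W 2 "$gw" > /dev/null 2>&1; then
        echo "GW $gw unreachable at $(date)" >> "$LOG"
        return 1
    fi
}

# Only ping when explicitly asked to, e.g. from cron.
if [ "$1" = "run" ]; then
    ping_gw
fi
```

The crontab entry would then shrink to something like `* * * * * /usr/local/bin/gw-keepalive.sh run`.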
 
Not a bad idea at all, actually. It doesn't need a WAN ping, does it? Wasn't really thinking there! Thanks
 
I'm not sure a hook can work with live migration, as the ARP entry needs to be set after the source VM stops (the entry needs to be flushed from the source host) and before the target VM resumes.

This is something we should implement officially (like we already do for MAC addresses with 'bridge-disable-mac-learning').

I don't know which is the best source: the IPAM or the ipconfigX: entries in the VM config.
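As a rough idea of what using the ipconfigX: entry as the source could look like, a sketch that reads `qm config <vmid>` output on stdin and prints the IP, MAC and bridge for `net0`/`ipconfig0`. It assumes the first NIC pairs with `ipconfig0` and has a static `ip=` (DHCP is skipped), and it does not address the migration timing issue.

```shell
#!/bin/sh
# Sketch: print "IP MAC BRIDGE" for net0/ipconfig0 from VM config text on stdin.
# ASSUMPTION: net0 and ipconfig0 describe the same NIC; only static ip= is used.
guest_neigh_from_config() {
    awk -F'[ ,]' '
        $1 == "net0:" {
            # $2 is "<model>=<MAC>", e.g. virtio=BC:24:11:AA:BB:CC
            split($2, m, "=")
            mac = tolower(m[2])
            for (i = 3; i <= NF; i++)
                if ($i ~ /^bridge=/) bridge = substr($i, 8)
        }
        $1 == "ipconfig0:" {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^ip=/) { ip = substr($i, 4); sub(/\/.*/, "", ip) }
        }
        END {
            # Skip DHCP guests and incomplete configs.
            if (ip != "" && ip != "dhcp" && mac != "" && bridge != "")
                print ip, mac, bridge
        }
    '
}
```

For example `qm config 100 | guest_neigh_from_config` (VM ID hypothetical), whose output could feed an `ip neigh replace` call.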
 
I remember once reading a blog post from someone who solved it that way, but it seems to be offline atm :/ (edit: https://web.archive.org/web/2023032...posts/2022/announce-proxmox-vm-ips-via-bgp-1/ - it was for announcing /32 though)

But you're right, this is probably something better solved by integrating it into the stack rather than relying on custom hookscripts.
 