containers unreachable until traffic is issued outbound

alexskysilk

I have an intermittent problem with containers. On boot, their network is unreachable until I manually lxc-attach to them and ping out. This only happens to SOME containers, but there doesn't seem to be any common denominator.

How are routes propagated when a container is powered on? How can I mitigate this issue?
 
If you cannot ping the container, test from an outside machine on the same LAN whether the container is answering ARP who-has requests:

tcpdump -nnn -i vmbr0 -e arp src CONTAINER_IP

(replace CONTAINER_IP with the IP of the container you are trying to reach)

You should see something like:

10:08:25.423797 c2:cd:8b:f6:ab:cd > 0c:c4:7a:31:xx:xx, ethertype ARP (0x0806), length 60: Reply CONTAINER_IP is-at c2:cd:8b:f6:ab:cd, length 46
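
If ARP replies do show up but the outside machine still cannot reach the container, one possible explanation is that the machine has cached an older MAC for that IP. A minimal sketch for checking this on a Linux machine, assuming iproute2 is available; `cached_mac` is a hypothetical helper name and 192.0.2.10 is a placeholder address:

```shell
#!/bin/sh
# Hypothetical helper: read `ip neigh show` output on stdin and print the
# link-layer (MAC) address the machine currently has cached.
cached_mac() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "lladdr") print $(i + 1) }'
}

# Usage on the outside machine (placeholder IP):
#   ip neigh show 192.0.2.10 | cached_mac
# Compare the result with the MAC in the "is-at" part of the ARP reply.
```

If the two MACs differ, the cached entry is stale and will stay that way until it expires or something refreshes it.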
 
More info:

This seems to happen when we reassign an IP address to another container (i.e., one with a different MAC address). The moment I run tcpdump ON the hypervisor against the affected bridge, the IP responds correctly and the interface begins functioning, but ONLY after touching it from inside the hypervisor.

Are there bridge-specific actions that can or should be performed so that the ARP table refreshes correctly after an IP/MAC change?
 
Doing it in the container is kind of a pain because we have a large number of containers. What do you think about a systemd watcher that responds to containers spinning up? E.g., I could watch for new devices appearing in /sys/devices/virtual/net, pull the IP by grepping the lxc-info output, and issue an arping. Am I missing something, or (preferably) do you have a more efficient method?
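
A rough sketch of the host-side announce step, for discussion only. It assumes iputils `arping` and util-linux `nsenter` are installed on the host, that the container's interface is named eth0, and that it runs as root. The function names are made up for illustration. The arping is run inside the container's network namespace so that the gratuitous ARP carries the container's MAC rather than the host's, without installing anything in the container itself.

```shell
#!/bin/sh
# Sketch only. Assumptions not taken from the thread: iputils arping and
# util-linux nsenter exist on the host, and the container NIC is eth0.

CT_IFACE="${CT_IFACE:-eth0}"

# Keep only IPv4 addresses from a one-per-line list on stdin.
ipv4_only() {
    grep -E '^[0-9]{1,3}(\.[0-9]{1,3}){3}$' || true
}

# Send 3 gratuitous ARPs for IP $2 from inside the network namespace of PID $1.
# With DRY_RUN set, the command is printed instead of executed.
announce_ip() {
    ${DRY_RUN:+echo} nsenter -t "$1" -n arping -U -c 3 -I "$CT_IFACE" "$2"
}

# Announce every IPv4 address of the running container named $1.
announce_container() {
    pid=$(lxc-info -n "$1" -pH) || return 1
    # A stopped container has no usable init PID; skip it.
    [ "$pid" -gt 0 ] 2>/dev/null || return 1
    lxc-info -n "$1" -iH | ipv4_only | while read -r ip; do
        # Redirect stdin so the command cannot swallow the remaining IP list.
        announce_ip "$pid" "$ip" </dev/null
    done
}
```

Whatever notices the new device under /sys/devices/virtual/net could then call `announce_container NAME` once the container has its address. Setting `DRY_RUN=1` first prints the commands instead of running them, which makes it easy to check before rolling it out across many containers.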