ARP BUG with Ubuntu 12.04 Precise and 14.04 Trusty

Tobia

Aug 31, 2016
I've been having a network problem with the aforementioned CT OSes, which took me some time to get to the root of.

It happens when a CT is given two or more IP addresses (they may be LAN addresses, host-only private addresses, routable IPs, whatever) on different interfaces (eth0 and eth1) on the same bridge (vmbr0).

The bug is that both IP addresses appear in the host's ARP cache with the same HW address, usually that of eth0. This makes the other IP address broken or unreliable: a small percentage of packets get through, especially ping and SYN, but regular TCP connections almost always fail.

One workaround is to identify the wrong ARP entry and override it with:
Code:
arp -i vmbr0 -s $IP_ADDRESS $HW_ADDRESS
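
For a CT with several interfaces, the override commands can be generated from the container config instead of being typed by hand. This is only a minimal sketch: it assumes the config has netN: lines with bridge=, hwaddr= and a static ip= entry, and pin_cmds is a made-up helper name. It only prints the commands, it does not run them.

```shell
# pin_cmds (hypothetical helper): read a PVE container config on stdin and
# print one "arp -i BRIDGE -s IP MAC" command per netN line that has a
# static IPv4 address. Nothing is executed; review the output first.
pin_cmds() {
    sed -n 's/^net[0-9]*: //p' | awk -F, '
        {
            bridge = ""; mac = ""; ip = ""
            for (i = 1; i <= NF; i++) {
                split($i, kv, "=")
                if (kv[1] == "bridge") bridge = kv[2]
                if (kv[1] == "hwaddr") mac = kv[2]
                # Drop the /prefix part of ip=ADDRESS/PREFIX
                if (kv[1] == "ip") { ip = kv[2]; sub(/\/.*/, "", ip) }
            }
            # Skip entries such as ip=dhcp that carry no literal address.
            if (bridge != "" && mac != "" && ip ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/)
                print "arp -i " bridge " -s " ip " " mac
        }
    '
}
```

Usage would be something like pin_cmds < /etc/pve/lxc/190.conf, then running the printed lines on the host once they look right.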

This did NOT happen with OpenVZ! I moved to PVE 4 some old Ubuntu containers that worked perfectly under PVE 3 and this issue showed up. But it's easy to reproduce on new containers.

Steps to reproduce:
  1. Download the official Ubuntu 12.04 or 14.04 template
  2. Create a container using that template, leaving every parameter to default values, adding an IP address as eth0
  3. Add a second interface eth1 with another IP address
  4. Clear the host's arp cache: ip neigh flush all
  5. Watch the arp cache in another terminal: watch -n.2 arp -n
  6. Start the container
  7. Ping both addresses (one after the other) from the host: they should reply and fill the arp table
  8. The two IP addresses should appear in the arp table with two HW addresses,* the same ones configured in PVE. But for containers built using these two OSes they both appear with the same HW address, which makes one of the two addresses unroutable / broken.
* Actually, they first appear in the arp table with the same HW address, but the wrong entry corrects itself within a few seconds. I don't know why that is. This is visible by observing the arp cache in real time as suggested: watch -n.2 arp -n

Example:
Code:
# cat /etc/pve/lxc/190.conf
net0: name=eth0,bridge=vmbr0,hwaddr=7A:25:84:FD:20:41,ip=192.168.10.90/16,type=veth
net1: name=eth1,bridge=vmbr0,hwaddr=0A:F7:70:D7:4F:7B,ip=192.168.10.91/16,type=veth
...                                 ^^^^^^^^^^^^^^^^^
                                    different MACs

# ping 192.168.10.90
PING 192.168.10.90 (192.168.10.90) 56(84) bytes of data.
64 bytes from 192.168.10.90: icmp_seq=1 ttl=64 time=0.070 ms
...

# ping 192.168.10.91
PING 192.168.10.91 (192.168.10.91) 56(84) bytes of data.
64 bytes from 192.168.10.91: icmp_seq=1 ttl=64 time=0.062 ms
...

# arp -n
Address                  HWtype  HWaddress           Flags Mask            Iface
192.168.10.90            ether   7a:25:84:fd:20:41   C                     vmbr0
192.168.10.91            ether   7a:25:84:fd:20:41   C                     vmbr0
...                              ^^^^^^^^^^^^^^^^^
                                 same MAC!
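
Spotting this by eye gets tedious on a host with many CTs. A rough sketch of an automatic check, assuming the column layout of arp -n shown above (dup_macs is a hypothetical helper name). Note it is only a heuristic: a MAC that legitimately carries several IPs, e.g. a NIC with aliases, is reported too.

```shell
# dup_macs (hypothetical helper): read `arp -n` output on stdin and print
# every MAC address that is listed for more than one IP address.
dup_macs() {
    awk '
        # Only resolved Ethernet entries: column 2 is "ether", column 3 the MAC.
        # The header line and "(incomplete)" entries are skipped.
        $2 == "ether" {
            mac = tolower($3)
            if (count[mac]++ == 0) order[++n] = mac   # keep first-seen order
            ips[mac] = (ips[mac] == "" ? $1 : ips[mac] " " $1)
        }
        END {
            for (i = 1; i <= n; i++) {
                mac = order[i]
                if (count[mac] > 1) print mac ": " ips[mac]
            }
        }
    '
}
```

On the host this would be run as arp -n | dup_macs; no output means no MAC is shared between entries.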

As a reference, I have run this test repeatedly using the following templates:

ubuntu-12.04-standard_12.04-1_amd64 (Precise) BUG!!
ubuntu-14.04-standard_14.04-1_amd64 (Trusty) BUG!!
ubuntu-16.04-standard_16.04-1_amd64 (Xenial) OK
ubuntu-16.10-standard_16.10-1_amd64 (Yakkety) OK

debian-6.0-standard_6.0-7_amd64 (Squeeze) OK
debian-7.0-standard_7.11-1_amd64 (Wheezy) OK
debian-8.0-standard_8.6-1_amd64 (Jessie) OK

What gives?
 
The question here is why the host's ARP cache gets these wrong ARP entries.

When you monitor the ARP traffic with tcpdump -i vmbr0 arp, do you see wrong ARP replies coming in, or are the answers always right?

(example of an ARP reply)
13:29:18.409876 ARP, Reply 192.168.16.1 is-at 0c:c4:7a:c5:26:d4 (oui Unknown), length 46
 
This is really weird.

Here is a capture for one of the CTs that are working correctly:
Code:
# tcpdump -i vmbr0 -n arp
(pct start $ID returns without any ARP packet mentioning the CT's eth0 or eth1)
(ping $ETH0_IP)
17:14:31.837010 ARP, Request who-has $ETH0_IP tell $HOST_IP, length 28
17:14:31.837045 ARP, Reply $ETH0_IP is-at $ETH1_MAC, length 28
17:14:31.837053 ARP, Reply $ETH0_IP is-at $ETH0_MAC, length 28
17:14:31.837453 ARP, Reply $ETH0_IP is-at $UNKNOWN, length 46
(ping $ETH1_IP)
17:14:35.444887 ARP, Request who-has $ETH1_IP tell $HOST_IP, length 28
17:14:35.444909 ARP, Reply $ETH1_IP is-at $ETH1_MAC, length 28
17:14:35.444918 ARP, Reply $ETH1_IP is-at $ETH0_MAC, length 28

This shows that the IPs for the CT's eth0 and eth1 can be reached at any of the mac addresses bound to the bridge. I'm not a Linux bridge expert, but maybe this is the expected behavior?
$UNKNOWN is 00:1f:9e:8f:xx:xx, which is something I couldn't find in my network. It seems to be from Cisco, of which we do have a few devices, but what it's doing in this capture on vmbr0, I have no idea.

Here is a capture for one of the CTs with the buggy behaviour:
Code:
# tcpdump -i vmbr0 -n arp
(pct start $ID)
16:48:04.494413 ARP, Request who-has $DNS_IP tell $ETH0_IP, length 28
16:48:04.494597 ARP, Reply $DNS_IP is-at $DNS_MAC, length 46
16:48:04.495488 ARP, Reply $ETH0_IP is-at $UNKNOWN, length 46
(ping $ETH0_IP)
16:48:19.145935 ARP, Request who-has $HOST_IP tell $ETH0_IP, length 28
16:48:19.145963 ARP, Reply $HOST_IP is-at $HOST_MAC, length 28
16:48:19.146689 ARP, Reply $ETH0_IP is-at $UNKNOWN, length 46
(ping $ETH1_IP)
16:48:26.009997 ARP, Request who-has $ETH1_IP tell $HOST_IP, length 28
16:48:26.010021 ARP, Reply $ETH1_IP is-at $ETH0_MAC, length 28          <-- WRONG
($ETH1_MAC never appears in the tcpdump)

This is completely bewildering to me. It seems to show the wrong arp requests. First the CT is trying to reach the DNS, which comes attached with a spurious reply mentioning the same unknown Cisco mac address as above. Then, as soon as I issue ping eth0, the CT asks how to reach the host, and not vice versa. Then, after ping eth1, an actual arp request is issued in the correct direction, but the answer clearly contains the wrong mac address.
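
To avoid reading long captures by eye, the replies can be compared against the MACs configured in PVE. A minimal sketch, assuming the tcpdump -n line format shown in the captures above; check_arp_replies and its IP=MAC argument syntax are made up for this example.

```shell
# check_arp_replies (hypothetical helper): read `tcpdump -n arp` output on
# stdin and report every ARP reply whose MAC differs from the expected one.
# Arguments are IP=MAC pairs taken from the container config.
check_arp_replies() {
    # "$*" joins the pairs with spaces so awk can split them again.
    awk -v expected="$*" '
        BEGIN {
            n = split(expected, pairs, " ")
            for (i = 1; i <= n; i++) {
                split(pairs[i], kv, "=")
                want[kv[1]] = tolower(kv[2])
            }
        }
        # Lines look like: TIME ARP, Reply IP is-at MAC, length N
        $2 == "ARP," && $3 == "Reply" && $5 == "is-at" && ($4 in want) {
            mac = tolower($6)
            sub(/,$/, "", mac)          # strip the trailing comma
            if (mac != want[$4])
                print "WRONG: " $4 " answered by " mac " (expected " want[$4] ")"
        }
    '
}
```

With the addresses from my first example this would be run as: tcpdump -l -i vmbr0 -n arp | check_arp_replies 192.168.10.90=7A:25:84:FD:20:41 192.168.10.91=0A:F7:70:D7:4F:7B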

If anybody can make heads or tails of this, please help.

This is a fresh install from the latest PVE ISO, so I presume it's easy to reproduce.
 
On second thought, I am wondering whether the way you assign multiple IPs to the container is the best one.

If you assign two IPs in the same subnet to two devices on this network, your default route inside the container (ip route) will only use one of these devices, because there is only ONE default route, and your packets will always leave through the same eth device in the container. This might explain the ARP who-has problem you're mentioning.

In your case (adding two IPs in the same subnet to one CT), it would make more sense to use IP aliases and a single NIC.
see https://www.cyberciti.biz/faq/linux-creating-or-adding-new-network-alias-to-a-network-card-nic/

You can add a stanza for eth0:0 to /etc/network/interfaces in the container (or to a file under /etc/network/interfaces.d/, if your release sources that directory) and define your alias there.
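
As a rough illustration of the alias approach, here is what such a stanza could look like. This is a sketch only: it assumes the classic ifupdown syntax used by these Ubuntu releases and reuses the two test addresses from the first post. Whether PVE leaves a hand-written stanza alone when it manages the container's network config is not something confirmed in this thread.

```
auto eth0
iface eth0 inet static
    address 192.168.10.90
    netmask 255.255.0.0

# Second address as an alias on the same NIC (it shares eth0's MAC)
auto eth0:0
iface eth0:0 inet static
    address 192.168.10.91
    netmask 255.255.0.0
```

With this layout both addresses are expected to resolve to the same MAC, so a shared ARP entry is correct rather than a symptom.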

And BTW, why two IPs? Shouldn't VirtualHosts be enough? :)
 
This is more of a problem with the network capture test I performed, than with the original problem I'm having. In the test I assigned two IPs from the same subnet, but in my actual use case I assign two IPs from different subnets: one is a local IP for use between CTs on the same host (10.x.x.x) and the other is a public routable IP.

I can redo the network capture with this setup if you'd like to see it.
 
Here is a test I just made on my production system, using an actual public IP address.

Host networking:
Code:
auto eth0
iface eth0 inet static
    address         $HOST_PUBLIC_IP
    netmask         ...
    gateway         ...

auto vmbr0
iface vmbr0 inet static
    address         10.0.0.1
    netmask         255.255.0.0
    bridge_ports    none
    bridge_stp      off
    bridge_waitport 0
    bridge_fd       0

In this particular server, the host's eth0 is NOT added to the bridge (in the previous test, it was). Here the bridge only links the CT and VM virtual interfaces to each other, and traffic between the bridge and the outside world is routed, not bridged. The bridge has a private IP (10.0.0.1), which the host uses to communicate with the CTs' private IPs in that subnet (10.0.0.0/16).

The bridge vmbr0 has mac address fe:a4:1f:1d:27:ac. I mention it because it appears many times in the network capture below.

The CT under test, Ubuntu 14.04 template, has two interfaces, one with a private IP and no gateway, and one with a public IP and a gateway:
Code:
net0: name=eth0,bridge=vmbr0,hwaddr=02:11:11:11:11:11,ip=10.0.9.99/16,type=veth
net1: name=eth1,bridge=vmbr0,hwaddr=02:22:22:22:22:22,ip=$CT_PUBLIC_IP,gw=$CT_GATEWAY,type=veth

I start the test with this host arp table:
Code:
10.0.9.99                        (incomplete)                        vmbr0
$CT_PUBLIC_IP                    (incomplete)                        vmbr0

Then I run "pct start" and these queries appear:
Code:
 0.000000 fe:a4:1f:1d:27:ac -> ff:ff:ff:ff:ff:ff ARP 42 Who has $CT_PUBLIC_IP?  Tell 10.0.0.1
 0.999179 fe:a4:1f:1d:27:ac -> ff:ff:ff:ff:ff:ff ARP 42 Who has $CT_PUBLIC_IP?  Tell 10.0.0.1
 1.976055 02:22:22:22:22:22 -> ff:ff:ff:ff:ff:ff ARP 42 Who has $CT_GATEWAY?  Tell $CT_PUBLIC_IP
 1.976099 fe:a4:1f:1d:27:ac -> 02:22:22:22:22:22 ARP 42 $CT_GATEWAY is at fe:a4:1f:1d:27:ac

The host arp table becomes:
Code:
10.0.9.99                        (incomplete)                        vmbr0
$CT_PUBLIC_IP            ether   02:22:22:22:22:22   C               vmbr0

Five seconds later, without me doing anything, these additional queries are made and go unanswered:
Code:
 6.979206 fe:a4:1f:1d:27:ac -> 02:22:22:22:22:22 ARP 42 Who has $CT_PUBLIC_IP?  Tell 10.0.0.1
 7.979175 fe:a4:1f:1d:27:ac -> 02:22:22:22:22:22 ARP 42 Who has $CT_PUBLIC_IP?  Tell 10.0.0.1
 8.979208 fe:a4:1f:1d:27:ac -> 02:22:22:22:22:22 ARP 42 Who has $CT_PUBLIC_IP?  Tell 10.0.0.1

and the arp table becomes:
Code:
10.0.9.99                        (incomplete)                        vmbr0
$CT_PUBLIC_IP                    (incomplete)                        vmbr0

At around 15.2 of capture time I perform "ping 10.0.9.99 -I 10.0.0.1" and this query is made:
Code:
15.295369 fe:a4:1f:1d:27:ac -> ff:ff:ff:ff:ff:ff ARP 42 Who has 10.0.9.99?  Tell 10.0.0.1
15.295415 02:11:11:11:11:11 -> fe:a4:1f:1d:27:ac ARP 42 10.0.9.99 is at 02:11:11:11:11:11

Ping works and the arp table becomes:
Code:
10.0.9.99                ether   02:11:11:11:11:11   C               vmbr0
$CT_PUBLIC_IP                    (incomplete)                        vmbr0

After 2 seconds of continuous pinging, this additional query is made and answered with the wrong MAC:
Code:
17.065787 fe:a4:1f:1d:27:ac -> ff:ff:ff:ff:ff:ff ARP 42 Who has $CT_PUBLIC_IP?  Tell 10.0.0.1
17.065855 02:11:11:11:11:11 -> fe:a4:1f:1d:27:ac ARP 42 $CT_PUBLIC_IP is at 02:11:11:11:11:11

The host arp table becomes corrupted:
Code:
10.0.9.99                ether   02:11:11:11:11:11   C               vmbr0
$CT_PUBLIC_IP            ether   02:11:11:11:11:11   C               vmbr0

When I kill the previous ping and try pinging the public IP with "ping $CT_PUBLIC_IP -I $HOST_PUBLIC_IP", no additional arp packets are exchanged, the host arp table remains corrupted, and ping doesn't work.

After 7 seconds of ping not working (but it's a random time) the CT tries to contact the outside world using its public IP, for some unrelated reason, therefore making an arp request from its public IP and mac address:
Code:
36.083209 02:22:22:22:22:22 -> fe:a4:1f:1d:27:ac ARP 42 Who has $CT_GATEWAY?  Tell $CT_PUBLIC_IP
36.083253 fe:a4:1f:1d:27:ac -> 02:22:22:22:22:22 ARP 42 $CT_GATEWAY is at fe:a4:1f:1d:27:ac

This has the side effect of fixing the host arp table:
Code:
10.0.9.99                ether   02:11:11:11:11:11   C               vmbr0
$CT_PUBLIC_IP            ether   02:22:22:22:22:22   C               vmbr0

and ping starts working again... until the host arp table gets corrupted again, and so on.

This gives the intermittent network behaviour.

As I mentioned, I did the previous test on a different server, freshly installed, without any public IPs, and I got the same result. This server has the host's eth0 excluded from the bridge, while the previous test had the host's eth0 inside the bridge. The results are the same: Ubuntu 12.04 and 14.04 show this arp pollution. No other CT OS that I have tried does anything like that. But I cannot understand what Ubuntu could be doing differently, since a CT doesn't run its own kernel!
 
