I've been having a network problem with the aforementioned CT OSes, which took me some time to get to the root of.
It happens when a CT is given two or more IP addresses (they may be LAN addresses, host-only private addresses, routable IPs, whatever) on different interfaces (eth0 and eth1) on the same bridge (vmbr0).
The bug is that both IP addresses appear in the host's ARP cache with the same HW address, usually that of eth0. This makes the other IP address broken or unreliable: a small percentage of packets get through, especially pings and SYNs, but regular TCP connections almost always fail.
One workaround is to identify the wrong ARP entry and override it with:
Code:
arp -i vmbr0 -s $IP_ADDRESS $HW_ADDRESS
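For hosts that don't have net-tools installed, the same override can probably be done with iproute2. This is only a sketch: `pin_neigh` is a helper name I made up, `vmbr0` is assumed as the default bridge, and by default it just prints the command so you can check it before running anything as root.

```shell
#!/bin/sh
# Hypothetical helper (not part of PVE): pin a static neighbour (ARP) entry.
# Usage: pin_neigh IP_ADDRESS HW_ADDRESS [BRIDGE]
# Prints the command by default; set DRY_RUN=0 to really run it (needs root).
pin_neigh() {
    ip_addr=$1
    hw_addr=$2
    bridge=${3:-vmbr0}   # assumption: the bridge is vmbr0 unless told otherwise
    cmd="ip neigh replace $ip_addr lladdr $hw_addr dev $bridge nud permanent"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        $cmd
    else
        echo "$cmd"
    fi
}
```

For example, `pin_neigh 192.168.10.91 0a:f7:70:d7:4f:7b` prints the command for the second address, and `DRY_RUN=0 pin_neigh 192.168.10.91 0a:f7:70:d7:4f:7b` applies it. Like the `arp -s` entry, a permanent entry does not survive a reboot or a flush.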
This did NOT happen with OpenVZ! The issue showed up when I moved some old Ubuntu containers, which worked perfectly under PVE 3, to PVE 4. It's also easy to reproduce on new containers.
Steps to reproduce:
- Download the official Ubuntu 12.04 or 14.04 template
- Create a container from that template, leaving every parameter at its default value and adding an IP address on eth0
- Add a second interface eth1 with another IP address
- Clear the host's ARP cache: ip neigh flush all
- Watch the ARP cache in another terminal: watch -n.2 arp -n
- Start the container
- Ping both addresses (one after the other) from the host: they should reply and fill the ARP table
- The two IP addresses should appear in the ARP table with two different HW addresses, the same ones configured in PVE. But for containers built from these two Ubuntu templates they both appear with the same HW address, which makes one of the two addresses unreachable / broken.
Example:
Code:
# cat /etc/pve/lxc/190.conf
net0: name=eth0,bridge=vmbr0,hwaddr=7A:25:84:FD:20:41,ip=192.168.10.90/16,type=veth
net1: name=eth1,bridge=vmbr0,hwaddr=0A:F7:70:D7:4F:7B,ip=192.168.10.91/16,type=veth
...                                 ^^^^^^^^^^^^^^^^^
                                    different MACs
# ping 192.168.10.90
PING 192.168.10.90 (192.168.10.90) 56(84) bytes of data.
64 bytes from 192.168.10.90: icmp_seq=1 ttl=64 time=0.070 ms
...
# ping 192.168.10.91
PING 192.168.10.91 (192.168.10.91) 56(84) bytes of data.
64 bytes from 192.168.10.91: icmp_seq=1 ttl=64 time=0.062 ms
...
# arp -n
Address                  HWtype  HWaddress           Flags Mask            Iface
192.168.10.90            ether   7a:25:84:fd:20:41   C                     vmbr0
192.168.10.91            ether   7a:25:84:fd:20:41   C                     vmbr0
...                              ^^^^^^^^^^^^^^^^^
                                 same MAC!
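To compare what PVE has configured against what the host actually learned, the expected IP/MAC pairs can be pulled out of the container config. A sketch only, assuming the `netN:` lines look like the ones in the example above (comma-separated `key=value` with `ip=` and `hwaddr=`); `expected_pairs` is a made-up name.

```shell
#!/bin/sh
# Hypothetical helper: read a PVE container config on stdin and print one
# "IP MAC" pair per netN line, with the prefix length stripped from the IP
# and the MAC lowercased so it matches what arp / ip neigh print.
expected_pairs() {
    awk '
        /^net[0-9]+:/ {
            sub(/^net[0-9]+:[ \t]*/, "")
            n = split($0, kv, ",")
            ip = ""; mac = ""
            for (i = 1; i <= n; i++) {
                if (kv[i] ~ /^ip=/) {
                    ip = substr(kv[i], 4)
                    sub(/\/.*/, "", ip)      # drop the /16 style suffix
                }
                if (kv[i] ~ /^hwaddr=/)
                    mac = tolower(substr(kv[i], 8))
            }
            if (ip != "" && mac != "")
                print ip, mac
        }
    '
}
```

Running `expected_pairs < /etc/pve/lxc/190.conf` on the config above should list each address next to the MAC it is supposed to have, which can then be checked by hand against `arp -n`. Interfaces configured with something other than a static address (for example `ip=dhcp`) would need extra handling that this sketch does not have.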
As a reference, I have run this test repeatedly using the following templates:
ubuntu-12.04-standard_12.04-1_amd64 (Precise) BUG!!
ubuntu-14.04-standard_14.04-1_amd64 (Trusty) BUG!!
ubuntu-16.04-standard_16.04-1_amd64 (Xenial) OK
ubuntu-16.10-standard_16.10-1_amd64 (Yakkety) OK
debian-6.0-standard_6.0-7_amd64 (Squeeze) OK
debian-7.0-standard_7.11-1_amd64 (Wheezy) OK
debian-8.0-standard_8.6-1_amd64 (Jessie) OK
What gives?