Bridge issues with Ubuntu CT

Tobia

Aug 31, 2016
Hello

I am experiencing a weird network bug with Ubuntu containers in the latest Proxmox 4 (kernels 4.4.13-2-pve and 4.4.15-1-pve). I may have made a mistake in the network configuration, which is admittedly a bit convoluted, but the problem only appears with Ubuntu containers, not with Debian or CentOS, which is very strange.

You may want to skip to the text in bold below, and then go back if you need further info.

The network setup is as follows. I need to give some CTs both a local IP address (local to the other CTs and VMs on the same host) and a public IP address. Other CTs have no public IP address, only the local one.

The host has:
  • eth0 with its own public IP address (H.H.H.H/29)
  • vmbr0 with an address on the local network (10.0.0.1/16)
  • a route sending the container's public IP address (Y.Y.Y.Y/32) to device vmbr0. This is needed because I cannot give vmbr0 an address in the same network as Y.Y.Y.Y: the provider only gives me that single routable /32 address.
  • an SNAT rule that lets CTs without a public IP address reach the internet from their local IP address (10.0.C.C), masquerading as the host
  • finally, the host has net.ipv4.ip_forward = 1
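To make the routing and SNAT described above concrete, here is a minimal Python model of the host's forwarding decision. It is only a sketch under stated assumptions: the redacted H.H.H.H, H.H.H.GW and Y.Y.Y.Y are replaced with hypothetical RFC 5737 documentation addresses, 10.0.1.5 stands in for a CT's 10.0.C.C, and the model ignores policy routing, route metrics and the reply direction (for SNATed flows, connection tracking reverses the translation on replies automatically).

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-ins for the redacted addresses (RFC 5737 documentation ranges).
HOST_PUBLIC = ipaddress.ip_address("203.0.113.2")   # H.H.H.H on eth0 (/29)
HOST_GATEWAY = ipaddress.ip_address("203.0.113.1")  # H.H.H.GW
CT_PUBLIC = ipaddress.ip_address("198.51.100.7")    # Y.Y.Y.Y (/32)
LOCAL_NET = ipaddress.ip_network("10.0.0.0/16")     # vmbr0 network


@dataclass(frozen=True)
class Route:
    prefix: ipaddress.IPv4Network
    dev: str
    via: Optional[ipaddress.IPv4Address] = None


# The host's main routing table as described in the post.
ROUTES = [
    Route(ipaddress.ip_network("203.0.113.0/29"), "eth0"),
    Route(LOCAL_NET, "vmbr0"),
    Route(ipaddress.ip_network(f"{CT_PUBLIC}/32"), "vmbr0"),  # ip route add Y.Y.Y.Y/32 dev vmbr0
    Route(ipaddress.ip_network("0.0.0.0/0"), "eth0", HOST_GATEWAY),
]


def lookup(dst):
    """Longest-prefix match over ROUTES (simplified main-table lookup)."""
    dst = ipaddress.ip_address(dst)
    matches = [r for r in ROUTES if dst in r.prefix]
    return max(matches, key=lambda r: r.prefix.prefixlen)


def forward(src, dst):
    """Return (outgoing device, source address after POSTROUTING)."""
    route = lookup(dst)
    src = ipaddress.ip_address(src)
    # Models: iptables -t nat -A POSTROUTING -s 10.0.0.0/16 -o eth0 -j SNAT --to H.H.H.H
    if route.dev == "eth0" and src in LOCAL_NET:
        src = HOST_PUBLIC
    return route.dev, str(src)


print(forward("10.0.1.5", "8.8.8.8"))      # ('eth0', '203.0.113.2')
print(forward("198.51.100.7", "8.8.8.8"))  # ('eth0', '198.51.100.7')
print(lookup("198.51.100.7").dev)          # vmbr0
```

The /32 route wins over the default route only because it is the longer prefix, so traffic for Y.Y.Y.Y is delivered onto vmbr0 even though vmbr0 has no address in that network.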
Relevant /etc/network/interfaces of the host:
Code:
auto  eth0
iface eth0 inet static
    address         H.H.H.H
    netmask         255.255.255.248
    gateway         H.H.H.GW

auto  vmbr0
iface vmbr0 inet static
    address         10.0.0.1
    netmask         255.255.0.0
    bridge_ports    none
    bridge_stp      off
    bridge_waitport 0
    bridge_fd       0

    up ip route add Y.Y.Y.Y/32 dev vmbr0
    # and other similar static routes...

    up iptables -t nat -A POSTROUTING -s 10.0.0.0/16 -o eth0 -j SNAT --to H.H.H.H

The containers can be configured in 3 ways (or 4, depending on how you count):
  • with a private 10.0.C.C/16 address and default gw 10.0.0.1
  • with its public Y.Y.Y.Y/32 address and default gw H.H.H.H
  • with both addresses on different interfaces and only one of the two default gateways set
Then I ping 8.8.8.8 or any other public IP. In the scenarios where the CT has both a local and a public IP address, I can use ping -I to force the use of one or the other (either the CT's own public IP, or the SNAT to the host's public IP).

With various Debian and CentOS containers, all these setups work perfectly.

With Ubuntu 14.04 containers, in the scenario where the CT has two addresses, the network is broken / unstable / intermittent! Using ping -I 10.0.C.C 8.8.8.8 sometimes results in 100% packet loss with no recovery; other times I have personally seen replies for 10 seconds, then 100% loss for 20 seconds, then replies for 10 seconds again, then nothing for 20 seconds, and so on! (No, I had not been smoking or drinking! I even rebooted the server a couple of times and repeated the test, with the same results.)

No iptables rules were installed on either the host or the guests, apart from the SNAT shown above.

In all the cases where ping (inside the CT) was getting packet loss, running tcpdump on the host showed that the packets were leaving the CT on eth1 and coming back on eth0, which is of course an asymmetric path, and the CT rejected them:
Code:
# tcpdump -i vethNNNi1 -n icmp
19:15:11.379079 IP 10.0.C.C > 8.8.8.8: ICMP echo request, id 1890, seq 29, length 64
19:15:12.387255 IP 10.0.C.C > 8.8.8.8: ICMP echo request, id 1890, seq 30, length 64
19:15:13.395162 IP 10.0.C.C > 8.8.8.8: ICMP echo request, id 1890, seq 31, length 64

# tcpdump -i vethNNNi0 -n icmp
19:15:05.344956 IP 8.8.8.8 > 10.0.C.C: ICMP echo reply, id 1890, seq 23, length 64
19:15:06.352950 IP 8.8.8.8 > 10.0.C.C: ICMP echo reply, id 1890, seq 24, length 64
19:15:07.360994 IP 8.8.8.8 > 10.0.C.C: ICMP echo reply, id 1890, seq 25, length 64

Why is the return packet coming back on eth0, when the 10.0.C.C address is clearly assigned to eth1? Why does it only happen with Ubuntu guests?
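The asymmetry in these two captures can also be checked mechanically, which helps when comparing longer captures from Ubuntu and Debian CTs. The following Python sketch (an illustration, not a Proxmox tool) parses tcpdump's one-line ICMP echo summaries and records, per host-side veth, whether it carried the echo requests from the CT's address, the replies to it, or both. The capture text is copied from above, with the redacted 10.0.C.C and vethNNN names kept as literal strings.

```python
import re
from collections import defaultdict

# Capture lines from the post, keyed by the host-side veth they were taken on.
CAPTURES = {
    "vethNNNi1": """\
19:15:11.379079 IP 10.0.C.C > 8.8.8.8: ICMP echo request, id 1890, seq 29, length 64
19:15:12.387255 IP 10.0.C.C > 8.8.8.8: ICMP echo request, id 1890, seq 30, length 64
19:15:13.395162 IP 10.0.C.C > 8.8.8.8: ICMP echo request, id 1890, seq 31, length 64
""",
    "vethNNNi0": """\
19:15:05.344956 IP 8.8.8.8 > 10.0.C.C: ICMP echo reply, id 1890, seq 23, length 64
19:15:06.352950 IP 8.8.8.8 > 10.0.C.C: ICMP echo reply, id 1890, seq 24, length 64
19:15:07.360994 IP 8.8.8.8 > 10.0.C.C: ICMP echo reply, id 1890, seq 25, length 64
""",
}

# Matches tcpdump's one-line summary of an ICMP echo request or reply.
ICMP_ECHO = re.compile(r"IP (\S+) > (\S+): ICMP echo (request|reply), id \d+, seq \d+")


def directions_by_iface(captures, ct_addr):
    """Return {iface: set of "request"/"reply"} for echo traffic of ct_addr."""
    seen = defaultdict(set)
    for iface, text in captures.items():
        for line in text.splitlines():
            m = ICMP_ECHO.search(line)
            if m is None:
                continue
            src, dst, kind = m.groups()
            # Requests are sent from ct_addr; replies are addressed to it.
            if (kind == "request" and src == ct_addr) or (kind == "reply" and dst == ct_addr):
                seen[iface].add(kind)
    return dict(seen)


def is_symmetric(seen):
    """True if a single interface carried both the requests and the replies."""
    return any({"request", "reply"} <= kinds for kinds in seen.values())


seen = directions_by_iface(CAPTURES, "10.0.C.C")
print(seen)                # {'vethNNNi1': {'request'}, 'vethNNNi0': {'reply'}}
print(is_symmetric(seen))  # False
```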

The intermittent 10 seconds / 20 seconds cyclic behaviour is especially troubling. Between this and the asymmetric path, this smells of bridge issues. But how can a different CT guest OS affect the bridge internals, especially on a bridge with STP turned off? Is Ubuntu mucking with the MAC addresses?

Of course, given the nature of the problem, if I configure only one network interface in the Ubuntu CT (either the one with the public address or the one with the local address) the problem disappears.

I have tested this with a brand new Ubuntu 14.04 CT created from the official template, where I have touched absolutely nothing, to confirm that it's not a local CT configuration issue. I have not tested other Ubuntu versions.

I hope this info is enough to allow reproducibility.
 
> Why does it only happen with Ubuntu guests?

Could it be that the Ubuntu ping binary does not properly honour the -I flag? What about copying the ping binary from somewhere else?