Hello
I am experiencing a weird network bug in the latest Proxmox 4, kernels 4.4.13-2-pve and 4.4.15-1-pve, with Ubuntu containers. I'm not sure if I made a mistake in the network configuration, which is admittedly a bit convoluted, but the problem only appears with Ubuntu containers, not with Debian or CentOS, which is very strange.
You may want to skip to the text in bold below, and then go back if you need further info.
The network setup is as follows. I need to give some CTs both a local IP address (local to the other CTs and VMs on the same host) and a public IP address. Other CTs have no public IP address, only the local one.
The host has:
- eth0 with its own public IP address (H.H.H.H/29)
- vmbr0 with an address on the local network (10.0.0.1/16)
- an entry to route the public IP address of the container (Y.Y.Y.Y/32) to device vmbr0. This is needed because I can't give vmbr0 an address in the same network as Y.Y.Y.Y, because I'm only given that single routable /32 address by the provider.
- a SNAT to allow CTs without a public IP address to get onto the internet using their local IP address (10.0.C.C), masquerading as the host
- finally, the host has net.ipv4.ip_forward = 1
Code:
auto eth0
iface eth0 inet static
    address H.H.H.H
    netmask 255.255.255.248
    gateway H.H.H.GW

auto vmbr0
iface vmbr0 inet static
    address 10.0.0.1
    netmask 255.255.0.0
    bridge_ports none
    bridge_stp off
    bridge_waitport 0
    bridge_fd 0
    up ip route add Y.Y.Y.Y/32 dev vmbr0
    # and other similar static routes...
    up iptables -t nat -A POSTROUTING -s 10.0.0.0/16 -o eth0 -j SNAT --to H.H.H.H

The containers can be configured in 3 ways (or 4, depending on how you count):
- with a private 10.0.C.C/16 address and default gw 10.0.0.1
- with its public Y.Y.Y.Y/32 address and default gw H.H.H.H
- with both addresses on different interfaces and only one of the two default gateways set
With various Debian and CentOS containers, all these setups work perfectly.
With Ubuntu 14.04 containers, in the scenario where the CT has two addresses, the network is broken / unstable / intermittent! Using ping -I 10.0.C.C 8.8.8.8 sometimes results in 100% packet loss with no recovery; other times I have personally seen replies for 10 seconds, then 100% loss for 20 seconds, then replies for 10 seconds, then nothing again for 20 seconds, and so on! (No, I had not been smoking or drinking! I even rebooted the server a couple times and tried the same test again, with the same results.)
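To put numbers on the on/off cycle, the gaps in ping's sequence numbers can be pulled out of its output. This is only a sketch: seq_gaps is a hypothetical helper name, and it assumes that ping prints an icmp_seq=N field on every reply line and sends one request per second (the iputils default), so a gap of N sequence numbers is roughly N seconds of loss.

```shell
# seq_gaps (hypothetical helper): read ping output on stdin and report
# every run of missing sequence numbers between two received replies.
seq_gaps() {
    awk '
        match($0, /icmp_seq=[0-9]+/) {
            # "icmp_seq=" is 9 characters long; the digits follow it.
            seq = substr($0, RSTART + 9, RLENGTH - 9) + 0
            if (have && seq > last + 1)
                printf "lost seq %d-%d (%d packets)\n", last + 1, seq - 1, seq - last - 1
            last = seq
            have = 1
        }
    '
}

# Usage inside the CT (10.0.C.C is the placeholder used in this post):
#   ping -I 10.0.C.C 8.8.8.8 | seq_gaps
```

Note that losses at the very start or end of a run are not reported, since the helper only sees gaps between two replies.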
No iptables rules were installed on either the host or the guests, apart from the SNAT rule shown above.
In all the cases where ping (inside the CT) was getting packet loss, running tcpdump on the host showed that the packets were exiting the CT on eth1 and coming back on eth0, which of course is an asymmetric path and caused the CT to reject them:
Code:
# tcpdump -i vethNNNi1 -n icmp
19:15:11.379079 IP 10.0.C.C > 8.8.8.8: ICMP echo request, id 1890, seq 29, length 64
19:15:12.387255 IP 10.0.C.C > 8.8.8.8: ICMP echo request, id 1890, seq 30, length 64
19:15:13.395162 IP 10.0.C.C > 8.8.8.8: ICMP echo request, id 1890, seq 31, length 64
# tcpdump -i vethNNNi0 -n icmp
19:15:05.344956 IP 8.8.8.8 > 10.0.C.C: ICMP echo reply, id 1890, seq 23, length 64
19:15:06.352950 IP 8.8.8.8 > 10.0.C.C: ICMP echo reply, id 1890, seq 24, length 64
19:15:07.360994 IP 8.8.8.8 > 10.0.C.C: ICMP echo reply, id 1890, seq 25, length 64
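For a longer run, the same comparison can be reduced to a pair of counts per interface. A minimal sketch, assuming tcpdump's default one-line ICMP output as in the captures above; icmp_counts is a hypothetical helper name.

```shell
# icmp_counts (hypothetical helper): read tcpdump text output on stdin and
# print how many ICMP echo requests and echo replies were seen.
icmp_counts() {
    awk '
        /ICMP echo request/ { req++ }
        /ICMP echo reply/   { rep++ }
        END { printf "requests=%d replies=%d\n", req, rep }
    '
}

# Usage on the host, once per veth of the CT (capture 30 packets each):
#   tcpdump -l -n -c 30 -i vethNNNi0 icmp 2>/dev/null | icmp_counts
#   tcpdump -l -n -c 30 -i vethNNNi1 icmp 2>/dev/null | icmp_counts
```

In the failing state described here, the vethNNNi1 capture would be expected to show only requests and the vethNNNi0 capture only replies.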
Why is the return packet coming back on eth0, when the 10.0.C.C address is clearly assigned to eth1? Why does it only happen with Ubuntu guests?
The intermittent 10 seconds / 20 seconds cyclic behaviour is especially troubling. Between this and the asymmetric path, this smells of bridge issues. But how can a different CT guest OS affect the bridge internals? Especially a bridge with STP turned off? Is Ubuntu mucking with the MAC addresses?
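One way to check the MAC address suspicion from the host is to watch which MAC the host has cached for the CT's local address. If both CT interfaces answer ARP for 10.0.C.C, which can happen when two interfaces of one machine sit on the same bridge, the cached entry may alternate between the two veth MACs. The following is only a sketch under that assumption, not a confirmed diagnosis: count_macs is a hypothetical helper name, and it assumes iproute2 is available on the host.

```shell
# count_macs (hypothetical helper): read `ip neigh` output on stdin and
# print the number of distinct link-layer addresses seen for the given IP.
count_macs() {
    awk -v ip="$1" '
        $1 == ip {
            # The MAC is the field that follows the "lladdr" keyword.
            for (i = 2; i < NF; i++)
                if ($i == "lladdr") seen[$(i + 1)] = 1
        }
        END {
            n = 0
            for (m in seen) n++
            print n
        }
    '
}

# Usage on the host: sample the neighbour table once per second for a
# minute while ping runs in the CT, then count the MACs seen for 10.0.C.C.
#   for i in $(seq 60); do ip neigh show dev vmbr0; sleep 1; done | count_macs 10.0.C.C
```

A result above 1 would suggest that more than one MAC is claiming the address during the sampling window; a steady 1 would point away from that explanation.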
Of course, given the nature of the problem, if I configure only one network interface in the Ubuntu CT (either the one with the public address or the one with the local address) the problem disappears.
I have tested this with a brand new Ubuntu 14.04 CT created from the official template, where I have touched absolutely nothing, to confirm that it's not a local CT configuration issue. I have not tested other Ubuntu versions.
I hope this is enough information to reproduce the problem.