[SOLVED] Weird problem with VM running HAProxy

snakeoilos

New Member
Feb 27, 2020
I have this weird problem with a VM running HAProxy.

Here's my setup.

VM#1
  • Runs HAProxy listening at 443
  • Subnet is 192.168.222.0/24
  • Handles all my SSL certificates
  • Based on the requested hostname, it forwards the HTTPS request to VM#2 on port 10000 (see the haproxy.cfg sketch just below)
  • Firewall rules:
    HaProxy.png
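For illustration, a name-based forward like this looks roughly as follows; the hostname and VM#2's address are placeholders, not taken from my actual config:

Code:
# sketch only - hostname and backend address are examples
frontend https_in
    bind *:443 ssl crt /etc/haproxy/certs/
    mode http
    use_backend gitlab if { hdr(host) -i gitlab.example.com }

backend gitlab
    mode http
    server vm2 192.168.222.50:10000 ssl verify none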
VM#2
  • Runs docker
  • Subnet is also 192.168.222.0/24
  • Runs GitLab in a container, mapping only container port 443 out to host port 10000 (i.e. 22/tcp, 80/tcp, 0.0.0.0:10000->443/tcp); see the docker run sketch just below
  • Firewall rules:
    Gitlab.png
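A mapping like that comes from a docker run along these lines (the image name and other flags are illustrative only):

Code:
# sketch - produces the "0.0.0.0:10000->443/tcp" mapping above
docker run -d --name gitlab -p 10000:443 gitlab/gitlab-ce:latest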
VM#3
  • Runs gitlab-runner, registered against the GitLab instance on VM#2 (registration sketch after this list)
  • Subnet is 192.168.101.0/24
  • So the GitLab URL actually points to VM#1, which in turn proxies the request back to VM#2
  • Firewall rules: None set
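For completeness, the runner registration looks roughly like this (URL and token are placeholders):

Code:
# sketch - URL and registration token are placeholders
gitlab-runner register --url https://gitlab.example.com/ --registration-token TOKEN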
For the 3 VMs above, under Firewall options, the input policy is set to DROP and the output policy to ACCEPT.
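In config terms that corresponds to roughly this in each VM's /etc/pve/firewall/<vmid>.fw (sketch; rule lines omitted):

Code:
[OPTIONS]
enable: 1
policy_in: DROP
policy_out: ACCEPT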

The firewall is enabled on the datacenter, the host, the VMs, and all the NICs (VirtIO). There are 3 rules set at the datacenter level to allow VNC, SSH, and SPICE. No rules are set at the host level; for the VM rules, see above.

The security group is set as such:
SecurityGroup.png

All VMs are running Ubuntu 18.04 LTS, using VirtIO interfaces. This is all running on a single host.

With the above setup, when I run "gitlab-runner verify" inside VM#3, I get an I/O or TLS handshake timeout. Trying to browse the GitLab site (using lynx) also results in an error. I just cannot get VM#3 to connect to VM#1 on port 443. Netstat output from VM#1:
Code:
tcp        0      0 192.168.222.40:443      192.168.101.63:57302    SYN_RECV
tcp        0      0 192.168.222.40:443      192.168.101.63:57304    SYN_RECV

But if I try to connect from a normal PC (on a different subnet), it works fine. I can also connect to GitLab from the Internet.

I have tried adding the interface (net0) to the rules and leaving it blank; same problem. The only way to get the VMs to connect is to disable the firewall on VM#1 (HAProxy) entirely.

With the firewall activated on VM#1, what is stopping the VMs from talking to VM#1 while still letting Internet traffic through? This has to be a simple user error, but I can't seem to identify the problem, and I have already spent a week working on this.

Any help will be appreciated. TIA.
 
Does it work with HTTP instead of HTTPS?

I have already seen this kind of problem with a too-big MTU, where the SSL/SSH protocols stop working because they set the TCP "do not fragment" bit. (You can try to reduce the MTU inside the VM for testing.)
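Something like this inside the VM (the interface name is just an example):

Code:
# temporarily lower the MTU for testing
ip link set dev ens18 mtu 1400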
 
Thanks.

Does it work with HTTP instead of HTTPS?
Same thing. Probably because I set HAProxy to redirect HTTP to HTTPS?

I have already seen this kind of problem with a too-big MTU, where the SSL/SSH protocols stop working because they set the TCP "do not fragment" bit. (You can try to reduce the MTU inside the VM for testing.)
You may be on to something there.

I am indeed using jumbo frames (MTU 9000), although the MTU for the VMs in question is set to the default of 1500. On the router side though (I'm using a software firewall called Untangle), some of the NICs are set to 9000. (I'm only using an MTU of 9000 between my home VLAN and the NAS; the other VLANs are all on 1500.)
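A quick way to test whether an oversized frame is sneaking onto the path is to ping with the DF bit set (VM#1's address taken from the netstat output above; 1472 = 1500 minus 28 bytes of IP/ICMP headers):

Code:
# from VM#3: largest payload that fits a 1500-byte MTU without fragmenting
ping -M do -s 1472 192.168.222.40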

I'll try and see what I can find out from sniffing on the firewall.
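Probably something along these lines (the interface name is just an example):

Code:
# watch for frames larger than a standard 1514-byte Ethernet frame
tcpdump -i vmbr0 -nn 'greater 1514'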

Any idea why turning off the firewall on the VM fixes this?
 
Thanks @spirit.

I'm seeing a packet with a payload of 3695 bytes. That shouldn't happen; hopefully this will just work once I figure out why it's doing that.

I have already removed the MTU of 9000 from the bridge. I may have to reboot the router for the change to take effect. I will try again once I can get that payload down to 1514 bytes (or whatever the number should be).
 
So, I have changed the MTU back to 1500, wiped out all the config files in /etc/pve/firewall, and restarted the same tests from the first post. Still not working, but this time the problem is fixed if I turn off the firewall on VM#3.

Next, I'm trying the same test with another service - MQTT on port 1883 (unencrypted).

VM#1:
  1. Ubuntu 18.04 LTS
  2. Running mosquitto MQTT service
  3. Firewall on; default INPUT policy is DROP, default OUTPUT policy is ACCEPT.
  4. MQTT rule:
    MQTT_Rule.png
VM#2:
  1. Ubuntu Linux. Running on the same node as VM#1.
  2. Firewall on; default INPUT policy is DROP, default OUTPUT policy is ACCEPT. No additional rules set.
  3. Publishes messages to the MQTT service on VM#1 (using mosquitto_pub; see the example after this list)
PC#1:
  1. Windows 10. Physical machine.
  2. Publishes messages to the MQTT service on VM#1 (using the Windows version of mosquitto_pub)
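The publish command is just the stock one (VM#1's address and the topic are placeholders):

Code:
# address and topic are placeholders
mosquitto_pub -h 192.168.222.40 -p 1883 -t test/topic -m "hello"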
Tests:
  1. MQTT rule is disabled on VM#1. Firewall on NIC is enabled on VM#2:
    • Both VM#2 and PC#1 cannot publish messages to VM#1
  2. MQTT rule is enabled on VM#1. Firewall on NIC is enabled on VM#2:
    • VM#2 cannot publish messages to VM#1, PC#1 can publish messages to VM#1
  3. MQTT rule is enabled on VM#1. Firewall on NIC is disabled on VM#2
    • Both VM#2 and PC#1 can now publish messages to VM#1
AFAICT, for external traffic (from non-virtual machines) going into the VMs, the firewall rules work exactly as expected. Any dropped traffic appears in the logs (if I set the log level to debug).

But with inter-VM traffic, it seems that if I enable the firewall on the NICs of both VMs, network communication between the two is cut (regardless of the firewall rules set). For some reason, one of the VMs' NICs must have the firewall off for networking to work. Again, AFAICT the dropped traffic does not show up in the firewall logs.

The network is VLAN tagged; I'm using a Linux bridge, and it's bridged over a 10 Gbps network back to an L2 switch.

What am I doing wrong here?
 
I know it's a conntrack issue, but adding the rule manually to the chain doesn't seem to work, e.g.:

Code:
iptables -I tapXXXXi0-IN 1 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT

After some googling, I finally found the answer by searching this very forum - from this thread.
Code:
sysctl -w net.netfilter.nf_conntrack_tcp_be_liberal=1

Too early to tell, but so far everything looks promising with this set. The firewalls are finally working as specified in the rules - not just from real physical machines to the VMs, but between the VMs as well.

I'm not sure if this problem is common to everyone or specific to my setup. But if you find your VMs can't talk to each other when both have the firewall turned on, set the above and it'll work. Creating a new file in /etc/sysctl.d/ with the above option should make it persist across reboots (untested, as I don't reboot my host); see the snippet below.
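Something like this (the filename is just an example):

Code:
# /etc/sysctl.d/99-conntrack.conf
net.netfilter.nf_conntrack_tcp_be_liberal = 1

Then run "sysctl --system" (or reboot) to apply it.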

Spent nearly a month on this problem. Glad to finally solve it.
 
