Random "connection refused" due to firewall from other container

Sebastian256

Aug 12, 2017
Hello,

I ran into a strange problem on my Proxmox server: when I try to ssh into one of the LXC containers, I sometimes get "connection refused". On a retry, the connection works fine.

After capturing packets at different spots, I found that the problem occurs in the following scenario (a few commands to inspect this setup are sketched after the list):
  • PVE host is on IP 2001:db8::1
  • There are 2 containers:
    • 101 on 2001:db8::101
      The pve firewall accepts ssh connections
    • 102 on 2001:db8::102
      The pve firewall rejects ssh connections
  • Both containers are bridged to vmbr0.
  • For each container, pve creates an internal interface
    • veth101i0 represents container 101 on vmbr0
    • veth102i0 represents container 102 on vmbr0
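For context, this is roughly how the setup can be inspected on the host. The interface names and MAC are the placeholders from the list above, so adjust them to your environment:

Code:
# List the veth interfaces attached to the bridge:
root@pve ~ # ip link show master vmbr0
# Show the host's NDP cache entries on the bridge:
root@pve ~ # ip -6 neigh show dev vmbr0
# Show which bridge port a learned MAC currently maps to:
root@pve ~ # bridge fdb show br vmbr0 | grep -i 01:23:45:67:89:ab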

The following steps trigger the problem (a capture sketch for reproducing it follows the list):
  1. I want to ssh from the pve host to container 101
  2. The pve host initiates a tcp connection from 2001:db8::1 to 2001:db8::101
  3. The pve host looks for the IP 2001:db8::101 in its ndp cache and finds the mac address 01:23:45:67:89:ab
  4. A tcp packet is created with SYN flag, destination ip 2001:db8::101, and destination mac 01:23:45:67:89:ab. This packet is sent to vmbr0.
  5. The bridge does not have a bridge port for mac address 01:23:45:67:89:ab in its forwarding cache, so the packet is flooded to all bridge ports.
  6. Now we have two copies of the same packet. One is on veth101i0, the other one on veth102i0.
  7. The copy on veth102i0 is processed first. It hits the ip6tables reject rule that is configured as reject-with tcp-reset. A tcp response packet with the RST flag is generated.
  8. Now the copy on veth101i0 is processed. It reaches the container, which sends the SYN+ACK response.
  9. The pve host sees the response generated in step 7 first. The connection is aborted and a 'Connection Refused' error is shown by the ssh client.
  10. Now the pve host sees the response generated in step 8, but it doesn't match any pending tcp connection, so the response packet is ignored.
  11. Subsequent connection attempts work fine because vmbr0 has now learned which bridge port 01:23:45:67:89:ab is reachable on. Interface veth102i0 no longer receives the initial packet, so no RST response is created.
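If someone wants to reproduce the race, this is roughly how I captured it (a sketch; interface names, MAC and addresses are the example values from above):

Code:
# Watch the "wrong" veth for the flooded SYN and the RST generated by the reject rule:
root@pve ~ # tcpdump -ni veth102i0 ip6 and tcp port 22
# In a second shell, watch the intended veth:
root@pve ~ # tcpdump -ni veth101i0 ip6 and tcp port 22
# Make the bridge flood again by removing the learned entry
# (or simply wait for it to age out), then retry:
root@pve ~ # bridge fdb del 01:23:45:67:89:ab dev veth101i0 master
root@pve ~ # ssh 2001:db8::101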

To avoid this issue, you can use DROP instead of REJECT as the input policy, or ensure that the vmbr0 forwarding cache always knows where to reach your containers (e.g. by having them constantly produce traffic). Both workarounds are sketched below. Any better idea?
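For reference, roughly what the two workarounds look like (container IDs, interface and MAC are again the placeholders from my example):

Code:
# Workaround 1: switch the rejecting container's input policy from REJECT to DROP in
# /etc/pve/firewall/102.fw (also settable in the GUI under the container's Firewall -> Options):
[OPTIONS]
enable: 1
policy_in: DROP

# Workaround 2: pin container 101's MAC to its bridge port so vmbr0 never floods it.
# Note: this static entry is lost when the veth is recreated, e.g. on container restart.
root@pve ~ # bridge fdb add 01:23:45:67:89:ab dev veth101i0 master static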

Code:
root@pve ~ # pveversion -v
proxmox-ve: 5.0-21 (running kernel: 4.10.17-3-pve)
pve-manager: 5.0-32 (running version: 5.0-32/2560e073)
pve-kernel-4.10.17-2-pve: 4.10.17-20
pve-kernel-4.10.15-1-pve: 4.10.15-15
pve-kernel-4.10.17-3-pve: 4.10.17-21
pve-kernel-4.10.17-1-pve: 4.10.17-18
libpve-http-server-perl: 2.0-6
lvm2: 2.02.168-pve3
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-12
qemu-server: 5.0-15
pve-firmware: 2.0-2
libpve-common-perl: 5.0-18
libpve-guest-common-perl: 2.0-11
libpve-access-control: 5.0-6
libpve-storage-perl: 5.0-15
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-2
pve-docs: 5.0-9
pve-qemu-kvm: 2.9.1-1
pve-container: 2.0-16
pve-firewall: 3.0-3
pve-ha-manager: 2.0-2
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.1.0-1
lxcfs: 2.0.7-pve4
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.6.5.11-pve17~bpo90

Edit:
I opened a bug for it because there has been no reply in this thread.
 
Hello, I want to bump this topic. The same thing happens in 8.1.
My nginx (reverse proxy, running in an LXC container) sometimes gets a 'connection refused' error when connecting to another LXC container on the same node.
But if I disable pve-firewall, everything works fine.
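To narrow it down without leaving the firewall disabled, something along these lines might help (a sketch; 102 stands in for the backend container's VMID):

Code:
# Confirm the firewall is running and look for reject rules in the compiled ruleset:
root@pve ~ # pve-firewall status
root@pve ~ # pve-firewall compile | grep -i reject
# Check whether the backend container uses a REJECT input policy:
root@pve ~ # grep -A5 '\[OPTIONS\]' /etc/pve/firewall/102.fw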

Code:
proxmox-ve: 8.1.0 (running kernel: 6.2.16-15-pve)
pve-manager: 8.1.3 (running version: 8.1.3/b46aac3b42da5d15)
proxmox-kernel-helper: 8.1.0
pve-kernel-6.2: 8.0.5
proxmox-kernel-6.5: 6.5.11-7
proxmox-kernel-6.5.11-7-pve-signed: 6.5.11-7
proxmox-kernel-6.2.16-20-pve: 6.2.16-20
proxmox-kernel-6.2: 6.2.16-20
proxmox-kernel-6.2.16-15-pve: 6.2.16-15
ceph-fuse: 16.2.11+ds-2
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx7
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.0
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.1
libpve-access-control: 8.0.7
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.1.0
libpve-guest-common-perl: 5.0.6
libpve-http-server-perl: 5.0.5
libpve-network-perl: 0.9.5
libpve-rs-perl: 0.8.7
libpve-storage-perl: 8.0.5
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve4
novnc-pve: 1.4.0-3
openvswitch-switch: 3.1.0-2
proxmox-backup-client: 3.1.2-1
proxmox-backup-file-restore: 3.1.2-1
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.2
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.3
proxmox-widget-toolkit: 4.1.3
pve-cluster: 8.0.5
pve-container: 5.0.8
pve-docs: 8.1.3
pve-edk2-firmware: 4.2023.08-2
pve-firewall: 5.0.3
pve-firmware: 3.9-1
pve-ha-manager: 4.0.3
pve-i18n: 3.1.4
pve-qemu-kvm: 8.1.2-4
pve-xtermjs: 5.3.0-2
qemu-server: 8.0.10
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.2-pve1
 
I encountered this problem as well and finally found someone who dug into it. More than six years ago ...
I had the most problems with VMs that have a lot of virtual NICs. Setting the input policy to DROP rather than REJECT seems to resolve it.
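If it helps anyone else: as far as I can tell, the per-guest input policy can also be flipped via the API instead of the GUI, roughly like this (node name and VMIDs are placeholders):

Code:
# For a VM:
root@pve ~ # pvesh set /nodes/pve/qemu/100/firewall/options --policy_in DROP
# For an LXC container:
root@pve ~ # pvesh set /nodes/pve/lxc/101/firewall/options --policy_in DROP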
 
