Hello,
I encountered a weird problem with my proxmox server. When trying to ssh into one of the LXC containers, I sometimes get "connection refused". With a retry, the connection works fine.
After capturing packets at different spots, I found out that the problem occurs in the following scenario:
- PVE host is on IP 2001:db8::1
- There are 2 containers:
  - 101 on 2001:db8::101 (the pve firewall accepts ssh connections)
  - 102 on 2001:db8::102 (the pve firewall rejects ssh connections)
- Both containers are bridged to vmbr0.
- For each container, pve creates an internal interface:
  - veth101i0 represents container 101 on vmbr0
  - veth102i0 represents container 102 on vmbr0
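To see this layout on the host, the bridge ports and the bridge's mac cache can be inspected with iproute2 (interface and bridge names as in my setup above; the output will of course differ on other hosts):

```shell
# List the ports attached to vmbr0 (should show veth101i0 and veth102i0)
ip link show master vmbr0

# Show which mac addresses the bridge has already learned, and on which port
bridge fdb show br vmbr0
```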
The following steps trigger the problem:
1. I want to ssh from the pve host to container 101.
2. The pve host initiates a tcp connection from 2001:db8::1 to 2001:db8::101.
3. The pve host looks up 2001:db8::101 in its ndp cache and finds the mac address 01:23:45:67:89:ab.
4. A tcp packet with the SYN flag is created, with destination ip 2001:db8::101 and destination mac 01:23:45:67:89:ab. This packet is sent to vmbr0.
5. Because the bridge does not have a bridge port for mac address 01:23:45:67:89:ab in its cache, the packet is flooded to all bridge ports.
6. Now there are two copies of the same packet: one on veth101i0, the other on veth102i0.
7. The copy on veth102i0 is processed first. It hits the ip6tables reject rule, which is configured with reject-with tcp-reset, so a tcp response packet with the RST flag is generated.
8. Now the copy on veth101i0 is processed. It reaches the container, which sends the SYN+ACK response.
9. The pve host sees the RST generated in step 7 first. The connection is aborted and a "Connection refused" error is shown by the ssh client.
10. The pve host then sees the SYN+ACK generated in step 8, but it doesn't match any pending tcp connection, so the packet is ignored.
11. Subsequent connection attempts work fine because vmbr0 has now cached the bridge port where 01:23:45:67:89:ab can be reached. Interface veth102i0 no longer receives the initial packet, so no RST response is created.
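One way to reproduce the capture (this assumes tcpdump is installed on the host and the fdb entry for the container mac has aged out, which by default happens after a few minutes of silence): watch the port of the *other* container while connecting.

```shell
# Terminal 1: watch container 102's port for the flooded SYN
# and the RST answer generated by the firewall
tcpdump -ni veth102i0 'ip6 and tcp port 22'

# Terminal 2: verify the mac is currently not cached by the bridge,
# then trigger the race
bridge fdb show br vmbr0 | grep -i 01:23:45:67:89:ab
ssh root@2001:db8::101
```

If the mac shows up in the fdb output, the first attempt will succeed; only when the entry is absent does veth102i0 see the SYN and answer with a RST.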
To avoid this issue, you can use DROP instead of REJECT as the policy, or ensure that the vmbr0 bridge cache always knows where to reach your containers (e.g. by constantly generating traffic). Any better ideas?
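For reference, this is roughly what the two workarounds look like. The policy option and config path are from the pve firewall setup; the static fdb entry is my own experiment, and note that it has to be redone whenever the veth device is recreated (e.g. on container restart) and whenever the container's mac changes.

```shell
# Workaround 1: use DROP instead of REJECT as the input policy of
# container 102, in /etc/pve/firewall/102.fw:
#   [OPTIONS]
#   policy_in: DROP

# Workaround 2: pin the container mac to its bridge port,
# so the initial SYN is never flooded to veth102i0
bridge fdb add 01:23:45:67:89:ab dev veth101i0 master static
```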
Code:
root@pve ~ # pveversion -v
proxmox-ve: 5.0-21 (running kernel: 4.10.17-3-pve)
pve-manager: 5.0-32 (running version: 5.0-32/2560e073)
pve-kernel-4.10.17-2-pve: 4.10.17-20
pve-kernel-4.10.15-1-pve: 4.10.15-15
pve-kernel-4.10.17-3-pve: 4.10.17-21
pve-kernel-4.10.17-1-pve: 4.10.17-18
libpve-http-server-perl: 2.0-6
lvm2: 2.02.168-pve3
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-12
qemu-server: 5.0-15
pve-firmware: 2.0-2
libpve-common-perl: 5.0-18
libpve-guest-common-perl: 2.0-11
libpve-access-control: 5.0-6
libpve-storage-perl: 5.0-15
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-2
pve-docs: 5.0-9
pve-qemu-kvm: 2.9.1-1
pve-container: 2.0-16
pve-firewall: 3.0-3
pve-ha-manager: 2.0-2
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.1.0-1
lxcfs: 2.0.7-pve4
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.6.5.11-pve17~bpo90
Edit:
I opened a bug for it because there has been no reply in this thread.