TCP RST packets dropped by PVE Firewall

iBug

USTC
ibug.io
I'm running into exactly the same issue as #56300. That thread is old and I have more details to add, so I thought I'd open a new one.
  • PVE version is almost up-to-date: proxmox-ve: 8.0.2 (running kernel: 6.2.16-6-pve)
  • VM → Firewall → Options → Firewall = No: No effect
  • VM → Firewall → Options → Input Policy / Output Policy = both ACCEPT: No effect
  • VM → Hardware → net0 → Uncheck "Firewall": Working normally
  • Writing nf_conntrack_allow_invalid: 1 to the OPTIONS section of /etc/pve/nodes/<node>/host.fw: Working normally (This solution comes from #55634)
Running tcpdump -ni any with appropriate filters suggests that the packet is mutated in the fwbr* bridges and then dropped as INVALID in the main vmbr* bridge:

Without nf_conntrack_allow_invalid: 1:

Code:
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
16:33:11.911184 veth101i1 P   IP 172.31.0.2.50198 > 172.31.1.11.80: Flags [S], seq 3404503761, win 64240, options [mss 1460,sackOK,TS val 178881785 ecr 0,nop,wscale 7], length 0
16:33:11.911202 fwln101i1 Out IP 172.31.0.2.50198 > 172.31.1.11.80: Flags [S], seq 3404503761, win 64240, options [mss 1460,sackOK,TS val 178881785 ecr 0,nop,wscale 7], length 0
16:33:11.911203 fwpr101p1 P   IP 172.31.0.2.50198 > 172.31.1.11.80: Flags [S], seq 3404503761, win 64240, options [mss 1460,sackOK,TS val 178881785 ecr 0,nop,wscale 7], length 0
16:33:11.911206 fwpr811p0 Out IP 172.31.0.2.50198 > 172.31.1.11.80: Flags [S], seq 3404503761, win 64240, options [mss 1460,sackOK,TS val 178881785 ecr 0,nop,wscale 7], length 0
16:33:11.911207 fwln811i0 P   IP 172.31.0.2.50198 > 172.31.1.11.80: Flags [S], seq 3404503761, win 64240, options [mss 1460,sackOK,TS val 178881785 ecr 0,nop,wscale 7], length 0
16:33:11.911213 tap811i0 Out IP 172.31.0.2.50198 > 172.31.1.11.80: Flags [S], seq 3404503761, win 64240, options [mss 1460,sackOK,TS val 178881785 ecr 0,nop,wscale 7], length 0
16:33:11.911262 tap811i0 P   IP 172.31.1.11.80 > 172.31.0.2.50198: Flags [R.], seq 0, ack 3404503762, win 0, length 0
16:33:11.911267 fwln811i0 Out IP 172.31.1.11.80 > 172.31.0.2.50198: Flags [R.], seq 0, ack 1, win 0, length 0
16:33:11.911269 fwpr811p0 P   IP 172.31.1.11.80 > 172.31.0.2.50198: Flags [R.], seq 0, ack 1, win 0, length 0
^C
9 packets captured
178 packets received by filter
0 packets dropped by kernel

With nf_conntrack_allow_invalid: 1:

Code:
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
16:46:15.243002 veth101i1 P   IP 172.31.0.2.58784 > 172.31.1.11.80: Flags [S], seq 301948896, win 64240, options [mss 1460,sackOK,TS val 179665117 ecr 0,nop,wscale 7], length 0
16:46:15.243015 fwln101i1 Out IP 172.31.0.2.58784 > 172.31.1.11.80: Flags [S], seq 301948896, win 64240, options [mss 1460,sackOK,TS val 179665117 ecr 0,nop,wscale 7], length 0
16:46:15.243016 fwpr101p1 P   IP 172.31.0.2.58784 > 172.31.1.11.80: Flags [S], seq 301948896, win 64240, options [mss 1460,sackOK,TS val 179665117 ecr 0,nop,wscale 7], length 0
16:46:15.243020 fwpr811p0 Out IP 172.31.0.2.58784 > 172.31.1.11.80: Flags [S], seq 301948896, win 64240, options [mss 1460,sackOK,TS val 179665117 ecr 0,nop,wscale 7], length 0
16:46:15.243021 fwln811i0 P   IP 172.31.0.2.58784 > 172.31.1.11.80: Flags [S], seq 301948896, win 64240, options [mss 1460,sackOK,TS val 179665117 ecr 0,nop,wscale 7], length 0
16:46:15.243027 tap811i0 Out IP 172.31.0.2.58784 > 172.31.1.11.80: Flags [S], seq 301948896, win 64240, options [mss 1460,sackOK,TS val 179665117 ecr 0,nop,wscale 7], length 0
16:46:15.243076 tap811i0 P   IP 172.31.1.11.80 > 172.31.0.2.58784: Flags [R.], seq 0, ack 301948897, win 0, length 0
16:46:15.243081 fwln811i0 Out IP 172.31.1.11.80 > 172.31.0.2.58784: Flags [R.], seq 0, ack 1, win 0, length 0
16:46:15.243083 fwpr811p0 P   IP 172.31.1.11.80 > 172.31.0.2.58784: Flags [R.], seq 0, ack 1, win 0, length 0
16:46:15.243086 fwpr101p1 Out IP 172.31.1.11.80 > 172.31.0.2.58784: Flags [R.], seq 0, ack 1, win 0, length 0
16:46:15.243087 fwln101i1 P   IP 172.31.1.11.80 > 172.31.0.2.58784: Flags [R.], seq 0, ack 1, win 0, length 0
16:46:15.243090 veth101i1 Out IP 172.31.1.11.80 > 172.31.0.2.58784: Flags [R.], seq 0, ack 1, win 0, length 0
^C
12 packets captured
200 packets received by filter
0 packets dropped by kernel

Notable clues are:
  • The RST packet leaves tap* correctly, but its ACK number has turned into 1 after it passes through fwbr* and comes out of fwln*.
  • Without nf_conntrack_allow_invalid: 1, the trace ends at fwpr811p0 P and the packet never reaches fwpr101p1, so it must be dropped inside vmbr0 as INVALID.
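To compare the two captures side by side, a small Python sketch can extract the interface path taken by the RST. It assumes the single-line LINUX_SLL2 format shown above (timestamp, interface, direction, `IP`, addresses, `Flags [...]`); `hops` and the script name are hypothetical, not part of any tool.

```python
import re
import sys

# One tcpdump line in the format captured above, e.g.
# "16:33:11.911262 tap811i0 P   IP 172.31.1.11.80 > 172.31.0.2.50198: Flags [R.], seq 0, ..."
LINE_RE = re.compile(
    r"^(?P<time>\S+)\s+(?P<iface>\S+)\s+(?P<dir>In|Out|P|M|B)\s+IP\s+"
    r"(?P<src>\S+)\s+>\s+(?P<dst>[^:\s]+):\s+Flags\s+\[(?P<flags>[^\]]*)\]"
)


def hops(lines, flags):
    """Return (interface, direction) for every captured packet with exactly these TCP flags."""
    path = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and m.group("flags") == flags:  # header/summary lines simply don't match
            path.append((m.group("iface"), m.group("dir")))
    return path


if __name__ == "__main__":
    # Usage (hypothetical file name): python3 rst_hops.py < capture.txt
    for iface, direction in hops(sys.stdin, "R."):
        print(iface, direction)
```

Fed the first capture, the last hop is `fwpr811p0 P`; fed the second, it is `veth101i1 Out`, which is exactly the gap described in the clues.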
The one thing I can't understand is how the ACK number changed: ebtables is disabled at the cluster level and ebtables-save shows empty chains.

Even with the interface-level firewall disabled (firewall=1 removed from net0), the ACK number is still wrong, yet the packet somehow doesn't get dropped:

Code:
17:19:11.029092 tap811i0 P   IP 172.31.1.11.80 > 172.31.0.2.39736: Flags [R.], seq 0, ack 3674488031, win 0, length 0
17:19:11.029100 fwpr101p1 Out IP 172.31.1.11.80 > 172.31.0.2.39736: Flags [R.], seq 0, ack 1, win 0, length 0
17:19:11.029104 fwln101i1 P   IP 172.31.1.11.80 > 172.31.0.2.39736: Flags [R.], seq 0, ack 1, win 0, length 0
17:19:11.029111 veth101i1 Out IP 172.31.1.11.80 > 172.31.0.2.39736: Flags [R.], seq 0, ack 1, win 0, length 0

Any ideas what went wrong?
 
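Worked out the ACK number issue: by default, tcpdump prints TCP sequence and acknowledgment numbers relative to the connection's initial sequence number, not as absolute values. Adding -S to the tcpdump options shows the absolute numbers, and they are correct. Nothing is wrong on this side.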
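The "mutation" is only a change of reference point. In the first capture the SYN carries seq 3404503761, so the RST's absolute ack 3404503762 appears as ack 1 in the later lines. A minimal Python sketch of the conversion, done modulo 2^32 because TCP sequence numbers wrap; the helper names are illustrative:

```python
SEQ_SPACE = 2 ** 32  # TCP sequence/ack numbers are 32-bit and wrap around


def to_relative(absolute, isn):
    """Offset of an absolute seq/ack number from the ISN it refers to (tcpdump's default display)."""
    return (absolute - isn) % SEQ_SPACE


def to_absolute(relative, isn):
    """The raw 32-bit value carried in the header (what tcpdump -S displays)."""
    return (isn + relative) % SEQ_SPACE


if __name__ == "__main__":
    isn = 3404503761  # seq of the SYN in the first capture
    print(to_relative(3404503762, isn))  # the RST's ack as printed without -S
```

The same subtraction explains the second capture (301948897 - 301948896 = 1).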

I'm still wondering why conntrack considers the packet INVALID.
 
After an hour of debugging, I've concluded that this is a bug in PVE Firewall, and I've filed it as #4983.

To anyone stumbling upon this issue: add nf_conntrack_allow_invalid: 1 to your host firewall config. It's the best workaround available at the moment.
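For scripted setups, here is a hedged Python sketch of that edit: it sets `nf_conntrack_allow_invalid: 1` inside the [OPTIONS] section of a host.fw file's text, adding the section if it's missing. It assumes the `[SECTION]` plus `key: value` layout used by /etc/pve/nodes/<node>/host.fw; `enable_allow_invalid` is a hypothetical helper, not part of Proxmox, and it only transforms a string, so reading and writing the file is up to you.

```python
import sys

OPTION = "nf_conntrack_allow_invalid: 1"


def _insert_option(out):
    """Put OPTION after the last non-blank line collected so far (the end of [OPTIONS])."""
    i = len(out)
    while i > 0 and not out[i - 1].strip():
        i -= 1
    out.insert(i, OPTION)


def enable_allow_invalid(text):
    """Return host.fw text with nf_conntrack_allow_invalid set to 1 in [OPTIONS]."""
    out = []
    in_options = False
    done = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Leaving [OPTIONS] without having seen the key: add it there.
            if in_options and not done:
                _insert_option(out)
                done = True
            in_options = stripped.upper() == "[OPTIONS]"
            out.append(line)
            continue
        key = stripped.split(":", 1)[0].strip()
        if in_options and key == "nf_conntrack_allow_invalid":
            out.append(OPTION)  # overwrite an existing value such as 0
            done = True
            continue
        out.append(line)
    if in_options and not done:
        _insert_option(out)  # [OPTIONS] was the last section in the file
        done = True
    if not done:
        # No [OPTIONS] section at all: create one at the top.
        out = ["[OPTIONS]", OPTION] + ([""] + out if out else [])
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Usage (hypothetical): python3 allow_invalid.py < host.fw > host.fw.new, then review the diff.
    sys.stdout.write(enable_allow_invalid(sys.stdin.read()))
```

Running it twice gives the same result, so it is safe to re-apply.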

The RST packet is considered INVALID because:
  • When it comes out of tap811i0, it enters fwbr811i0, where conntrack sees it and DESTROYs the corresponding connection entry.
  • The packet then enters vmbr0. By default, pve-firewall inserts this rule:
    Code:
    -A PVEFW-FORWARD -m conntrack --ctstate INVALID -j DROP
    Since the corresponding conntrack entry has already been destroyed, the RST no longer matches any tracked connection, so it hits this rule and gets dropped.
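The two steps above can be replayed in a toy Python model: one conntrack table shared by all bridges on the host, consulted once per bridge hop, where the RST's first hop deletes the entry that the next hop needs. This is a simplification for illustration only (no zones, no TCP state machine, no timeouts), and every name in it, including the "TRACKED" label, is made up.

```python
from dataclasses import dataclass, field


def flow_key(src, dst):
    """Direction-independent key: a reply maps to the same entry as the original packet."""
    return frozenset((src, dst))


@dataclass
class ToyConntrack:
    """A single conntrack table shared by every bridge on the host (toy model)."""
    allow_invalid: bool = False  # mirrors nf_conntrack_allow_invalid
    table: set = field(default_factory=set)

    def classify(self, src, dst, flags):
        key = flow_key(src, dst)
        if key in self.table:
            if "R" in flags:
                self.table.discard(key)  # the RST tears the entry down (DESTROY)
            return "TRACKED"  # NEW/ESTABLISHED collapsed into one label for brevity
        if flags == "S":
            self.table.add(key)  # a SYN creates a new entry
            return "NEW"
        return "INVALID"  # e.g. a RST with no matching entry

    def forward(self, bridge, src, dst, flags):
        """Model '-A PVEFW-FORWARD -m conntrack --ctstate INVALID -j DROP' on one bridge hop."""
        state = self.classify(src, dst, flags)
        verdict = "DROP" if state == "INVALID" and not self.allow_invalid else "ACCEPT"
        return bridge, state, verdict


def rst_path(allow_invalid):
    """Send the SYN from CT 101 to VM 811, then the VM's RST back; return the RST's hops."""
    ct = ToyConntrack(allow_invalid=allow_invalid)
    client, server = "172.31.0.2:47066", "172.31.1.11:80"
    for bridge in ("fwbr101i1", "vmbr0", "fwbr811i0"):
        ct.forward(bridge, client, server, "S")
    result = []
    for bridge in ("fwbr811i0", "vmbr0", "fwbr101i1"):
        hop = ct.forward(bridge, server, client, "R.")
        result.append(hop)
        if hop[2] == "DROP":
            break
    return result


if __name__ == "__main__":
    print(rst_path(allow_invalid=False))
    print(rst_path(allow_invalid=True))
```

Without the option the model stops at vmbr0 with a DROP; with it, the RST is still classified INVALID but is let through, matching the workaround.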
Here's the experiment setup that led to this discovery:

To make the timing more visible, I used tc to add an artificial 200 ms delay on two interfaces:

Code:
tc qdisc add dev tap811i0 root netem delay 200ms
tc qdisc add dev fwln811i0 root netem delay 200ms

Without nf_conntrack_allow_invalid, here's the output of tcpdump -ttSni any with appropriate filters. Blank lines are added for readability, and irrelevant details are trimmed.

Code:
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
1696412047.886575 veth101i1 P   IP 172.31.0.2.47066 > 172.31.1.11.80: Flags [S]
1696412047.886592 fwln101i1 Out IP 172.31.0.2.47066 > 172.31.1.11.80: Flags [S]
1696412047.886594 fwpr101p1 P   IP 172.31.0.2.47066 > 172.31.1.11.80: Flags [S]
1696412047.886599 fwpr811p0 Out IP 172.31.0.2.47066 > 172.31.1.11.80: Flags [S]
1696412047.886600 fwln811i0 P   IP 172.31.0.2.47066 > 172.31.1.11.80: Flags [S]

1696412048.086620 tap811i0 Out IP 172.31.0.2.47066 > 172.31.1.11.80: Flags [S]
1696412048.086841 tap811i0 P   IP 172.31.1.11.80 > 172.31.0.2.47066: Flags [R.]

1696412048.286919 fwln811i0 Out IP 172.31.1.11.80 > 172.31.0.2.47066: Flags [R.]
1696412048.286930 fwpr811p0 P   IP 172.31.1.11.80 > 172.31.0.2.47066: Flags [R.]

Here's the output of conntrack -E -o timestamp with the same filters:

Code:
[1696412047.886657]         [NEW] tcp      6 120 SYN_SENT src=172.31.0.2 dst=172.31.1.11 sport=47066 dport=80 [UNREPLIED] src=172.31.1.11 dst=172.31.0.2 sport=80 dport=47066
[1696412048.086899]     [DESTROY] tcp      6 119 CLOSE src=172.31.0.2 dst=172.31.1.11 sport=47066 dport=80 [UNREPLIED] src=172.31.1.11 dst=172.31.0.2 sport=80 dport=47066

Comparing the timestamps, it's evident that the conntrack entry is destroyed as soon as the RST packet comes out of tap811i0, roughly 200 ms before the packet even leaves fwln811i0 for vmbr0.
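The timestamp comparison can be made explicit with a few lines of Python. The values are copied verbatim from the tcpdump and conntrack outputs above (both are Unix timestamps taken on the same host); the event labels are mine. Decimal is used so the microsecond digits stay exact.

```python
from decimal import Decimal

# Timestamps copied from the tcpdump -ttSni any and conntrack -E -o timestamp output above.
EVENTS = {
    "fwln811i0 P (SYN)": Decimal("1696412047.886600"),
    "tap811i0 Out (SYN)": Decimal("1696412048.086620"),
    "tap811i0 P (RST)": Decimal("1696412048.086841"),
    "conntrack DESTROY": Decimal("1696412048.086899"),
    "fwln811i0 Out (RST)": Decimal("1696412048.286919"),
}


def gap(first, second):
    """Seconds elapsed between two named events."""
    return EVENTS[second] - EVENTS[first]


if __name__ == "__main__":
    for name, ts in sorted(EVENTS.items(), key=lambda item: item[1]):
        print(ts, name)
    # The two 200 ms netem delays show up as ~0.2 s gaps ...
    print("SYN held by netem on tap811i0:", gap("fwln811i0 P (SYN)", "tap811i0 Out (SYN)"))
    print("RST held by netem on fwln811i0:", gap("tap811i0 P (RST)", "fwln811i0 Out (RST)"))
    # ... while the conntrack entry disappears right after the RST leaves the VM.
    print("RST from VM -> DESTROY:", gap("tap811i0 P (RST)", "conntrack DESTROY"))
    print("DESTROY -> RST leaves fwln811i0:", gap("conntrack DESTROY", "fwln811i0 Out (RST)"))
```

The DESTROY lands 58 µs after the RST enters fwbr811i0 and about 200 ms before the packet gets anywhere near vmbr0, so by the time vmbr0 looks the connection up, there is nothing left to find.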
 
