TCP RST packets dropped by PVE Firewall

iBug

Member
Feb 20, 2020
USTC
ibug.io
I'm running into exactly the same issue as #56300. That thread is old and I have more details, so I'm opening a new one.
  • PVE version is almost up-to-date: proxmox-ve: 8.0.2 (running kernel: 6.2.16-6-pve)
  • VM → Firewall → Options → Firewall = No: No effect
  • VM → Firewall → Options → Input Policy / Output Policy = both ACCEPT: No effect
  • VM → Hardware → net0 → Uncheck "Firewall": Working normally
  • Writing nf_conntrack_allow_invalid: 1 to the OPTIONS section of /etc/pve/nodes/<node>/host.fw: Working normally (This solution comes from #55634)
Running tcpdump -ni any with appropriate filters suggests that the packet is mutated in the fwbr* bridges and then dropped as INVALID in the main vmbr*:

Without nf_conntrack_allow_invalid: 1:

Code:
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
16:33:11.911184 veth101i1 P   IP 172.31.0.2.50198 > 172.31.1.11.80: Flags [S], seq 3404503761, win 64240, options [mss 1460,sackOK,TS val 178881785 ecr 0,nop,wscale 7], length 0
16:33:11.911202 fwln101i1 Out IP 172.31.0.2.50198 > 172.31.1.11.80: Flags [S], seq 3404503761, win 64240, options [mss 1460,sackOK,TS val 178881785 ecr 0,nop,wscale 7], length 0
16:33:11.911203 fwpr101p1 P   IP 172.31.0.2.50198 > 172.31.1.11.80: Flags [S], seq 3404503761, win 64240, options [mss 1460,sackOK,TS val 178881785 ecr 0,nop,wscale 7], length 0
16:33:11.911206 fwpr811p0 Out IP 172.31.0.2.50198 > 172.31.1.11.80: Flags [S], seq 3404503761, win 64240, options [mss 1460,sackOK,TS val 178881785 ecr 0,nop,wscale 7], length 0
16:33:11.911207 fwln811i0 P   IP 172.31.0.2.50198 > 172.31.1.11.80: Flags [S], seq 3404503761, win 64240, options [mss 1460,sackOK,TS val 178881785 ecr 0,nop,wscale 7], length 0
16:33:11.911213 tap811i0 Out IP 172.31.0.2.50198 > 172.31.1.11.80: Flags [S], seq 3404503761, win 64240, options [mss 1460,sackOK,TS val 178881785 ecr 0,nop,wscale 7], length 0
16:33:11.911262 tap811i0 P   IP 172.31.1.11.80 > 172.31.0.2.50198: Flags [R.], seq 0, ack 3404503762, win 0, length 0
16:33:11.911267 fwln811i0 Out IP 172.31.1.11.80 > 172.31.0.2.50198: Flags [R.], seq 0, ack 1, win 0, length 0
16:33:11.911269 fwpr811p0 P   IP 172.31.1.11.80 > 172.31.0.2.50198: Flags [R.], seq 0, ack 1, win 0, length 0
^C
9 packets captured
178 packets received by filter
0 packets dropped by kernel

With nf_conntrack_allow_invalid: 1:

Code:
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
16:46:15.243002 veth101i1 P   IP 172.31.0.2.58784 > 172.31.1.11.80: Flags [S], seq 301948896, win 64240, options [mss 1460,sackOK,TS val 179665117 ecr 0,nop,wscale 7], length 0
16:46:15.243015 fwln101i1 Out IP 172.31.0.2.58784 > 172.31.1.11.80: Flags [S], seq 301948896, win 64240, options [mss 1460,sackOK,TS val 179665117 ecr 0,nop,wscale 7], length 0
16:46:15.243016 fwpr101p1 P   IP 172.31.0.2.58784 > 172.31.1.11.80: Flags [S], seq 301948896, win 64240, options [mss 1460,sackOK,TS val 179665117 ecr 0,nop,wscale 7], length 0
16:46:15.243020 fwpr811p0 Out IP 172.31.0.2.58784 > 172.31.1.11.80: Flags [S], seq 301948896, win 64240, options [mss 1460,sackOK,TS val 179665117 ecr 0,nop,wscale 7], length 0
16:46:15.243021 fwln811i0 P   IP 172.31.0.2.58784 > 172.31.1.11.80: Flags [S], seq 301948896, win 64240, options [mss 1460,sackOK,TS val 179665117 ecr 0,nop,wscale 7], length 0
16:46:15.243027 tap811i0 Out IP 172.31.0.2.58784 > 172.31.1.11.80: Flags [S], seq 301948896, win 64240, options [mss 1460,sackOK,TS val 179665117 ecr 0,nop,wscale 7], length 0
16:46:15.243076 tap811i0 P   IP 172.31.1.11.80 > 172.31.0.2.58784: Flags [R.], seq 0, ack 301948897, win 0, length 0
16:46:15.243081 fwln811i0 Out IP 172.31.1.11.80 > 172.31.0.2.58784: Flags [R.], seq 0, ack 1, win 0, length 0
16:46:15.243083 fwpr811p0 P   IP 172.31.1.11.80 > 172.31.0.2.58784: Flags [R.], seq 0, ack 1, win 0, length 0
16:46:15.243086 fwpr101p1 Out IP 172.31.1.11.80 > 172.31.0.2.58784: Flags [R.], seq 0, ack 1, win 0, length 0
16:46:15.243087 fwln101i1 P   IP 172.31.1.11.80 > 172.31.0.2.58784: Flags [R.], seq 0, ack 1, win 0, length 0
16:46:15.243090 veth101i1 Out IP 172.31.1.11.80 > 172.31.0.2.58784: Flags [R.], seq 0, ack 1, win 0, length 0
^C
12 packets captured
200 packets received by filter
0 packets dropped by kernel

Notable clues are:
  • The RST packet came out correctly from tap*, but its ACK number mutated into 1 after passing through fwbr* and coming out from fwln*.
  • Without nf_conntrack_allow_invalid: 1, the output stops after fwpr811p0 P and the packet never comes out of fwpr101p1, so it's dropped inside vmbr0 as INVALID.
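To see at a glance where the RST stops, the interface column can be pulled out of a saved capture. This is only a minimal sketch, assuming the tcpdump text output was saved to a file and has the LINUX_SLL2 layout shown above (time, interface, direction, ...); rst_path is a made-up name for illustration.

```shell
# rst_path FILE: print "interface direction", in capture order, for every
# line of a saved `tcpdump -ni any` text output that carries an RST flag.
# Assumes whitespace-separated columns: time, interface, direction, ...
rst_path() {
    awk '/Flags \[R/ { print $2, $3 }' "$1"
}
```

Run against the first capture, the last line printed would be fwpr811p0 P; against the second capture it would be veth101i1 Out.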
The one thing I couldn't understand is how the ACK number changed. ebtables is disabled at cluster level and ebtables-save shows empty chains.

Even with interface-level firewall disabled (removing firewall=1 from net0), the ACK number is still wrong, but somehow doesn't get dropped:

Code:
17:19:11.029092 tap811i0 P   IP 172.31.1.11.80 > 172.31.0.2.39736: Flags [R.], seq 0, ack 3674488031, win 0, length 0
17:19:11.029100 fwpr101p1 Out IP 172.31.1.11.80 > 172.31.0.2.39736: Flags [R.], seq 0, ack 1, win 0, length 0
17:19:11.029104 fwln101i1 P   IP 172.31.1.11.80 > 172.31.0.2.39736: Flags [R.], seq 0, ack 1, win 0, length 0
17:19:11.029111 veth101i1 Out IP 172.31.1.11.80 > 172.31.0.2.39736: Flags [R.], seq 0, ack 1, win 0, length 0

Any ideas what went wrong?
 
Worked out the ACK number issue: by default, tcpdump prints TCP sequence and ACK numbers relative to earlier packets of the same connection. Adding -S to the tcpdump options shows the absolute numbers, which are correct. Nothing is wrong on this side.
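
The "ack 1" lines are consistent with that: subtracting the SYN's sequence number from the absolute ACK number seen on tap811i0 in the first capture gives exactly 1. A quick check in shell arithmetic, using only values from that capture (the variable names are just for illustration):

```shell
syn_seq=3404503761       # seq of the SYN in the first capture
rst_ack_abs=3404503762   # ack of the RST as first printed on tap811i0
rst_ack_rel=$(( rst_ack_abs - syn_seq ))
echo "relative ack: $rst_ack_rel"   # prints "relative ack: 1"
```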

Still wondering why conntrack treats the packet as INVALID.
 
After an hour's debugging, I've concluded that it's a bug in PVE Firewall. I've submitted it as #4983.

To anyone who stumbles upon this issue: add nf_conntrack_allow_invalid: 1 to your host firewall config. This is the best workaround available at the moment.
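
If you want to confirm the option ended up in the right section, something like the following could help. It's a rough sketch, assuming host.fw uses an [OPTIONS] section header with one "key: value" pair per line; has_allow_invalid is a made-up helper, not part of PVE.

```shell
# has_allow_invalid FILE: exit 0 if FILE sets nf_conntrack_allow_invalid: 1
# inside its [OPTIONS] section, exit 1 otherwise.
# Assumption: sections start with a "[NAME]" line, options are "key: value".
has_allow_invalid() {
    awk '
        /^\[/ { in_opts = (toupper($0) ~ /^\[OPTIONS\]/) }   # track current section
        in_opts && /^[ \t]*nf_conntrack_allow_invalid[ \t]*:[ \t]*1[ \t]*$/ { found = 1 }
        END { exit !found }
    ' "$1"
}
```

Call it with the path to your node's host.fw, i.e. the host.fw under /etc/pve/nodes/<node>/, and check the exit status.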

The reason why the RST packet is considered INVALID is:
  • When it comes out from tap811i0, it enters fwbr811i0. Conntrack sees it and DESTROYs the corresponding connection.
  • The packet then enters vmbr0. By default, pve-firewall inserts this rule:
    Code:
    -A PVEFW-FORWARD -m conntrack --ctstate INVALID -j DROP
    Since the corresponding conntrack item has already been destroyed, it hits this rule and gets dropped.
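
To check whether this rule is present on a given host, the ruleset dump can be filtered for it. A minimal sketch that reads iptables-save style text on stdin (invalid_drops is a hypothetical name):

```shell
# invalid_drops: read an iptables-save style dump on stdin and print the
# rules that drop packets in conntrack state INVALID.
# -F matches the string literally; -e lets the pattern start with "--".
invalid_drops() {
    grep -F -e '--ctstate INVALID -j DROP'
}
```

Typical use, as root, would be: iptables-save | invalid_drops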
Here's the experiment setup that led to this discovery:

To make the timing more visible, I used tc to add artificial delay:

Code:
tc qdisc add dev tap811i0 root netem delay 200ms
tc qdisc add dev fwln811i0 root netem delay 200ms

Without nf_conntrack_allow_invalid, here's the output of tcpdump -ttSni any with appropriate filters. Blank lines are added for readability, and irrelevant details are trimmed.

Code:
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
1696412047.886575 veth101i1 P   IP 172.31.0.2.47066 > 172.31.1.11.80: Flags [S]
1696412047.886592 fwln101i1 Out IP 172.31.0.2.47066 > 172.31.1.11.80: Flags [S]
1696412047.886594 fwpr101p1 P   IP 172.31.0.2.47066 > 172.31.1.11.80: Flags [S]
1696412047.886599 fwpr811p0 Out IP 172.31.0.2.47066 > 172.31.1.11.80: Flags [S]
1696412047.886600 fwln811i0 P   IP 172.31.0.2.47066 > 172.31.1.11.80: Flags [S]

1696412048.086620 tap811i0 Out IP 172.31.0.2.47066 > 172.31.1.11.80: Flags [S]
1696412048.086841 tap811i0 P   IP 172.31.1.11.80 > 172.31.0.2.47066: Flags [R.]

1696412048.286919 fwln811i0 Out IP 172.31.1.11.80 > 172.31.0.2.47066: Flags [R.]
1696412048.286930 fwpr811p0 P   IP 172.31.1.11.80 > 172.31.0.2.47066: Flags [R.]

Here's the output of conntrack -E -o timestamp with the same filters:

Code:
[1696412047.886657]         [NEW] tcp      6 120 SYN_SENT src=172.31.0.2 dst=172.31.1.11 sport=47066 dport=80 [UNREPLIED] src=172.31.1.11 dst=172.31.0.2 sport=80 dport=47066
[1696412048.086899]     [DESTROY] tcp      6 119 CLOSE src=172.31.0.2 dst=172.31.1.11 sport=47066 dport=80 [UNREPLIED] src=172.31.1.11 dst=172.31.0.2 sport=80 dport=47066

Comparing the timestamps, it's evident that the conntrack item is destroyed as soon as the RST packet appears from tap811i0.
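
For the record, the gaps can be computed directly from the timestamps above. A small sketch in shell; to_us is an ad-hoc helper that drops the decimal point so the arithmetic stays in integer microseconds (it assumes exactly six fractional digits, as in the output above):

```shell
# to_us TIMESTAMP: convert "seconds.microseconds" to integer microseconds.
to_us() { printf '%s\n' "$1" | tr -d '.'; }

rst_at_tap=$(to_us 1696412048.086841)    # RST seen on tap811i0 (tcpdump)
destroy=$(to_us 1696412048.086899)       # [DESTROY] event (conntrack -E)
rst_at_fwpr=$(to_us 1696412048.286930)   # RST seen on fwpr811p0 (tcpdump)

gap_destroy=$(( destroy - rst_at_tap ))
gap_fwpr=$(( rst_at_fwpr - destroy ))
echo "DESTROY after RST on tap811i0:  ${gap_destroy} us"   # 58 us
echo "RST on fwpr811p0 after DESTROY: ${gap_fwpr} us"      # 200031 us
```

So the entry is gone about 58 microseconds after the RST leaves the VM, and the RST only reaches fwpr811p0 roughly 200 ms later, after the netem delay on fwln811i0.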