Firewall randomly dropping all packets from time to time

Dec 26, 2019
Hi,

I have a cluster of three nodes with a few VMs now running in staging, and I see the following problem: from time to time (sometimes after 5 minutes, sometimes hourly, sometimes every minute; there is no stable interval at all) the firewall drops ALL packets for one (or several) of my VMs. I can see in the log that the packets from my IP are dropped, although they were accepted seconds before. It's also not limited to one specific port; it affects every rule configured for the VM. The only thing that seems to fix it for a while is migrating the VM to another host.

This is my setup:
  • 3 hosts
  • Datacenter Firewall
    • Options
      • Enabled
      • ebtables YES
      • Input DROP
      • Output Accept
    • Two Security Groups
      • one for SSH & ping with an IP set of 2 IPs based on 2 aliases
      • one for internal network communication
  • Node Firewall
    • enabled
    • no specific settings
  • VM Firewalls
    • enabled
    • log level to alert
      • added both groups on the interfaces they belong to
      • on some VMs, specific rules (like ports 80 and 443 to a specific IP)

Example log output for a ping packet:

103 1 tap103i1-IN 26/Dec/2019:19:57:28 +0100 policy DROP: IN=fwbr103i1 OUT=fwbr103i1 PHYSIN=fwln103i1 PHYSOUT=tap103i1 MAC=xyz SRC=my-ip DST=server-ip LEN=84 TOS=0x00 PREC=0x00 TTL=57 ID=15022 PROTO=ICMP TYPE=8 CODE=0 ID=36689 SEQ=278
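
To look for a pattern, it may help to parse these lines mechanically instead of reading them by eye. The following Python sketch assumes the layout shown in the sample line (VMID, what appears to be the numeric log level, the chain, the timestamp, the action, then KEY=VALUE pairs). That layout is inferred from this one line, not from documentation, and `parse_fw_line` is a hypothetical helper name.

```python
import re
from datetime import datetime

# Assumed layout of a firewall log line, inferred from the sample above:
#   <vmid> <loglevel> <chain> <timestamp> <tz> <action text>: KEY=VALUE ...
LINE_RE = re.compile(
    r"^(?P<vmid>\d+)\s+(?P<level>\d+)\s+(?P<chain>\S+)\s+"
    r"(?P<ts>\S+\s+[+-]\d{4})\s+(?P<action>[^:]+):\s*(?P<fields>.*)$"
)


def parse_fw_line(line):
    """Parse one firewall log line into a dict, or return None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    # Keep only the first occurrence of each KEY: for ICMP, ID= appears twice
    # (IP header ID first, then the ICMP echo ID).
    fields = {}
    for token in m.group("fields").split():
        key, sep, value = token.partition("=")
        if sep and key not in fields:
            fields[key] = value
    return {
        "vmid": int(m.group("vmid")),
        "level": int(m.group("level")),
        "chain": m.group("chain"),
        # Timestamp looks like 26/Dec/2019:19:57:28 +0100
        "time": datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z"),
        "action": m.group("action").strip(),
        "fields": fields,
    }
```

On the host, the same function could be applied to each line of the firewall log (assumed here to be `/var/log/pve-firewall.log`) to filter drops by `vmid`, `chain` or `fields["SRC"]`.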

Sometimes it's one VM, sometimes all of them, regardless of the host. As soon as I migrate the VM to another host, the rules work again and SSH/ping, for example, work without any entries in the log.

Any hints, or info on what I might be missing? With the current setup the firewall is unpredictable and unusable, so I hope I'll be able to pin the problem down with your help. Thanks!

Versions running:

proxmox-ve: 6.1-2 (running kernel: 5.3.13-1-pve)
pve-manager: 6.1-5 (running version: 6.1-5/9bf06119)
pve-kernel-5.3: 6.1-1
pve-kernel-helper: 6.1-1
pve-kernel-5.3.13-1-pve: 5.3.13-1
ceph: 14.2.5-pve1
ceph-fuse: 14.2.5-pve1
corosync: 3.0.2-pve4
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
libjs-extjs: 6.0.1-10
libknet1: 1.13-pve1
libpve-access-control: 6.0-5
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-9
libpve-guest-common-perl: 3.0-3
libpve-http-server-perl: 3.0-3
libpve-storage-perl: 6.1-3
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve3
lxc-pve: 3.2.1-1
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.1-1
pve-cluster: 6.1-2
pve-container: 3.0-15
pve-docs: 6.1-3
pve-edk2-firmware: 2.20191127-1
pve-firewall: 4.0-9
pve-firmware: 3.0-4
pve-ha-manager: 3.0-8
pve-i18n: 2.0-3
pve-qemu-kvm: 4.1.1-2
pve-xtermjs: 3.13.2-1
qemu-server: 6.1-4
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.2-pve2
 
Any ideas or pointers on how I could debug this issue? I can't find a reliable pattern in what's happening; I can only see the drops in the logs of the host and the VM. After some time it works again without me doing anything, and some time later it stops working again. Thanks!
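
One way to check whether the drops follow any schedule is to bucket them per VM, chain and minute. A minimal Python sketch, again assuming the log layout of the sample line quoted in the first post; `drops_per_minute` is a hypothetical helper name, not part of any Proxmox tool.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed layout (inferred from the sample log line, not from documentation):
#   <vmid> <loglevel> <chain> <timestamp> <tz> <action>: KEY=VALUE ...
HEAD_RE = re.compile(r"^(\d+)\s+\d+\s+(\S+)\s+(\S+\s+[+-]\d{4})\s+([^:]+):")


def drops_per_minute(lines):
    """Count DROP entries per (vmid, chain, minute) so bursts and gaps become visible."""
    counts = Counter()
    for line in lines:
        m = HEAD_RE.match(line.strip())
        # Skip lines that do not match the assumed layout or are not drops.
        if m is None or "DROP" not in m.group(4):
            continue
        ts = datetime.strptime(m.group(3), "%d/%b/%Y:%H:%M:%S %z")
        # Truncate to the minute so drops in the same minute share one bucket.
        minute = ts.replace(second=0, microsecond=0)
        counts[(int(m.group(1)), m.group(2), minute)] += 1
    return counts
```

For example, passing an open file handle for the host's firewall log (assumed to be `/var/log/pve-firewall.log`) to `drops_per_minute` and printing the sorted keys would show whether the bursts recur at a fixed interval or are irregular, and whether several VMs are hit in the same minute.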
 
I really don't know; I have never seen that in production (on a cluster with around 3000 VMs).
 
