VM network interruptions and Conntrack weirdness

Mar 20, 2024
Hi!

Since I upgraded to PVE 9, I have had several complaints from our OS Admin customers saying that they get periodic network disconnects on their VMs.
This happens to isolated VMs, not necessarily to all the VMs on a bridge.

I correlated these events to these messages:

Code:
Sep 25 05:52:56 pxmx-host kernel: net_ratelimit: 534 callbacks suppressed
Sep 25 05:52:56 pxmx-host kernel: nf_conntrack: nf_conntrack: table full, dropping packet
Sep 25 05:52:56 pxmx-host kernel: nf_conntrack: nf_conntrack: table full, dropping packet
Sep 25 05:52:56 pxmx-host kernel: nf_conntrack: nf_conntrack: table full, dropping packet
Sep 25 05:52:56 pxmx-host kernel: nf_conntrack: nf_conntrack: table full, dropping packet
Sep 25 05:52:56 pxmx-host kernel: nf_conntrack: nf_conntrack: table full, dropping packet
Sep 25 05:52:56 pxmx-host kernel: nf_conntrack: nf_conntrack: table full, dropping packet
Sep 25 05:52:56 pxmx-host kernel: nf_conntrack: nf_conntrack: table full, dropping packet


It is during these periods that they say they cannot ping some of their machines.

Anyone else seen similar? Any way to fix it?
Is it a good idea (or even possible) to increase the table size or will I just be shooting myself in the foot?

Thanks
 
Hi @ManFriday ,

Out of curiosity - how large is your environment? Nodes, VMs, etc.?

What is your current max?

Code:
sysctl net.netfilter.nf_conntrack_max

Write a basic loop to log the number of connections every 1–10 seconds, along with the date. That should tell you average use, as well as peak times:

Code:
while true; do
    # Append a timestamp plus current/max conntrack entries to a log file
    echo "$(date '+%F %T') Count: $(cat /proc/sys/net/netfilter/nf_conntrack_count) / $(cat /proc/sys/net/netfilter/nf_conntrack_max)" >> /var/log/conntrack_usage.log
    sleep 5
done

You can raise the max somewhat safely; approximate memory cost:
  • 262144 entries - 80–100 MB RAM.
  • 524288 entries - 160–200 MB RAM.
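
For example, to try a larger value at runtime first (not persistent; it is lost on reboot or module reload):

Code:
# bump the limit immediately
sysctl -w net.netfilter.nf_conntrack_max=524288

# verify
sysctl -n net.netfilter.nf_conntrack_max

Worth a glance at net.netfilter.nf_conntrack_buckets too, since the hash bucket count does not scale automatically with the max.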

Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
I have a TB of RAM in each host, so I imagine I can try bumping up that max.
Don't forget to ensure that you make the change persistent, or you will be in for a surprise sometime after next reboot.
I'd even recommend testing it by rebooting the host and checking that it was set again.
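
Something like this, for example (the filename is arbitrary):

Code:
# persist the setting across reboots
echo 'net.netfilter.nf_conntrack_max = 524288' > /etc/sysctl.d/99-conntrack.conf

# apply everything in sysctl.d now
sysctl --system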


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
I created a file at /etc/sysctl.d/99-conntrack.conf:

Code:
net.netfilter.nf_conntrack_max = 524288

restarted the service, and ran:

Code:
sysctl -n net.netfilter.nf_conntrack_max

It returns the correct 524288, but then a little while later it reverts to the original 262144.

is there somewhere else I need to modify this?
 
it returns the correct 524288, but then a little while later it reverts to the original 262144.
Curious.
Perhaps try:
echo "options nf_conntrack nf_conntrack_max=524288" > /etc/modprobe.d/nf_conntrack.conf

You'll need to reboot, or unload/reload the module, which will cause connection resets.

Not sure, off the top of my head, what could be causing the reversion.


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Ahh, look at that! I didn't even know that setting existed.
It is currently set to 'default', so I'm guessing that's the issue.

Thanks so much Victor!
The manual [1] says the default is 262144, although it should not apply unless you have the firewall enabled both for the host and at the Datacenter level.
I would also have thought that "default" meant "use whatever is in the system", but it does in fact apply PVE's default for the value. Which, OTOH, is high enough for many use cases. I would take a look at what is consuming that many connections; maybe you have a misbehaving application / VM / customer, or some kind of DoS attack.

[1] https://pve.proxmox.com/pve-docs/chapter-pve-firewall.html#pve_firewall_host_specific_configuration
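
If the host firewall is indeed what keeps resetting it, pinning the value in the host firewall options should stick; a minimal sketch based on [1] (replace <nodename> with your node's name):

Code:
# /etc/pve/nodes/<nodename>/host.fw
[OPTIONS]
nf_conntrack_max: 524288

pve-firewall should pick the change up on its own; checking sysctl -n net.netfilter.nf_conntrack_max afterwards confirms it stuck.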
 
It was occurring during Veeam backups.
The VLAN the Veeam workers use is on the same 10G uplink as the VM traffic.
Not ideal, I realize.
We are working on separating the Veeam backup traffic onto its own 10G uplink.
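
For anyone else chasing something similar: assuming the conntrack(8) CLI (Debian package "conntrack") is installed, a rough way to see which source IPs dominate the table is something like:

Code:
# list all tracked flows and count the original-direction source IP of each
conntrack -L 2>/dev/null \
  | awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^src=/) { print $i; next } }' \
  | sort | uniq -c | sort -rn | head -n 10

Each flow is listed once, so backup-heavy sources should stand out right away.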