Proxmox Weird Nating issue limiting throughput, Maybe CPU related?

mikos

Member
Jun 23, 2022
21
1
8
Dear,

i have a weird issue with Proxmox servers, hosted currently in OVH, and i am having troubles troubleshooting the issue, i do not think it's OVH related, but not 100% positive.
Hopefully some of you folks can point me to the right direction.

two dedicated servers, clustered using private network vRack, all traffic, private and public goes through the vRack, witch is vmbr1.
One server is Intel, the other is AMD EPYC.
I use a VM with public and private IP's internet facing, and other VM's with private IP's behind it, the VM can be simple debian machine or a Pfsense one.

The issue summary is that whenever i have a VM, on the AMD EPYC server NATING traffic for another VM, the throughput is limited to 5 or 10Mbits, the internet facing machine itself or any other machine directly connected to vmbr1 has full speed, 500Mbits+
Migrating the default gateway VM or the internet facing one to the Intel server everything works as expected.

I have tried several things to conclude the issue could be with the CPU
speedtest run through iperf or public speed tests websites
- change virtual network card on VM's
- change CPU models to KVM and host
- increased RAM and CPU
- SDN with VLAN's
- Bridges
- VM to VM through public network and private network

Please note there are no limitations or capping in Proxmox host or VM configurations

Please also find below

Intel server
Code:
proxmox-ve: 8.2.0 (running kernel: 6.8.12-2-pve)
pve-manager: 8.2.5 (running version: 8.2.5/12c0a59769080547)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.8: 6.8.12-2
proxmox-kernel-6.8.12-2-pve-signed: 6.8.12-2
proxmox-kernel-6.8.8-4-pve-signed: 6.8.8-4
proxmox-kernel-6.8.4-3-pve-signed: 6.8.4-3
proxmox-kernel-6.5.13-6-pve-signed: 6.5.13-6
proxmox-kernel-6.5: 6.5.13-6
amd64-microcode: 3.20240820.1~deb12u1
ceph-fuse: 16.2.11+ds-2
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx9
intel-microcode: 3.20240813.1~deb12u1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.4
libpve-access-control: 8.1.4
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.7
libpve-cluster-perl: 8.0.7
libpve-common-perl: 8.2.2
libpve-guest-common-perl: 5.1.4
libpve-http-server-perl: 5.1.0
libpve-network-perl: 0.9.8
libpve-rs-perl: 0.8.10
libpve-storage-perl: 8.2.4
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.4.0-4
proxmox-backup-client: 3.2.7-1
proxmox-backup-file-restore: 3.2.7-1
proxmox-firewall: 0.5.0
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.7
proxmox-widget-toolkit: 4.2.3
pve-cluster: 8.0.7
pve-container: 5.2.0
pve-docs: 8.2.3
pve-edk2-firmware: not correctly installed
pve-esxi-import-tools: 0.7.2
pve-firewall: 5.0.7
pve-firmware: 3.13-2
pve-ha-manager: 4.0.5
pve-i18n: 3.2.3
pve-qemu-kvm: 9.0.2-3
pve-xtermjs: 5.3.0-3
qemu-server: 8.2.4
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.6-pve1

AMD Server
Code:
proxmox-ve: 8.2.0 (running kernel: 6.8.12-2-pve)
pve-manager: 8.2.7 (running version: 8.2.7/3e0176e6bb2ade3b)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.8: 6.8.12-2
proxmox-kernel-6.8.12-2-pve-signed: 6.8.12-2
proxmox-kernel-6.8.12-1-pve-signed: 6.8.12-1
amd64-microcode: 3.20240820.1~deb12u1
ceph-fuse: 16.2.11+ds-2
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx9
intel-microcode: 3.20240813.1~deb12u1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.4
libpve-access-control: 8.1.4
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.7
libpve-cluster-perl: 8.0.7
libpve-common-perl: 8.2.3
libpve-guest-common-perl: 5.1.4
libpve-http-server-perl: 5.1.1
libpve-network-perl: 0.9.8
libpve-rs-perl: 0.8.10
libpve-storage-perl: 8.2.5
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.4.0-4
proxmox-backup-client: 3.2.7-1
proxmox-backup-file-restore: 3.2.7-1
proxmox-firewall: 0.5.0
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.7
proxmox-widget-toolkit: 4.2.3
pve-cluster: 8.0.7
pve-container: 5.2.0
pve-docs: 8.2.3
pve-edk2-firmware: not correctly installed
pve-esxi-import-tools: 0.7.2
pve-firewall: 5.0.7
pve-firmware: 3.13-2
pve-ha-manager: 4.0.5
pve-i18n: 3.2.3
pve-qemu-kvm: 9.0.2-3
pve-xtermjs: 5.3.0-3
qemu-server: 8.2.4
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.6-pve1
 
UPDATE:

Seems the upload from the private VM is much better then the download, almost reaching 50% of the total upload bandwidth
Disabling the firewall at the cluster level improves the download speed for a while, then back to 10% again

strange
 
UPDATE:

InterVLAN routing via pfSense on Intel and AMD server is around 1.5Gb ( Although VM to VM L2 is around 5Gb )
so L3 capabilities for current CPU's does not seem to be the problem here.
 
UPDATE:

Okay sorry for many updates, seems the issue is not with the NATING on Proxmox

iperf3 tests from the host to the vm and vise-versa, going through pfSense box is reaching 1.5Gb which is the same as interVlan speed as is what expected.

not sure if i need to start considering OVH problem here, but still how can you explain the Intel vs AMD issue?

confusing