Network Optimization for High-Volume UDP Traffic in PVE

PG1024

New Member
Feb 27, 2026
PVE_20260227140256_36_1091.png
LVS_20260227140321_37_1091.png
Hardware Specification: The physical server network card model is the 82599ES 10-Gigabit SFI/SFP.
Problem Description: A virtual machine is running an LVS (Linux Virtual Server) service. When a single-source IP generates UDP traffic exceeding approximately 300 Mbps, packet loss begins to occur on the network interface. The actual business requires handling peak traffic rates of around 400 Mbps.
Requirement: Besides PCIe Passthrough (NIC Direct Pass-through), are there any other solutions that can ensure a single-source IP can receive UDP traffic exceeding 1 Gbps normally?
Traffic Characteristics: The UDP packets primarily consist of firewall logs. While individual packets are not large, the volume is high, with an estimated rate of 100,000 to 200,000 packets per second.
Baseline Observation: The same physical machine running VMware vSphere 6.7 with an identical virtual machine configuration appears to handle the 400 Mbps traffic load stably during testing.
 
I have dealt with similar high PPS UDP cases, and 100k to 200k packets per second is usually the real problem, not the Mbps. Small packets kill you with interrupt overhead and softirq saturation.

Before jumping to passthrough, I would check vNIC type. Make sure you are using something like vmxnet3, not e1000. Then tune multiqueue, RSS, RPS, and RFS so traffic spreads across CPU cores. Also increase ring buffers with ethtool and check for drops at the driver level.

Pin vCPUs, isolate IRQs, and confirm CPU is not the bottleneck. In my experience, careful CPU and queue tuning often fixes it without full PCIe passthrough.
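A minimal sketch of those tuning steps, assuming the guest interface is named ens18 (substitute your own); the ring sizes and CPU masks are illustrative, not tuned values:

```shell
# Check current ring buffer sizes and look for driver-level drops
ethtool -g ens18
ethtool -S ens18 | grep -i drop

# Increase RX/TX ring buffers to the hardware maximum
ethtool -G ens18 rx 4096 tx 4096

# Enable RPS so softirq load spreads across 4 cores (CPU mask 0xf)
echo f > /sys/class/net/ens18/queues/rx-0/rps_cpus

# Enable RFS so flows are steered to the CPU running the consumer
echo 32768 > /proc/sys/net/core/rps_sock_flow_entries
echo 8192 > /sys/class/net/ens18/queues/rx-0/rps_flow_cnt
```

With multiqueue enabled on the vNIC, the rps_cpus/rps_flow_cnt settings need to be applied per rx queue (rx-0, rx-1, ...).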
 
Hi PG1024,

I ran several tests using iperf, with the following results:

Server
VM, Ubuntu 24.04, 4CPU, 4GB RAM, PVE node 1
$iperf -su

Client 1
Physical, Ubuntu 25.04, 1Gbps LAN
$iperf -c 192.168.1.69 -u -t 120 -b 500M
[ ID] Interval Transfer Bandwidth Jitter Lost/Total Datagrams
[ 1] 0.0000-119.9998 sec 7.32 GBytes 524 Mbits/sec 0.027 ms 231/5349881 (0.0043%)

$iperf -c 192.168.1.69 -u -t 120 -b 1000M
[ ID] Interval Transfer Bandwidth Jitter Lost/Total Datagrams
[ 2] 0.0000-120.0009 sec 13.4 GBytes 957 Mbits/sec 0.024 ms 661/9765300 (0.0068%)

Client 2
VM, Ubuntu 24.04, 4 CPU, 4 GB RAM, 10 Gbps LAN, PVE node 2
$iperf -c 192.168.1.69 -u -t 120 -b 1000M
[ ID] Interval Transfer Bandwidth Jitter Lost/Total Datagrams
[ 1] 0.0000-119.9999 sec 14.6 GBytes 1.05 Gbits/sec 0.011 ms 460/10699758 (0.0043%)

$iperf -c 192.168.1.69 -u -t 120 -b 5000M
[ ID] Interval Transfer Bandwidth Jitter Lost/Total Datagrams
[ 2] 0.0000-119.9996 sec 43.8 GBytes 3.14 Gbits/sec 0.006 ms 19684/32038968 (0.061%)

Can you specify the percentage of your high packet loss?

The limitation may be in the VM configuration. You wrote that you use the same VM image and configuration as in VMware. Can you use Network Device Model = virtio? (Please post the complete VM configuration.)
Can you post CPU and CPU Pressure Stall graphs from high packet loss period?

R.
 
As far as I remember, virtio-net is limited to around 2 million pps per core (depending on the CPU frequency). The only way around this is to increase the number of queues on the virtio NIC.

(If you are CPU limited, you should see a vhost-net process at 100% on the PVE host.)

Running iperf with big packets will not help to test this (virtio-net can easily reach 20~40 Gbit/s, but only with big packets).

Running iperf with "-l 64" to test the worst case of a SYN-flood-style load should show the pps limit.
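A sketch of that worst-case small-packet test and the multiqueue change. The server address 192.168.1.69 is taken from the tests earlier in the thread; VMID 100, bridge vmbr0, and the queue count of 4 are placeholders to adapt to your setup:

```shell
# Worst-case test: 64-byte UDP datagrams instead of the ~1470 B default.
iperf -c 192.168.1.69 -u -t 60 -b 1000M -l 64

# On the PVE host, watch for a vhost-net thread pinned at 100%,
# which indicates you are hitting the per-core virtio-net pps ceiling.
top -b -n 1 | grep vhost

# More virtio queues let several cores share the load. Note: queues
# should not exceed the vCPU count, and omitting the MAC address here
# makes Proxmox generate a new one, so include it if you need it stable.
qm set 100 -net0 virtio,bridge=vmbr0,queues=4
```

Inside the guest, `ethtool -l <iface>` should then report the matching number of combined channels.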
 
As far as I remember, virtio-net is limited to around 2 million pps per core (depending on the CPU frequency). The only way around this is to increase the number of queues on the virtio NIC.

(If you are CPU limited, you should see a vhost-net process at 100% on the PVE host.)

Running iperf with big packets will not help to test this (virtio-net can easily reach 20~40 Gbit/s, but only with big packets).

Running iperf with "-l 64" to test the worst case of a SYN-flood-style load should show the pps limit.

My tests were made with the default 1470 B packet size. The measured limit is 250,566 pps. I guess in my case the limit is single-thread CPU performance.
Using multiple threads in iperf I get 5.95 Gbits/sec at 505,548 pps, but with high packet loss (16%) - this may be a switch limitation.

I posted these numbers so the questioner can compare against real-world numbers... even if they were made in our test lab.

R.
 
Have the tests been performed without any interfering L2/L3 equipment between the VM and the client? If not, how did you rule out that it's not the networking equipment in between?
 
Have the tests been performed without any interfering L2/L3 equipment between the VM and the client? If not, how did you rule out that it's not the networking equipment in between?
I am not sure I understand the question.

"Client 1" is physical machine (actually my work computer) traffic to VM goes thru two switches. "Client 2" is on another physical Proxmox node than "Server" (we have 3-node PVE cluster) in this case traffic goes thru one switch. In both cases the trafic goes thu physical ethernet cards. Common traffic in our company is around 0.5-2MB/s and it is less than 1% of measured numbers.

So I think my numbers are trustworthy. There is one big "but": these are only network traffic measurements, because PG1024's question is about Proxmox's incoming UDP limits. The packets are not processed. When packets are actually processed, the processing takes time, which can become the bottleneck and cause packet loss, because the receive queue size is limited and UDP is allowed to drop packets by specification.
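A sketch of how such processing-side drops can be located, assuming the guest interface is named ens18 (adjust to your setup); the buffer sizes at the end are illustrative values, not recommendations:

```shell
# NIC/driver level: per-queue drop and miss counters
ethtool -S ens18 | grep -iE 'drop|miss'

# Kernel UDP layer: "receive buffer errors" means the socket
# receive queue overflowed before the application could read.
netstat -su | grep -i error

# Per-socket receive queue depth: a Recv-Q near the buffer
# limit means drops are imminent.
ss -ulnm

# Give receiving sockets more headroom (values in bytes).
sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.rmem_default=8388608
```

Larger buffers only absorb bursts; if the application drains packets slower than they arrive on average, drops will still occur.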

Does this answer your question?

R.
 
I am not sure I understand the question.
So I think my numbers are trustworthy. There is one big "but": these are only network traffic measurements, because PG1024's question is about Proxmox's incoming UDP limits. The packets are not processed. When packets are actually processed, the processing takes time, which can become the bottleneck and cause packet loss, because the receive queue size is limited and UDP is allowed to drop packets by specification.
I'm totally with you. It's not about the traffic volume, it's about the pps; that's the bottleneck. My question was just aiming at the possibility that your network equipment might be a bottleneck as well, and I was curious whether you had ruled that out.
 
I'm totally with you. It's not about the traffic volume, it's about the pps; that's the bottleneck. My question was just aiming at the possibility that your network equipment might be a bottleneck as well, and I was curious whether you had ruled that out.

My "Client 1" connection is 1Gbps and the number are very close (957 / 1024 = 93%) I guess here is not much space to improve. This twice as much then PG1024 issue point.

VM "Client 2" <-> "Server" connection is 10Gbis I measure (3.14 / 10Gbps = 31%) as I wrote it is probably single thread CPU limit of iperf. But this is almost 10x higher than PG1024's limit. On same envitonment using TCP I transfers 9.29 / 10 Gbps.

Actually, I am not thinking about my bottleneck but about PG1024's bottleneck. If you want, I can run more tests and try to find the limits. First I would need to solve the VM CPU limits, but that is out of scope for this thread, where PG1024 needs help with his VM's limits. Do you want to create such a thread?

R.
 
Actually, I am not thinking about my bottleneck but about PG1024's bottleneck. If you want, I can run more tests and try to find the limits. First I would need to solve the VM CPU limits, but that is out of scope for this thread, where PG1024 needs help with his VM's limits. Do you want to create such a thread?
Is your client able to run said application multi-threaded? I once saw a very nice talk from Valve about defending against DDoS attacks in online games. A different topic, but with the same technical aspect - pps vs. bandwidth: the latter is negligible, while the pps are the real problem. In case you're interested: https://youtu.be/2CQ1sxPppV4?t=462 (the video starts at the point relevant to this discussion).

P.S.: Thanks for your answers and clarifications; they helped me understand your problem in more detail :)
 
My tests were made with the default 1470 B packet size. The measured limit is 250,566 pps. I guess in my case the limit is single-thread CPU performance.
Using multiple threads in iperf I get 5.95 Gbits/sec at 505,548 pps, but with high packet loss (16%) - this may be a switch limitation.

I posted these numbers so the questioner can compare against real-world numbers... even if they were made in our test lab.

R.
250,566 pps is quite low; I mean, you should reach 1~2 Mpps at any packet size. I remember easily reaching 7~9 Gbit/s with 1 core/thread at the standard 1500 MTU (with an EPYC v3 at 3.5 GHz and the CPU forced to max frequency).
 
微信图片_20260228085128_38_1091.png微信图片_20260228085128_39_1091.png
微信图片_20260228085142_40_1091.png
  1. The situation is worse when the virtual machine's network card model is set to vmxnet3. Monitoring with iftop -i enp6s18 shows that the bandwidth on the VM's interface does not exceed 200 Mbps, indicating significant data loss.
  2. iperf3 tests do not reveal obvious anomalies and show normal bandwidth figures.
  3. We have attempted several optimizations, including ethtool -G enp129sf01 rx 4096 tx 4096, enabling RPS, and removing the bonding configuration. However, the results were unsatisfactory and unstable.
  4. The attached screenshots show the configurations for both the virtual machine and the PVE host. Could you please review them to see if anything is misconfigured? Thank you.
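Based on points 1-3, a hedged sketch of the usual next steps; VMID 101 and the queue count of 4 are assumptions, and enp6s18 is the guest NIC from point 1:

```shell
# 1. On the PVE host: switch the vNIC from vmxnet3 to virtio with
#    multiqueue (omitting the MAC makes Proxmox generate a new one).
qm set 101 -net0 virtio,bridge=vmbr0,queues=4

# 2. Inside the guest: confirm the queues are active
ethtool -l enp6s18

# 3. Compare RX drop counters in the guest while the load is running;
#    if they stay flat while iftop shows missing traffic, the drops
#    happen on the host side instead.
ip -s link show enp6s18 | grep -A1 RX
```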