Network packet loss in high-traffic VMs

Hi,

I don't think you can do that, and that also makes sense, I think.
In the end it is the physical interfaces that handle the input/output, so it makes sense that those are the interfaces where you can modify the buffer sizes etc. All the logical/virtual interfaces will then benefit from that.
Have you been unable to get any improved results by modifying the "real" interfaces' buffer sizes etc.?
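For example, something along these lines with ethtool (eno1 is just a placeholder for your physical NIC, and not every driver supports the maximum values):

# show current and maximum ring sizes of the physical NIC
ethtool -g eno1
# raise the rx/tx rings towards the hardware maximum
ethtool -G eno1 rx 4096 tx 4096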
 
I set the buffer size of the real interface, restarted the host's network interface, and restarted the VM in the PVE panel, but it still has 30% packet loss.

Perhaps I should delete the VM and rebuild it, or reboot the host? It is not clear why.

I have now set Multiqueue to 2 and the problem is solved.
 
Hi, in the upcoming Proxmox 7.3, the rx/tx buffer size of the QEMU NIC has been bumped to 1024, and there are also improvements to VM multi-queue.

With the default queue=1, only one VM core is used to handle incoming traffic. If you have a lot of small packets per second, this core can become saturated and you can see packet drops.
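As a rough example of how to set it (VMID 100 and the net0 options are placeholders; note that qm set rewrites the whole net0 line, so keep your existing MAC/bridge settings in it - the Multiqueue field in the GUI does the same thing). Inside the guest, the extra queues may also need to be enabled with ethtool:

# on the Proxmox host: give net0 four queues
qm set 100 --net0 virtio,bridge=vmbr0,queues=4
# inside the guest: check the available channels, then enable them
ethtool -l eth0
ethtool -L eth0 combined 4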
 
@mika,
it is possible to change the ring buffer on a virtio-net vNIC (I have not tested it);
this should logically reduce packet loss during intensive UDP traffic

qemu-system-x86_64 -device virtio-net-pci,?
rx_queue_size=<uint16> - (default: 256)
tx_queue_size=<uint16> - (default: 256)

Don't forget to check the other conditions: path L2MTU, VM tx buffer (txqueuelen), VM scaling governor, hypervisor PCI profile (max performance is mandatory for packet processing).
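Some example commands for those checks (eth0 is a placeholder, and your distribution may expose the governor differently):

# MTU and txqueuelen of the guest interface
ip link show dev eth0
ip link set dev eth0 txqueuelen 10000
# current CPU frequency governor
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor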
Additionally, it is important to note that a KVM hypervisor with virtio-net (vhost-net, really) is not very efficient at processing intensive network traffic across many VMs. It is a Linux kernel (and not only kernel) parallelisation issue.
For example, on a single node (SMP), processing 1M+ pps is possible in a single VM, but it is difficult to achieve that rate spread over 40 VMs.
When you notice packet loss, can you detail the node load and the load of each VM?
Finally, I suggest having a look at the VM CPU steal time; it represents the amount of CPU the VM needed but that the hypervisor could not give because it was used by another task or VM (command: top).
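If you prefer something non-interactive, the st column of vmstat shows the same steal percentage (just one way to read it):

# inside the VM: the last column (st) is CPU steal
vmstat 1 5
# or a single batch-mode top snapshot; look at %st in the Cpu(s) line
top -b -n 1 | head -5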
 
The default value has been bumped to 1024 in Proxmox 7.3.
@spirit What is the CLI command to verify these new default settings? Should we check from the Proxmox host CLI or inside the VM?

You need to increase the queue count to balance across multiple VM vCPUs
(as indeed, one core is limited to 1-2 Mpps).
Any general advice on the number of queues relative to the number of VM vCPUs? Is 1 to 1 a good starting point?
 
@spirit What is the CLI command to verify these new default settings? Should we check from the Proxmox host CLI or inside the VM?


Any general advice on the number of queues relative to the number of VM vCPUs? Is 1 to 1 a good starting point?
Queues should be lower than or equal to the number of vCPUs (1 to 1 max; each queue uses one dedicated thread, so it needs one vCPU).
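For checking the new defaults, one way (there may be others) is to look at the generated QEMU command line on the host and at what the driver reports inside the guest. VMID 100 and eth0 are placeholders:

# on the Proxmox host: the command line used to start the VM
qm showcmd 100 --pretty | grep -E 'rx_queue_size|tx_queue_size|queues'
# inside the guest: ring sizes and channel (queue) count seen by the virtio driver
ethtool -g eth0
ethtool -l eth0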
 
Hi, in the upcoming Proxmox 7.3, the rx/tx buffer size of the QEMU NIC has been bumped to 1024, and there are also improvements to VM multi-queue.

With the default queue=1, only one VM core is used to handle incoming traffic. If you have a lot of small packets per second, this core can become saturated and you can see packet drops.

Thanks!!

I'm planning to upgrade from 7.2 to 7.3. Will all these changes require us to delete the VM and rebuild it?

Previously, I increased the queue count and the problem was solved, but with more and more TCP connections the bandwidth of the VM cannot be fully utilized. I am not sure whether it has something to do with the buffer size.
 
The default value has been bumped to 1024 in Proxmox 7.3.
Hi,
I am doing some QEMU 7 tests; rx_queue_size=1024,tx_queue_size=1024 are present in the VM command line.
In the guest OS only the rx queue is increased; tx stays at 256.

Did you notice that too? Is this normal behavior?
 
Hi, I have a TrueNAS SCALE VM and an Ubuntu VM.
When I configure a VirtIO NIC and use SMB to transfer a file, the file gets corrupted.
Using E1000 and RTL8139 is normal.
I viewed the file in a hex editor and found a lot of 00 bytes in it.
Could this be related to the issue?
PVE version: 7.3-3
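In case it helps reproduce it, a quick way to compare the original and the copied file (paths are placeholders):

# on the source and on the destination after the SMB copy
sha256sum /path/to/file
# count how many NUL (00) bytes the copied file contains
tr -dc '\0' < /path/to/file | wc -c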
 

Attachments: photo_2023-02-10_08-15-13.jpg
Hi, I have a TrueNAS SCALE VM and an Ubuntu VM.
When I configure a VirtIO NIC and use SMB to transfer a file, the file gets corrupted.
Using E1000 and RTL8139 is normal.
I viewed the file in a hex editor and found a lot of 00 bytes in it.
Could this be related to the issue?
PVE version: 7.3-3
Maybe it's a bug in the FreeBSD virtio driver?
 
Maybe it's a bug in the FreeBSD virtio driver?
But TrueNAS SCALE is based on Debian.
And I have the same issue using virtio in Ubuntu (although they are both based on Debian).
Could this be a Debian driver bug?
 
But TrueNAS SCALE is based on Debian.
And I have the same issue using virtio in Ubuntu (although they are both based on Debian).
Could this be a Debian driver bug?
This is strange. I have 4000 Debian VMs in production (stretch, buster, bullseye with stock kernel), and I have never seen this problem.
 
This is strange. I have 4000 Debian VMs in production (stretch, buster, bullseye with stock kernel), and I have never seen this problem.
I don't know if this problem has anything to do with my strange network structure.
I drew a diagram; it's kind of rough, but it should be readable XD

topology.png
 
I had solved the problem with multiqueue on kernel 5.15, but since kernel 6.x the problem is back. I can't figure out where, why, or how. The only workaround is to boot with the old kernel again.
Really frustrating.
 
