Network packet loss in high traffic VMs

Situation: we're running KVM VMs based on Debian/bullseye on Proxmox/PVE with 10GbE NICs, but we experience packet loss on the virtio-net interfaces under high network traffic (RTP/UDP).

This underlying problem also exists on VMware clusters. VMware provides a solution for that: one can increase the ring buffer sizes (see https://kb.vmware.com/s/article/50121760), and the problem with packet loss goes away (we verified this ourselves).
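(For reference, on the VMware side this presumably boils down to running ethtool inside the Linux guest; the vmxnet3 interface name below is just an example:)

Code:
# Inside a VMware guest with a vmxnet3 NIC (ens192 is an assumed name)
ethtool -G ens192 rx 4096 tx 4096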

Now, we'd love to do the same for the virtio-net NICs on the PVE KVM VMs :) - sadly this isn't yet supported:

Code:
# ethtool -G neth1 rx 4048 tx 4048
netlink error: Operation not supported

To avoid any possible side effects, we use 1 dedicated guest VM per hypervisor in our test environment.
We use a dedicated bridge for the guest connected to the second port of the Intel X722 10G NIC.
Example for such a VM configuration:

Code:
root@pve-test-00:~# qm config 100
boot: order=scsi0;ide2;net0
cores: 16
cpu: host
ide2: none,media=cdrom
machine: q35
memory: 52048
meta: creation-qemu=6.2.0,ctime=1661099025
name: Sp1
net0: virtio=C2:50:0E:F5:6E:BC,bridge=vmbr1,queues=4,tag=3762
net1: virtio=56:4C:C5:75:7D:79,bridge=vmbr1,firewall=1,tag=901
numa: 0
ostype: l26
scsi0: local-btrfs:100/vm-100-disk-0.raw,size=320G
scsihw: virtio-scsi-pci
smbios1: uuid=3a55db64-5fea-4165-bcf1-a640b0caf909
sockets: 1
vmgenid: f280627f-b65d-4b7d-b841-1966d64f7ff9

The NIC on the host supports multiple queues (we tried different values for the multiqueue settings, but sadly they don't change the situation):

Code:
root@pve-test-00:~# ethtool -l eno2
Channel parameters for eno2:
Pre-set maximums:
RX:             n/a
TX:             n/a
Other:          1
Combined:       32
Current hardware settings:
RX:             n/a
TX:             n/a
Other:          1
Combined:       20
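(For reference, the host-side channel count can be adjusted with ethtool -L; the value below is just an example within the reported maximum of 32:)

Code:
ethtool -L eno2 combined 32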

The RX size on the NIC of the *hypervisor* can be adjusted (though its settings don't really matter for our packet loss situation):

Code:
root@pve-test-00:~# ethtool -g eno2
Ring parameters for eno2:
Pre-set maximums:
RX:             4096
RX Mini:        n/a
RX Jumbo:       n/a
TX:             4096
Current hardware settings:
RX:             2048
RX Mini:        n/a
RX Jumbo:       n/a
TX:             2048

We're running the latest PVE:

Code:
root@pve-test-00:~# pveversion
pve-manager/7.2-7/d0dd0e85 (running kernel: 5.15.39-3-pve)
root@pve-test-00:~# uname -a
Linux pve-test-00 5.15.39-3-pve #2 SMP PVE 5.15.39-3 (Wed, 27 Jul 2022 13:45:39 +0200) x86_64 GNU/Linux

Hypervisor host specs:
  • Lenovo SN550
  • CPU: Intel(R) Xeon(R) Silver 4210R @ 2.40GHz with HT enabled
  • 64 GB RAM
  • 480 GB Micron SATA SSD
  • NIC: Intel Ethernet Connection X722 for 10GbE backplane (i40e driver) [Ethernet controller: Intel Corporation Ethernet Connection X722 for 10GbE backplane (rev 09)]
We're aware that some folks had similar issues back in 2014 (see https://groups.google.com/g/snabb-devel/c/ng78LbcaFgI?pli=1).
We're wondering whether we're really the only ones noticing such a performance/packet loss problem in 2022? :)

We're also aware that the packet loss problem disappears when using SR-IOV instead. But we'd like to avoid the drawbacks of SR-IOV, so we're looking for ways to handle high network traffic with virtio-net NICs as well, and to understand what the limiting factor with virtio-net actually is.

We were considering increasing QEMU's hardcoded VIRTIO_NET_RX_QUEUE_DEFAULT_SIZE/VIRTIO_NET_TX_QUEUE_DEFAULT_SIZE settings in hw/net/virtio-net.c from 256 to something bigger. Someone else already tried this (https://github.com/qemu/qemu/pull/115), but sadly it didn't make its way upstream, and before putting further effort into it, we'd like to understand whether that's the right approach or whether we're missing something.
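Another avenue we haven't tried yet: newer QEMU versions expose an rx_queue_size (and a tx_queue_size) property on the virtio-net-pci device, though as far as we can tell PVE doesn't offer a supported way to set them. On a raw QEMU command line this would look roughly like:

Code:
# Sketch only, not a PVE-supported setting as far as we know; assumes a QEMU
# new enough to offer rx_queue_size on virtio-net-pci (capped at 1024, must be
# a power of two). tx_queue_size exists too but reportedly only takes effect
# with certain backends.
-netdev tap,id=net0,ifname=tap100i0,vhost=on \
-device virtio-net-pci,netdev=net0,mac=C2:50:0E:F5:6E:BC,rx_queue_size=1024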

Does anyone have similar issues, or does anyone have suggestions on how to tackle this?
Could this be related to a Linux bridge problem, and might using e.g. Open vSwitch help?
Anything else we could try?
 
Are you losing incoming or outgoing packets?
Is the packet loss permanent until a reboot of the VM?

Have you tried to increase the tx_queuelen of the tap interface of the VM?

We can't really say for sure whether the packet loss takes place incoming or outgoing, but we see less inbound traffic than outbound traffic on our test system, so it's "lost somewhere™" :)

I don't see what a reboot should solve here or maybe I misunderstand your question. To clarify: it's a permanent issue, reboots don't help at all, we can reproduce the issue at any time.

With "increase the tx_queuelen of the tap interface of the VM" you mean like:

Code:
ip link set dev tap100i0 txqueuelen 4096

executed on our PVE/hypervisor system? Do we need to align the network device inside the VM as well (it currently reports `[...] default qlen 1000`)?
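(Just to illustrate what we mean by aligning, assuming the interface inside the guest is called eth1:)

Code:
# Inside the guest -- eth1 is an assumed name, adjust to the actual interface
ip link set dev eth1 txqueuelen 4096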
 
We see an issue at a customer site with KVM (not Proxmox) where a VM loses incoming traffic completely, practically cutting it off from the network. Outgoing packets are still sent.

As this issue is not easily reproducible, the customer is trying several things, including the txqueuelen.

We currently have no solution yet; I was asking to see if you have the same issue. But it looks like something else.
 
Hi @mika

Maybe the ring buffer is only the effect and not the primary cause!

I would test on the client side whether the IP MTU is the same along the whole path from client to VM.

You could do that on a Linux system with tracepath.
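For example (the address below is just a placeholder for your VM; watch the reported pmtu along the hops):

Code:
# -n skips DNS lookups; a drop in "pmtu" along the way points at a smaller MTU hop
tracepath -n 192.0.2.10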

If the IP MTU is not the same, then your VM's ring buffer will be filled with IP fragments that need to be reassembled to reconstruct each IP packet.

Also check the L2 MTU (Ethernet MTU) on every switch between client and VM. Some switches will not even try to reassemble Ethernet frames if they are received out of order. And out-of-order Ethernet frames can occur if there is a LACP interface (round-robin bond) in the path from client to VM.

Good luck / Bafta.
 
Hi Mika

I'm investigating the exact same symptoms on a system at work.
And the buffer size should be changed on the underlying physical NICs, I guess.
I've found that it's quite easy to increase the ring buffer size using 'ethtool -G <interface> rx <size> tx <size>', but my problem is how to make this change permanent, so to speak.
I think the buffer size should be set up BEFORE any derived bonds come online, as discussed in this thread.
I can't seem to get it to work right now.
The suggestion on https://pve.proxmox.com/wiki/Network_Configuration to use something like

Code:
iface eno1 inet manual
    post-up ethtool -G eno1 tx 4096 rx 4096

in /etc/network/interfaces (which apparently relies on the script called ethtool in the /etc/network/if-up.d directory) doesn't work.
Using post-up /sbin/ethtool.... doesn't work either.
Some people suggest a systemd method, others suggest good old rc.local, but generally there seems to be confusion about this, and in my opinion it is very poorly documented.
How should this be done? Does anyone know?
 
Hi vesalius

Thanks for your feedback!

>> Did you try: pre-up ethtool -G <interface> tx 4096 rx 4096?

No, actually I didn't. But now I will.
I've been having a hard time finding some documentation on this.
And I'll be back with some results and considerations.
 
Ok, I finally solved it.
It turns out that I should have had the ifupdown2 package installed, which I didn't.
Once this is installed, the pre-up stuff works as expected.
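For reference, the stanza that works for me now looks roughly like this (eno1 is just an example name, adjust to your own NICs):

Code:
auto eno1
iface eno1 inet manual
    pre-up ethtool -G eno1 rx 4096 tx 4096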
I find this a little strange, since these Proxmox hosts are all fully updated to the very latest version.
Shouldn't the update from ifupdown to ifupdown2 have taken place automatically, or is this sort of thing optional?
And by the way, after this small update/change I now have an extra tap* and fw* interface for each VM running on a host.
But I guess that is how it should be now?
Anyway thanks for your feedback!
 
Definitely have seen, and personally experienced, some inconsistencies with ifupdown to ifupdown2 upgrades for those coming from older versions of PVE. New PVE 7.* installs, not so much. Glad you have it sorted.
 
So you finally increased the ring buffer size of the physical network interface and the bridge interface to tx 4096 rx 4096, and the issue is gone?
I'm also having problems with losing UDP/RTP packets since the upgrade to version 7.

Thanks in advance!
 
Hi mm553
Yes, I increased the buffer sizes. After experimenting with the TX size I settled on 4096 for both RX and TX.
And it has definitely helped. If I run my tests, there can still be a few retransmissions left, but in the order of 0-50 on my standard iperf3 test.
Before, it could be in the thousands.
To sum up, I've made this change to the buffer sizes on the individual NICs that are used in the bond on which the virtio interfaces are based.
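Concretely, something along these lines for each NIC in the bond (eno1/eno2 are example names):

Code:
ethtool -G eno1 rx 4096 tx 4096
ethtool -G eno2 rx 4096 tx 4096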
I hope you get it solved, and will be glad to assist, if I can.
 
Ok, nice!
I tried with an e1000 adapter and changed the RX and TX ring buffers in the VM to 1024, which helped as a first step. Then I figured out that multiqueue on the virtio interface also fixes my issue. Maybe you could verify this?
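(On the Proxmox side, enabling multiqueue on a VM NIC looks something like this; the VM ID and NIC settings here only mirror the config shown earlier in the thread:)

Code:
# 4 queues on net0 of VM 100; values copied from the earlier qm config output
qm set 100 --net0 virtio=C2:50:0E:F5:6E:BC,bridge=vmbr1,queues=4,tag=3762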
 
Out of curiosity, where or how exactly are you changing things in the VM?

As you figured out, virtio should almost always be used instead of e1000.
 
>> Out of curiosity, where or how exactly are you changing things in the VM?

Since you can't change the ring buffer size of virtio interfaces inside the VM, I allocated an e1000 interface to it, and there I could change the size with ethtool from 256 to 1024.
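i.e. inside the guest, something like (the interface name is just an example):

Code:
ethtool -G ens19 rx 1024 tx 1024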

>> As you figured out, virtio should almost always be used instead of e1000
That's why I tried with multiqueue afterwards.
 

Hello,

I have the same problem. I can modify the buffer size of the physical network interface, but I can't set the bridge interface size.

Can you tell me how to do it? Thanks.
 
