Network packet loss in high traffic VMs

Situation: we're running KVM VMs based on Debian/bullseye on Proxmox/PVE with 10 GbE NICs, but we experience packet loss on the virtio-net interfaces under high network traffic (RTP/UDP).

This underlying problem also exists on VMware clusters. VMware provides a solution for that: one can increase the ring buffer sizes (see https://kb.vmware.com/s/article/50121760), and the problem with packet loss goes away (we verified this ourselves).

Now, we'd love to do the same for the virtio-net NICs on the PVE KVM VMs :) - sadly this isn't yet supported:

Code:
# ethtool -G neth1 rx 4048 tx 4048
netlink error: Operation not supported

To avoid any possible side effects, we use 1 dedicated guest VM per hypervisor in our test environment.
We use a dedicated bridge for the guest connected to the second port of the Intel X722 10G NIC.
Example for such a VM configuration:

Code:
root@pve-test-00:~# qm config 100
boot: order=scsi0;ide2;net0
cores: 16
cpu: host
ide2: none,media=cdrom
machine: q35
memory: 52048
meta: creation-qemu=6.2.0,ctime=1661099025
name: Sp1
net0: virtio=C2:50:0E:F5:6E:BC,bridge=vmbr1,queues=4,tag=3762
net1: virtio=56:4C:C5:75:7D:79,bridge=vmbr1,firewall=1,tag=901
numa: 0
ostype: l26
scsi0: local-btrfs:100/vm-100-disk-0.raw,size=320G
scsihw: virtio-scsi-pci
smbios1: uuid=3a55db64-5fea-4165-bcf1-a640b0caf909
sockets: 1
vmgenid: f280627f-b65d-4b7d-b841-1966d64f7ff9

The NIC on the host supports multiqueue (we tried different values for the multiqueue setting, but sadly they don't change the situation):

Code:
root@pve-test-00:~# ethtool -l eno2
Channel parameters for eno2:
Pre-set maximums:
RX:             n/a
TX:             n/a
Other:          1
Combined:       32
Current hardware settings:
RX:             n/a
TX:             n/a
Other:          1
Combined:       20

The RX size on the NIC of the *hypervisor* can be adjusted (though its settings don't really matter for our packet loss situation):

Code:
root@pve-test-00:~# ethtool -g eno2
Ring parameters for eno2:
Pre-set maximums:
RX:             4096
RX Mini:        n/a
RX Jumbo:       n/a
TX:             4096
Current hardware settings:
RX:             2048
RX Mini:        n/a
RX Jumbo:       n/a
TX:             2048
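
For reference, growing it to the pre-set maximum is a single ethtool call on the hypervisor (as said, in our case it doesn't change the guest-side packet loss):

Code:
# on the hypervisor; bumps eno2 from 2048 to the pre-set maximum of 4096
ethtool -G eno2 rx 4096 tx 4096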

We're running the latest PVE:

Code:
root@pve-test-00:~# pveversion
pve-manager/7.2-7/d0dd0e85 (running kernel: 5.15.39-3-pve)
root@pve-test-00:~# uname -a
Linux pve-test-00 5.15.39-3-pve #2 SMP PVE 5.15.39-3 (Wed, 27 Jul 2022 13:45:39 +0200) x86_64 GNU/Linux

Hypervisor host specs:
  • Lenovo SN550
  • CPU: Intel(R) Xeon(R) Silver 4210R @ 2.40GHz with HT enabled
  • 64 GB RAM
  • 480 GB Micron SATA SSD
  • NIC: Intel Ethernet Connection X722 for 10GbE backplane (i40e) [Ethernet controller: Intel Corporation Ethernet Connection X722 for 10GbE backplane (rev 09)]
We're aware that some folks had similar issues back in 2014 (see https://groups.google.com/g/snabb-devel/c/ng78LbcaFgI?pli=1).
We're wondering whether we're the only ones noticing such a performance/packet loss problem in 2022. :)

We're also aware that the packet loss problem disappears when using SR-IOV instead. But we'd like to avoid the drawbacks of SR-IOV, so we're looking for ways to handle high network traffic with virtio-net NICs as well, and to understand what the limiting factor with virtio-net actually is.

We were considering increasing QEMU's hardcoded VIRTIO_NET_RX_QUEUE_DEFAULT_SIZE/VIRTIO_NET_TX_QUEUE_DEFAULT_SIZE settings in hw/net/virtio-net.c from 256 to something bigger. Someone else already tried this (https://github.com/qemu/qemu/pull/115), but sadly it never made it upstream, and before putting further effort into it, we'd like to understand whether that's the right approach or whether we're missing something.
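
Before patching, we might also look at the rx_queue_size / tx_queue_size properties that QEMU's virtio-net-pci device already exposes: if we read hw/net/virtio-net.c correctly, rx can be raised to 1024 even with vhost-net, while tx beyond 256 seems to be limited to vhost-user backends. A rough stand-alone sketch (this is not the command line PVE generates, so it would have to go through a patch or the VM's 'args:' option):

Code:
# hypothetical stand-alone invocation, only to illustrate the device properties
qemu-system-x86_64 -machine q35 -m 4096 \
  -netdev tap,id=hostnet0,ifname=tap100i0,script=no,downscript=no,vhost=on \
  -device virtio-net-pci,netdev=hostnet0,mac=C2:50:0E:F5:6E:BC,rx_queue_size=1024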

Does anyone have similar issues, or does anyone have suggestions on how to tackle this?
Could this be related to a Linux bridge problem, and would using e.g. Open vSwitch help?
Anything else we could try?
 
Are you losing incoming or outgoing packets?
Is the packet loss permanent until a reboot of the VM?

Have you tried to increase the tx_queuelen of the tap interface of the VM?

We can't really say for sure whether the packets get lost on the incoming or the outgoing side, but we see less inbound traffic than outbound traffic on our test system, so it's "lost somewhere™" :)

I don't see what a reboot would solve here, or maybe I misunderstand your question. To clarify: it's a permanent issue, reboots don't help at all, and we can reproduce the issue at any time.

With "increase the tx_queuelen of the tap interface of the VM" you mean like:

Code:
ip link set dev tap100i0 txqueuelen 4096

executed on our PVE/hypervisor system? Do we need to align the network device inside the VM as well (it currently reports `[...] default qlen 1000`)?
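
If so, we assume it would be something like this inside the guest (ens18 is just a placeholder for the VM's interface name):

Code:
# inside the guest; ens18 is a placeholder for the VM's virtio NIC
ip link set dev ens18 txqueuelen 4096
ip link show dev ens18    # should now report 'qlen 4096'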
 
We see an issue at a customer site with KVM (not Proxmox) where the VM loses incoming traffic completely, practically cutting it off from the network. Outgoing packets are still sent.

As this issue is not easily reproducible, the customer is trying several things, including the txqueuelen.

We have no solution yet; I was asking to see whether you have the same issue. But it looks like something else.
 
Hi @mika

Maybe the ring buffer is only the symptom and not the primary cause!

I would test on the client side whether the IP MTU is the same along the whole path from the client to the VM.

You can do this on a Linux system with tracepath.
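
For example (192.0.2.10 is just a placeholder for your VM's address):

Code:
# run on the client towards the VM
tracepath -n 192.0.2.10
# the reported 'pmtu' should stay at the expected value for the whole path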

If the IP MTU is not the same, your VM's ring buffer will fill up with IP fragments that need to be reassembled to reconstruct each IP packet.

Also check the L2 MTU (Ethernet MTU) on every switch between the client and the VM. Some switches will not even try to reassemble Ethernet frames if they are received out of order, and out-of-order Ethernet frames can occur if there is a LACP interface (round-robin bond) in the path from the client to the VM.

Good luck / Bafta.
 
Hi Mika

I'm investigating the exact same symptoms on a system at work.
And the buffer size should be changed on the underlying physical NICs, I guess.
I've found that it's quite easy to increase the ring buffer size using 'ethtool -G <interface> rx <size> tx <size>', but my problem is how to make this change permanent, so to speak.
I think the buffer size should be set BEFORE any derived bonds come online, as discussed in this thread.
I can't seem to get it to work right now.
The suggestion on https://pve.proxmox.com/wiki/Network_Configuration of using something like

iface eno1 inet manual
post-up ethtool -G <interface> tx 4096 rx 4096

in /etc/network/interfaces (which apparently relies on the script called ethtool in the /etc/network/if-up.d directory) doesn't work.
Using post-up /sbin/ethtool.... doesn't work either.
Some people suggest a systemd method, some suggest good old rc.local, but generally there seems to be confusion about this, and in my opinion it is very badly documented.
How should this be done? Does anyone know?
 
Hi vesalius

Thanks for your feedback!

>> Did you try: pre-up ethtool -G <interface> tx 4096 rx 4096?

No, actually I didn't. But now I will.
I've been having a hard time finding some documentation on this.
And I'll be back with some results and considerations.
 
Ok, I finally solved it.
It turns out that I should have had the package ifupdown2 installed, which I didn't.
Once it is installed, the pre-up stuff works as expected.
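For reference, the stanza that works for me now looks roughly like this (eno1 stands in for whichever physical NIC you are tuning):

Code:
# /etc/network/interfaces - pre-up only started working for me after installing ifupdown2
iface eno1 inet manual
    pre-up ethtool -G eno1 rx 4096 tx 4096
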
I find this a little strange, since these Proxmox hosts are all fully updated to the very latest version.
Shouldn't the update from ifupdown to ifupdown2 have taken place automatically, or is this sort of thing optional?
And by the way, after this small update/change I now have an extra tap* and fw* interface for each VM running on a host.
But I guess that is how it should be now?
Anyway thanks for your feedback!
 
Definitely have seen, and personally experienced, some inconsistencies with ifupdown to ifupdown2 upgrades for those coming from older versions of PVE. New PVE 7.* installs, not so much. Glad you have it sorted.
 
So you finally increased the ring buffer size of the physical network interface and the bridge interface to tx 4096 rx 4096, and the issue is gone?
I'm also having problems with losing UDP/RTP packets since the upgrade to version 7.

Thanks in advance!
 
Hi mm553
Yes, I increased the buffer-sizes. After experimenting with the tx-size I settled for 4096 on both rx and tx.
And it has definitely helped. If I run my tests, there can still be a few retransmissions left, but on the order of 0-50 in my standard iperf3 test.
Before, it could be in the thousands.
To sum up, I've made this change to the buffer sizes on the individual NICs that are used in the bond on which the virtio interfaces are based.
I hope you get it solved, and will be glad to assist, if I can.
 
Ok, nice!
I tried with an e1000 adapter and changed the RX and TX ring buffers in the VM to 1024, which helped as a first step. Then I figured out that multiqueue on the virtio interface also fixes my issue. Maybe you could verify this?
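
For reference, what I changed is roughly the following (VM ID, MAC, bridge and the guest interface name are placeholders from my setup), and if I understand the Proxmox docs correctly, the extra channels also have to be enabled inside the guest:

Code:
# on the host: add queues to the existing virtio NIC
# (keep the existing MAC so it doesn't get regenerated)
qm set <vmid> --net0 virtio=<existing-mac>,bridge=<bridge>,queues=4
# inside the guest: enable the extra channels on the virtio NIC (ens18 is a placeholder)
ethtool -L ens18 combined 4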
 
Out of curiosity, where or how exactly are you changing things in the VM?

As you figured out, virtio should almost always be used instead of e1000.
 
Out of curiosity, where or how exactly are you changing things in the VM?

As you can't change the ring buffer size of virtio interfaces inside the VM, I allocated an e1000 interface to it, and there I could change the size with ethtool from 256 to 1024.

As you figured out, virtio should almost always be used instead of e1000.
That's why I tried multiqueue afterwards.
 

Hello,

I have the same problem. I can modify the buffer size of the physical network interface, but I can't set the size on the bridge interface.

Can you tell me how to do it? Thanks.
 
