Situation: we're running KVM VMs based on Debian/bullseye on Proxmox/PVE with 10GbE NICs, but we experience packet loss on the virtio-net interfaces under high network traffic (RTP/UDP).
This underlying problem also exists on VMware clusters. VMware provides a solution for that: one can increase the ring buffer sizes (see https://kb.vmware.com/s/article/50121760), and the problem with packet loss goes away (we verified this ourselves).
Now, we'd love to do the same for the virtio-net NICs on the PVE KVM VMs - sadly this isn't yet supported:
# ethtool -G neth1 rx 4048 tx 4048
netlink error: Operation not supported
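For what it's worth, reading the ring parameters from inside the guest is supported (only the set path fails) and, as far as we understand, simply reports QEMU's hardcoded 256 default mentioned further down:
# ethtool -g neth1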
To avoid any possible side effects, we use one dedicated guest VM per hypervisor in our test environment.
We use a dedicated bridge for the guest, connected to the second port of the Intel X722 10G NIC.
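For reference, vmbr1 is a plain Linux bridge on that port; a minimal sketch of the corresponding /etc/network/interfaces stanza (port name and VLAN-awareness here are just illustrative):
auto vmbr1
iface vmbr1 inet manual
        bridge-ports eno2
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094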
Example of such a VM configuration:
root@pve-test-00:~# qm config 100
boot: order=scsi0;ide2;net0
cores: 16
cpu: host
ide2: none,media=cdrom
machine: q35
memory: 52048
meta: creation-qemu=6.2.0,ctime=1661099025
name: Sp1
net0: virtio=C2:50:0E:F5:6E:BC,bridge=vmbr1,queues=4,tag=3762
net1: virtio=56:4C:C5:75:7D:79,bridge=vmbr1,firewall=1,tag=901
numa: 0
ostype: l26
scsi0: local-btrfs:100/vm-100-disk-0.raw,size=320G
scsihw: virtio-scsi-pci
smbios1: uuid=3a55db64-5fea-4165-bcf1-a640b0caf909
sockets: 1
vmgenid: f280627f-b65d-4b7d-b841-1966d64f7ff9
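For completeness, varying the guest-side queue count is done roughly like this (sketch; the interface name inside the guest is illustrative, and the queues= change only takes effect after a cold restart of the VM):
root@pve-test-00:~# qm set 100 --net0 virtio=C2:50:0E:F5:6E:BC,bridge=vmbr1,queues=8,tag=3762
and inside the guest:
# ethtool -L eth0 combined 8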
The NIC on the host supports multiple queues (we tried different values for the multiqueue settings, but sadly they don't change the situation):
root@pve-test-00:~# ethtool -l eno2
Channel parameters for eno2:
Pre-set maximums:
RX: n/a
TX: n/a
Other: 1
Combined: 32
Current hardware settings:
RX: n/a
TX: n/a
Other: 1
Combined: 20
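The host-side channel count is one of the knobs we varied, e.g. (sketch):
root@pve-test-00:~# ethtool -L eno2 combined 32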
The RX ring size on the NIC of the *hypervisor* can be adjusted (though changing it doesn't really affect our packet loss situation):
root@pve-test-00:~# ethtool -g eno2
Ring parameters for eno2:
Pre-set maximums:
RX: 4096
RX Mini: n/a
RX Jumbo: n/a
TX: 4096
Current hardware settings:
RX: 2048
RX Mini: n/a
RX Jumbo: n/a
TX: 2048
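For reference, raising the hypervisor-side rings to the hardware maximum is what we mean by "can be adjusted" (sketch):
root@pve-test-00:~# ethtool -G eno2 rx 4096 tx 4096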
We're running the latest PVE:
root@pve-test-00:~# pveversion
pve-manager/7.2-7/d0dd0e85 (running kernel: 5.15.39-3-pve)
root@pve-test-00:~# uname -a
Linux pve-test-00 5.15.39-3-pve #2 SMP PVE 5.15.39-3 (Wed, 27 Jul 2022 13:45:39 +0200) x86_64 GNU/Linux
Hypervisor host specs:
- Lenovo SN550
- CPU Intel(R) Xeon(R) Silver 4210R CPU @ 2.40GHz with HT enabled
- 64 GB RAM
- 480 GB Micron SATA SSD
- Net Ethernet Connection X722 for 10GbE backplane (i40e) [Ethernet controller: Intel Corporation Ethernet Connection X722 for 10GbE backplane (rev 09)]
We're wondering: are we really the only ones noticing such a performance/packet loss problem in 2022?
We're also aware that the packet loss problem disappears when using SR-IOV instead. But we'd like to avoid the drawbacks of SR-IOV, so we're looking for ways to handle high network traffic with virtio-net NICs as well, and to understand what the actual limiting factor with virtio-net is.
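One way we try to narrow down the limiting factor is to compare drop counters along the path, roughly like this (sketch; the tap device name follows PVE's tap<vmid>i<index> scheme and is assumed for this VM):
root@pve-test-00:~# ethtool -S eno2 | grep -iE 'drop|miss|err'
root@pve-test-00:~# ip -s link show tap100i0
and inside the guest:
# ip -s link show neth1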
We were considering increasing QEMU's hardcoded VIRTIO_NET_RX_QUEUE_DEFAULT_SIZE/VIRTIO_NET_TX_QUEUE_DEFAULT_SIZE settings in hw/net/virtio-net.c from 256 to something bigger. Someone else already tried this (https://github.com/qemu/qemu/pull/115), but sadly it never made its way upstream. Before putting further effort into it, we'd like to understand whether that's the right approach or whether we're missing something.
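Concretely, the change we have in mind would just bump those two defines and rebuild; a rough sketch of a local build (values illustrative; as far as we can tell QEMU caps virtio queue sizes at 1024):
git clone https://gitlab.com/qemu-project/qemu.git && cd qemu
sed -i 's/\(VIRTIO_NET_RX_QUEUE_DEFAULT_SIZE\) 256/\1 1024/' hw/net/virtio-net.c
sed -i 's/\(VIRTIO_NET_TX_QUEUE_DEFAULT_SIZE\) 256/\1 1024/' hw/net/virtio-net.c
./configure --target-list=x86_64-softmmu && make -j"$(nproc)"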
Does anyone have similar issues, or suggestions on how to tackle this?
Could this be related to a Linux bridge limitation, and might using e.g. Open vSwitch help?
Anything else we could try?