Strange Issue Using Virtio on 10Gb Network Adapters

dizzydre21

Hello,

I'm having some strange behavior when using an i225-LM, x520-da, or x550-t2 NIC and the Virtio network adapter in Proxmox. Those cards use the igb and ixgbe drivers, respectively. The issue only occurs on VMs in which I use Moonlight/Sunshine to gamestream from a VM with a GPU passed through. The video stream will be stuttery as all hell, or it won't connect at all at any bitrate. Changing the adapter to E1000 or E1000E seems to work without issue. Swapping the NIC for a Realtek 8125 based 2.5gbe NIC also works just fine with Virtio, even using the stock r8169 driver. I've ordered a Broadcom based 10gb SFP+ card to test something that isn't Intel based, but it won't arrive until this evening.
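(For reference, switching the adapter model is just the net0 line on the VM; a rough sketch is below - the VMID, bridge name, and VLAN tag are placeholders rather than my exact config.)

Code:
# Virtio model that shows the stutter
qm set 100 --net0 virtio,bridge=vmbr0,tag=1050
# E1000 model that streams fine
qm set 100 --net0 e1000,bridge=vmbr0,tag=1050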

I should also mention that I haven't listed all of my hardware because this problem occurs on numerous machines, namely Threadripper 7000, Intel LGA 1700, and AMD AM5 platforms.

Proxmox 8.4.1
EndeavourOS client
Win11 Pro and EndeavourOS Host (VM)
Both Nvidia 40xx and AMD 7xxx GPUs


I've tried disabling all offloading to no effect.
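For reference, I toggled the offloads with something like the following (feature names vary by driver; ethtool -k shows what the NIC actually exposes):

Code:
# list the current offload settings
ethtool -k enp8s0
# turn off the usual suspects (not every driver supports each one)
ethtool -K enp8s0 tso off gso off gro off lro off tx off rx off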
 
Hi!

Which kernel version are you using with these cards? Have you tried running other kernel versions? Could you provide the lspci -nnk output for the network cards, and check whether there are any interesting logs in dmesg while these stutters occur (e.g. dmesg | grep igb or dmesg | grep ixgbe)?
 
I'm on 6.14.5-1-bpo12-pve currently, but it was happening on 6.8 as well. I'm not home right now and have gone back to the Realtek card for the time being. Just from memory, though, I didn't see anything unusual in dmesg or when running lspci -nnk. I intend to do some further troubleshooting this evening and can post the outputs of those with the Intel cards installed.

Of note, when using the x550 card plugged into a 2.5gb L2 switch, it would only link at 1gb, but it worked without issue at that speed. I had an Epyc Milan system about a year ago with embedded x550 ports on the motherboard, and I had to force advertisement of 2.5gbe and 5gbe to get it to link at those speeds, so that portion is probably just the x550 not advertising multi-gig capabilities without user intervention. Still interesting, though, that there are no issues when linked at 1gb.
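For anyone hitting the same thing, forcing the advertisement looked roughly like this (the interface name is a placeholder, and the hex bitmask is assembled from the link-mode bits in linux/ethtool.h - bits 5, 12, 47, and 48 for 1G/10G/2.5G/5G - so double-check it against your kernel before copying):

Code:
# see which link modes the NIC reports as supported/advertised
ethtool eno1
# advertise 1G, 2.5G, 5G, and 10G
ethtool -s eno1 advertise 0x1800000001020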
 
Hm, interesting, thanks for getting back so quick!

What is the exact switch model that you currently use? Are there any known issues between the i225-LM, x520-da or x550-t2 NICs and the switch you have? What does ethtool tell you about the NIC?
 
I have a Mikrotik CRS309-1G-8S+IN with a mix of transceiver brands, but all of them support 1/2.5/5/10gbe. That switch is in L2 mode, so no routing. Port 1 is set up to pass all VLAN IDs and is wired to a pfsense box that does all the L3 stuff. Ports 2-7 are set up for VLAN 1050. Port 8 also passes all VLAN IDs and is wired to a downstream TP-Link SG3218XP-M2 via a 10gb DAC cable. That switch is sliced up for VLANs 1010, 1020, 1040, and 1050.
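(For context, the Proxmox side is nothing exotic - a VLAN-aware bridge along these lines in /etc/network/interfaces; the interface name and VID range below are illustrative rather than my exact config.)

Code:
auto vmbr0
iface vmbr0 inet manual
        bridge-ports enp8s0
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094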

To my knowledge, there are no issues with the Intel NICs and either of my switches. I forgot to mention that if I use RDP on my Win11 VM, I can get in without issue at 10gb. I was also able to transfer a 65GB file at ~500MBps from my NAS to the VM, which is about the max that my ZFS pool can do. It's almost like the NIC, or maybe the driver, is having issues with the bursts of traffic that are common when gamestreaming.
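One way to roughly approximate that kind of bursty traffic, assuming iperf3 is installed on both the VM and the client (the IP is a placeholder):

Code:
# on the VM (server side)
iperf3 -s
# on the client: 60 s of UDP at a typical streaming bitrate with small datagrams
iperf3 -c <vm-ip> -u -b 150M -l 1200 -t 60
# watch the reported jitter and lost/total datagrams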
 
Reporting back.

I installed a Broadcom based 10gb NIC over the weekend and it has the same issues as the Intel cards. I also want to correct a mistake: the i225 card does NOT have the same issues. I have a machine running one now that is working without issue, so the problem only shows up when running at 10gb, and it happens over both SFP+ and 10gbe Base-T.
 
The statistics of the NIC provided by ethtool would be interesting, as they can usually tell whether any errors occurred on the hardware itself and/or whether any were resolved by the kernel. Does a tcpdump indicate any failures to send packets (e.g. many TCP retransmits) between switch<->NIC, NIC<->bridge, or bridge<->VM?
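Something along these lines should be enough to collect that (the interface name and client IP are placeholders):

Code:
# per-NIC hardware/driver counters; look for error, drop, missed, or restart entries
ethtool -S enp8s0 | grep -Ei 'err|drop|miss'
# capture the streaming traffic on the bridge for later inspection (e.g. in Wireshark)
tcpdump -i vmbr0 -w stream.pcap host <client-ip>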
 
I don't see any errors with ethtool:

Code:
ethtool enp8s0
Settings for enp8s0:
        Supported ports: [ FIBRE ]
        Supported link modes:   10000baseT/Full
        Supported pause frame use: Symmetric
        Supports auto-negotiation: No
        Supported FEC modes: Not reported
        Advertised link modes:  10000baseT/Full
        Advertised pause frame use: Symmetric
        Advertised auto-negotiation: No
        Advertised FEC modes: Not reported
        Speed: 10000Mb/s
        Duplex: Full
        Auto-negotiation: off
        Port: Direct Attach Copper
        PHYAD: 0
        Transceiver: internal
        Supports Wake-on: d
        Wake-on: d
        Current message level: 0x00000007 (7)
                               drv probe link
        Link detected: yes

Another curious thing is that with a Win11 VM and a Win11 client device, I do not have any issues. If I instead try to get into my EndeavourOS VM from a Win11 client, I get the frame drops. The Win11 client has a 1gb Intel based NIC. Previously, I was using an EndeavourOS client that was hardwired with a 2.5gb Realtek NIC. From that machine, I had frame drops on both the EndeavourOS and Win11 VMs (host side).

I may try some additional combinations tonight.
 
You are using a DAC - is it certified for the Mikrotik? This sounds very similar to timing issues I had once with a Cisco switch, but that was CAT6a over 300ft. You basically have to adjust the packet timing on the link - a firmware update may solve it.

I don't think this is a Proxmox/VM issue. I'm sure that if you fill the link on any system, after a few minutes you'll see packet drops or a link reset, and then it will be fine again for a few minutes.
 
I don't know what certification to look for. The same issue happens with an x550 NIC over copper.
 
Certification means the vendor lists that brand and model of DAC as supported. How do you go from CAT6 to SFP+ for the Intel? Passive DACs are generally not recommended.

You also mention an upstream switch and VLANs. With these budget setups, I would recommend starting simple and testing your gear first. Can it support the DAC you have, can it support 10G, and can it do full streams between two ports without issue? Then start adding features like VLANs. If you have link-stability issues across multiple NICs, then it may be the switch or the setup. E.g. it sounds like your 2.5G switch is routing the VLANs - is it capable of doing that once you've got 2-3 clients on the 10G uplink?
 
In this case, there is no routing between VLANs because the client and host machine are on the same VLAN. If the switch were doing the routing, it would indeed be awful, but even in that case it's the pfsense box doing any inter-VLAN routing, not the switch. The pfsense box can easily do near line speed on a 1gbe link.

I don't know where to look regarding certifications, but the same issue occurs across several transceiver brands, and also when I'm not using a DAC at all (Intel x550 over copper). There are numerous reviews for my DAC cable saying that it works with Mikrotik switches, and several saying they use it for Proxmox > Mikrotik.

I believe the issue here has something to do with the buffers filling up and dropping frames when the client NIC is not a 10gb or 1gb device. It also seems to depend on whether the client is running Linux or Windows, since Win11 to Win11 works if the client has a 1gb NIC. I do not have a Windows machine I can test 2.5gb on, or I would.
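If it is buffer related, one thing I plan to check (just a guess, not a confirmed fix) is the ring sizes on the host NIC (interface name is a placeholder):

Code:
# show current vs. maximum RX/TX ring sizes
ethtool -g enp8s0
# bump them toward the hardware maximum (values are examples; use what -g reports)
ethtool -G enp8s0 rx 4096 tx 4096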
 
So you're routing between 10G VLANs on a 1G PfSense router?

Again, the Intel X550 presumably uses CAT6a, and you have a Mikrotik with SFP+ ports - they don't just plug into each other. If it's a switch issue, it doesn't matter what NIC or wiring you have. If you have one dodgy wire anywhere the traffic flows, it doesn't matter what the clients do.

Yes, others may be able to do VLANs on a Mikrotik, but your setup sounds very unique: you also have a TP-Link 2.5G switch, and you bring a PfSense router into the mix.

Again: isolate and simplify. Put two clients on the 10G switch, use a simple Proxmox or Ubuntu install or whatever, and let them talk to each other without any fancy VLAN setup. Does it work? Do you get 10G? If not, what exactly happens in your tests - both clients will have logs. Are they dropping packets, is the link resetting? Then try another 10G switch, another NIC, other cabling, other clients - what stays the same, what changes?
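A quick way to see whether packets are being dropped or the link is resetting during such a test (interface name is a placeholder):

Code:
# interface-level RX/TX error and drop counters
ip -s link show enp8s0
# kernel messages from the NIC driver, e.g. link down/up or TX hang resets
dmesg | grep -Ei 'ixgbe|link is (down|up)'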

My suspicion is dodgy cabling or a dodgy switch. Not a big deal - return it, get a new one, preferably a better brand/model. There are all sorts of knockoffs out there, whether it's an Intel NIC or a Cisco switch, and we all get bad products. As I said, I had a $15k Cisco switch with the same problem - I had Leviton come and inspect the cabling, and they confirmed it was a firmware issue.

https://www.intel.com/content/www/u.../500-series-network-adapters-up-to-10gbe.html
Note that SFP and SFP+ are not the same thing, they may work, they may not.
 