Slow Download Speed Linux Guests

Nov 18, 2024
We're seeing unexpectedly slow download speeds on Linux VMs in our Troy Proxmox cluster. Upload speeds are fine (300-500 Mbps), but downloads are far below what we'd expect (6-15 Mbps). This only happens with Linux VMs using VirtIO adapters; our Windows VMs work fine.

What we've found:
• In Troy with Linux guests using VirtIO adapters: downloads ~10-70 Mbps, uploads ~300+ Mbps
• In Albany with Linux and Windows guests (different Proxmox cluster): downloads ~1000+ Mbps, uploads ~500+ Mbps
• In Troy with Windows guests: downloads ~1000+ Mbps, uploads ~500+ Mbps
• Host speed tests are fine (~600-1400 Mbps)
• Problem follows VMs when migrated between hosts in Troy
• Tested with multiple Linux VMs, all show the same behavior with VirtIO

Weird behavior with VMware adapters:
• Two specific VMs get good downloads (~800 Mbps) with vmxnet3 but poor download speeds with all other adapter types
• Other VMs still have poor downloads with vmxnet3
• When migrating these VMs to other hosts in the cluster, the good/bad performance follows the VM to the destination host

What we've checked:
• VirtIO network driver is built into the guest kernel: "grep -i virtio /boot/config-$(uname -r) | grep CONFIG_VIRTIO_NET" returns CONFIG_VIRTIO_NET=y
• TCP buffer sizes, window settings
• MTU settings (1500 on both)
• NUMA config
• QoS/traffic shaping
• Firewall rules
• Network offloading features
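
Note that CONFIG_VIRTIO_NET=y only shows the driver is compiled into the kernel. A minimal sketch for confirming which driver is actually bound to the interface, assuming the guest NIC is ens18 (the name used with ethtool in this post) and a standard sysfs layout; nic_driver is a hypothetical helper name:

```shell
# Print the kernel driver bound to a network interface by resolving
# the sysfs "driver" symlink. The second argument (sysfs root) is
# optional and only exists to make the helper easy to test.
nic_driver() {
    iface="$1"
    sysroot="${2:-/sys}"
    link="$sysroot/class/net/$iface/device/driver"
    [ -e "$link" ] || return 1
    basename "$(readlink -f "$link")"
}

# Usage inside the guest (interface name is an assumption):
#   nic_driver ens18
```

For a VirtIO adapter this would be expected to print virtio_net.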

What we've tried:
• Disabled offloading with "ethtool -K ens18 gro off tso off gso off"
• Migrated VMs between hosts
• Compared identical settings between Troy and Albany VMs
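
To confirm the offload change took effect, the feature state can be read back with ethtool -k (lowercase). A minimal sketch, assuming the interface is ens18 and the usual ethtool feature names; offloads_still_on is a hypothetical helper:

```shell
# Read `ethtool -k <iface>` output on stdin and print whichever of
# GRO, TSO and GSO are still reported as "on".
offloads_still_on() {
    grep -E '^[[:space:]]*(generic-receive-offload|tcp-segmentation-offload|generic-segmentation-offload): on' \
        | sed -E 's/^[[:space:]]*//; s/:.*//'
}

# Usage inside the guest (interface name is an assumption):
#   ethtool -k ens18 | offloads_still_on
```

Empty output means all three offloads are off.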

I've run iperf3 tests between VMs and see a lot of dropped packets in UDP tests:
iper3-udp-1.jpg

iper3-udp-2.jpg

Packets drop as pictured above when running iperf3 tests between two VMs that sit on the same host.
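
A sketch of how the UDP test could be repeated from the command line in both directions, so loss can be compared as numbers rather than screenshots. SERVER and the 500M target rate are placeholders, not values from this post, and udp_loss is a hypothetical helper that assumes iperf3's usual text summary, where the receiver line carries a lost/total datagram field:

```shell
# On VM A (server):                    iperf3 -s
# On VM B (client), sending to A:      iperf3 -c "$SERVER" -u -b 500M -t 10
# On VM B (client), receiving from A:  iperf3 -c "$SERVER" -u -b 500M -t 10 -R

# Read iperf3 UDP client output on stdin and print the lost and total
# datagram counts from the receiver summary line.
udp_loss() {
    awk '/receiver/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9]+\/[0-9]+$/) {
                split($i, parts, "/")
                printf "lost=%d total=%d\n", parts[1], parts[2]
            }
    }'
}

# Usage: iperf3 -c "$SERVER" -u -b 500M -t 10 -R | udp_loss
```

The -R run is the one that exercises the download direction on the client VM.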

I worked with our network hardware vendor for hours yesterday and they could not determine any issue on the physical network layer. At this point I'm not sure where to turn other than the virtualization infrastructure or the guest. I'm leaning in the direction of an issue with drivers but I'm not sure where to begin to troubleshoot that.

The strange part is we can't find any meaningful differences in network configuration between the VMs in Troy vs Albany. Since some VMs work fine with vmxnet3 but not others, and the same template works fine in Albany, we're thinking it might be some Proxmox-level setting or driver issue we're missing.

I've attached a spreadsheet with speed test results using various settings to demonstrate the issue.

One more note: our workload was migrated from VMware to Proxmox a few months ago.

Any ideas what could cause this behavior?

Thank you, any help is greatly appreciated.
 

I've pinned down the speed issue. It's a bug or incompatibility between more recent Ubuntu Linux kernel releases and more recent Broadcom fiber NIC firmware releases.

Ubuntu kernel releases 6.8.0-53 through 6.8.0-58 (latest), when combined with Broadcom NIC firmware versions 22.92.06.10, 22.92.07.50, or 23.11.16.22 (latest), produce the slow download speeds.
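
A minimal sketch for checking a system against the combination described above. The kernel range and firmware versions are the ones listed in this post (58 was the latest at the time of writing); kernel_affected and firmware_affected are hypothetical helper names, and the firmware check assumes the package version appears somewhere in the string that ethtool -i reports:

```shell
# Return 0 if the kernel release falls in 6.8.0-53 .. 6.8.0-58.
kernel_affected() {
    case "$1" in
        6.8.0-*) ;;
        *) return 1 ;;
    esac
    abi="${1#6.8.0-}"   # e.g. "53-generic"
    abi="${abi%%-*}"    # e.g. "53"
    case "$abi" in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$abi" -ge 53 ] && [ "$abi" -le 58 ]
}

# Return 0 if the firmware string contains one of the listed versions.
firmware_affected() {
    case "$1" in
        *22.92.06.10*|*22.92.07.50*|*23.11.16.22*) return 0 ;;
        *) return 1 ;;
    esac
}

# Kernel check on the machine running the Ubuntu kernel:
if kernel_affected "$(uname -r)"; then
    echo "kernel $(uname -r): in the affected range"
else
    echo "kernel $(uname -r): outside the affected range"
fi

# Firmware check on the Proxmox host (NIC name is a placeholder):
#   firmware_affected "$(ethtool -i <nic> | sed -n 's/^firmware-version: //p')"
```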

The servers in Albany are running firmware version 22.71.11.13, which seems to work fine with the latest Ubuntu kernel, as evidenced by migrating a VM to Albany and not experiencing any issues.

When deploying Proxmox in Troy I upgraded the NIC firmware along with the BIOS, and all seemed fine because Ubuntu kernel 6.8.0-53 was not released until Feb 10th, so the problem could not have been picked up at the time.

Since the 22.71.11.13 NIC firmware works with the latest kernel release, the solution for us is to roll back the NIC firmware on all of the Troy hosts. I've tested this with each firmware version and kernel release down to 6.8.0-50.
 

Attachments

  • Screenshot 2025-05-01 at 12.40.40 PM.png