Proxmox 8.4.1 on AMD EPYC (slow virtio-net)

rjoensen

Hello,

Has anyone noticed slow network performance on AMD EPYC CPUs? I have now confirmed this on two generations:
- AMD EPYC 9454P (single socket)
- AMD EPYC 7502 (dual socket)

Network configuration:
- vmbr bridge dedicated for private VM traffic

VM config (see the sketch below):
- CPU type: host
- 10 cores (single socket)
- 10GB RAM
- VirtIO SCSI storage
- VirtIO network
- MTU 1500

VMs are on the same Proxmox node.
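
For reference, that setup would roughly correspond to a /etc/pve/qemu-server/<vmid>.conf along these lines; the bridge name, storage volume and MAC address are placeholders, not taken from this thread:

  cores: 10
  cpu: host
  memory: 10240
  net0: virtio=BC:24:11:00:00:01,bridge=vmbr1,mtu=1500
  scsihw: virtio-scsi-pci
  scsi0: local-lvm:vm-101-disk-0,size=100G
  sockets: 1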

With iperf3/iperf between the two VMs I get about 15 Gbps. If you enable multiqueue on the virtio-net NIC, you start seeing close to 50 Gbps.
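
For anyone who wants to try the multiqueue setting: it can be set per NIC with qm on the host. The VMID, MAC and bridge below are examples; a common rule of thumb is to match the queue count to the number of vCPUs:

  # hypothetical VMID 101; keep the NIC's existing MAC, otherwise a new one is generated
  qm set 101 --net0 virtio=BC:24:11:00:00:01,bridge=vmbr1,mtu=1500,queues=10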

Setting up an NFS server on one of the VMs, mounting it on the other and copying a 50G file, I only see about 500 MB/s. But if I upload from a VM in another cluster (routed network), I easily see 1 GB/s.
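
A minimal sketch of that kind of test, with example IPs and paths (not taken from this thread):

  # on the NFS server VM: export a directory (path and subnet are examples)
  apt install nfs-kernel-server
  echo '/srv/share 10.0.0.0/24(rw,sync,no_subtree_check)' >> /etc/exports
  exportfs -ra

  # on the client VM: mount it and time a ~50G sequential write
  mount -t nfs 10.0.0.10:/srv/share /mnt
  dd if=/dev/zero of=/mnt/testfile bs=1M count=51200 oflag=direct status=progress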

VM-to-VM traffic inside the Proxmox node is poor, but as soon as the TX end is outside the node and cluster, throughput is good.

Any ideas on this? I tried disabling SMT and saw slightly better performance. I am wondering whether there are any other best practices for AMD CPUs like this.

I have never seen anything like this on Intel Scalable CPUs; there I have easily seen 100 Gbps iperf runs between VMs on the same node.
 
I have no answer for you, but maybe a hint where to look: this could be a cross-NUMA-node problem. AMD is always a multi-NUMA-node setup (chiplet design), often even 4 NUMA nodes, whereas one Xeon is most of the time a single NUMA node (only the very high core count parts are also chiplet designs). Search the forum for information on this; maybe you can try out the things presented there and one of them works.
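
A quick way to check what the host actually exposes (standard Linux tools, nothing Proxmox-specific):

  # how many NUMA nodes the host presents
  lscpu | grep -i numa
  # per-node CPUs and memory (from the numactl package)
  numactl --hardware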
 
Hello,

I appreciate your response, LnxBil! And it just so happens that this resolved it. Prior to the BIOS changes we were only seeing a single NUMA node; after some changes to the BIOS we now see 4 NUMA nodes, and the performance issue is gone.

Thank you so much!

Cheers,
rjoensen
 
Where did you get the hardware, and who did the BIOS setup?

If there are problematic hardware/BIOS settings, you cannot fix anything on the Proxmox side.

=> best practice:
buy tested/validated hardware configurations from a Proxmox partner
 
Tom,

The issue was already identified earlier, with LnxBil pointing me in the right direction: NUMA was disabled in the BIOS. Simple fix, nothing exotic.

Your comment about “where I got the hardware” or “who set up the BIOS” doesn’t really contribute anything at this point, especially after the problem was resolved.

This is standard AMD EPYC gear, not some unsupported edge case. The idea that Proxmox is only reliable if you buy from a partner is a bit dismissive; lots of people run perfectly stable deployments on broadly available hardware.

A more helpful approach might be acknowledging the root cause (disabled NUMA) rather than implying users should’ve known better or bought something else.
 
Your comment about “where I got the hardware” or “who set up the BIOS” doesn’t really contribute anything at this point, especially after the problem was resolved.

A more helpful approach might be acknowledging the root cause (disabled NUMA) rather than implying users should’ve known better or bought something else.

I agree that this comment does not help you, but the community forum is read by others too.

If a new Proxmox VE user purchases an AMD server with the suitable BIOS settings already done, there are likely fewer issues. It looks like you had to invest quite some time to get it fixed.
 
Hello,

I appreciate your response, LnxBil! And it just so happens that this resolved it. Prior to the BIOS changes we were only seeing a single NUMA node; after some changes to the BIOS we now see 4 NUMA nodes, and the performance issue is gone.

Thank you so much!

Cheers,
rjoensen
Hmm, that's interesting. I've been disappointed with networking as well, and I went with a 7302P specifically to avoid NUMA, but you are saying you see better networking performance when that option is disabled, allowing it to present multiple nodes to the OS?
 
Hmm, that's interesting. I've been disappointed with networking as well, and I went with a 7302P specifically to avoid NUMA, but you are saying you see better networking performance when that option is disabled, allowing it to present multiple nodes to the OS?

NUMA was enabled in the BIOS, and the node went from showing a single NUMA node to showing 4. The VirtIO issues disappeared: VM-to-VM throughput went from 30 Gbps to 93 Gbps.

You could also use multiqueue for the virtio NIC, but even with that it was slow without NUMA exposed. With NUMA exposed, it is fast.
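
For anyone wanting to reproduce that kind of measurement, a minimal iperf3 sketch between two VMs; the server IP and stream count are examples, and several parallel streams are usually needed to saturate a multiqueue NIC:

  # on the first VM (server)
  iperf3 -s
  # on the second VM (client); 10.0.0.11 and 8 streams are examples
  iperf3 -c 10.0.0.11 -P 8 -t 30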
 
Where did you get the hardware, and who did the BIOS setup?

If there are problematic hardware/BIOS settings, you cannot fix anything on the Proxmox side.

=> best practice:
buy tested/validated hardware configurations from a Proxmox partner
Hello Tom,

The only requirements I've been able to find are these:
https://proxmox.com/en/products/proxmox-virtual-environment/requirements

The requirements there state the following regarding the CPUs:
  • Intel 64 or AMD64 with Intel VT/AMD-V CPU flag.

Please correct me if I'm mistaken, but my understanding is that those requirements are fully met in both cases in this thread. Assuming both are paying customers within those requirement specs, Proxmox is expected to support any issues they have, such as when performance is this poor due to BIOS settings.

/Heðin