Hi all, we are in deep trouble.
We use 3 x Dell PE 7625 servers, each with 2 x AMD EPYC 9374F (32-core) processors, and I am facing a bandwidth issue for VM-to-VM as well as VM-to-host traffic within the same node.
The bandwidth is ~13 Gbps host-to-VM and ~8 Gbps VM-to-VM on a 50 Gbps bridge (2 x 25 Gbps ports bonded with LACP) with no other traffic (these are new nodes).
Countermeasures tested:
1) I have configured multiqueue (=8) in the Proxmox VM network device settings, but there was no improvement (a multiqueue sketch follows this list).
2) The BIOS is on the performance profile with NUMA Nodes Per Socket (NPS) = 1, and on the host node numactl --hardware shows "available: 2 nodes" (i.e. 2 sockets x 1 NUMA node per socket). As per the post https://forum.proxmox.com/threads/proxmox-8-4-1-on-amd-epyc-slow-virtio-net.167555/ I have also changed the BIOS to NPS=2 and NPS=4, but there was no improvement (a NUMA/affinity sketch also follows this list).
3) I have an old Intel cluster and I know it already reaches around 30 Gbps within a node (VM to VM). So, to find the underlying cause, I installed the same Proxmox version on a new Intel Xeon 5410 (5th gen, 24-core) server (called N2) and ran iperf within the node (the host acting as both server and client). Please check the images: the speed is 68 Gbps without any parallel streams (-P). When I run the same test on the new AMD 9374F node, to my shock it was 38 Gbps (see the N1 images), almost half the performance. (The exact test commands are sketched after the notes below.)
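For reference on item 1, something like the following should reproduce the multiqueue setting from the CLI. This is only a sketch: VMID 101, bridge vmbr0 and the guest interface name ens18 are placeholders, not my real config, and re-issuing --net0 without the MAC address generates a new one.

# Host side: request 8 virtio queues on the VM's first NIC
# (VMID 101 / vmbr0 are examples; keep the existing MAC if it matters)
qm set 101 --net0 virtio,bridge=vmbr0,queues=8

# Guest side: check and enable the extra queues on the virtio NIC
# (ens18 is the usual virtio interface name in a Debian/Ubuntu guest)
ethtool -l ens18                # show supported / current channel counts
ethtool -L ens18 combined 8     # actually use all 8 queues

The queue count should not exceed the number of vCPUs assigned to the VM.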
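Related to the NPS experiments in item 2: since the 9374F is built from several CCDs, it may also be worth checking where the vCPUs land and pinning a test VM to cores that share one NUMA node / L3 domain. A rough sketch (VMID 101 and the core range 0-7 are only examples; pick cores from the numactl/lscpu output on your own box):

# See how cores map to NUMA nodes and shared L3 caches (CCDs)
numactl --hardware
lscpu -e=CPU,NODE,SOCKET,CORE,CACHE

# Expose the NUMA topology to the guest and pin its vCPUs to a group
# of cores that share one node/CCD (0-7 is only an example range)
qm set 101 --numa 1
qm set 101 --affinity 0-7

If host-to-VM throughput jumps when everything sits on one CCD, the gap would be coming from cross-CCD/cross-socket traffic rather than from the bridge itself.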
Now, this is the reason the VM-to-VM bandwidth is so low inside a node. These results are very scary, because the AMD processor is a beast on paper, with a large cache, a 32 GT/s interconnect, etc. I know about its CCD architecture, but the speed is still very, very low. I want to know any other method to raise the inter-core/inter-process bandwidth (see note 2 below) to the maximum throughput.
If this is really the case, AMD for virtualization is a big NO for future buyers.
Notes:
1) I have not added -P (parallel streams) to iperf because I want to see the realistic case: when you copy a big file or a backup to another node, there is no parallel connection.
2) As the tests are run within the same node, if I am right there is no network interface involved (that is why I get 30 Gbps even with a 1 GbE card in my old server), so it is just the inter-core/inter-process bandwidth that we are measuring, and no network-level tuning should be required. A pinned iperf3 sketch follows these notes.
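To separate the CCD/NUMA effect from everything else, the same single-stream loopback test can be repeated with the server and client pinned to specific cores. A minimal sketch, assuming iperf3 and using core/node numbers that are only examples (check numactl --hardware first):

# Baseline: single stream over loopback, as in the attached images
iperf3 -s -D                            # server in the background
iperf3 -c 127.0.0.1 -t 30

# Same CCD: pin server and client to two cores sharing an L3 domain
pkill iperf3
taskset -c 0 iperf3 -s -D
taskset -c 1 iperf3 -c 127.0.0.1 -t 30

# Cross socket: pin server and client to different NUMA nodes
pkill iperf3
numactl --cpunodebind=0 iperf3 -s -D
numactl --cpunodebind=1 iperf3 -c 127.0.0.1 -t 30

If the cross-socket number comes out far below the same-CCD number, that would point at memory/fabric locality rather than at Proxmox or the NICs.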
We are struggling with this a lot and your guidance would be very helpful, as there is no other resource available for this strange issue.
Thanks.