Dual Socket terrible performance on some VMs

harmonyp

Member
Nov 26, 2020
This is a dual-socket AMD EPYC system. NUMA is enabled, but only on the single VPS I am testing with, as I had not set it before noticing this issue.

Running the test
Code:
sysbench --test=memory --memory-block-size=4G --memory-total-size=32G run
on the Proxmox host and on virtual machines shows terrible performance. The only workaround is to power the VM off and on again, after which the VM sometimes gets good results.

Code:
24576.00 MiB transferred (2370.00 MiB/sec)


General statistics:
    total time:                          10.3682s
    total number of events:              6

Latency (ms):
         min:                                 1189.33
         avg:                                 1727.95
         max:                                 2697.68
         95th percentile:                     2680.11
         sum:                                10367.71

Threads fairness:
    events (avg/stddev):           6.0000/0.00
    execution time (avg/stddev):   10.3677/0.00
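
One way to check whether the slow runs come from cross-socket memory access is to pin both the CPU and the memory of the benchmark to a single NUMA node and compare the nodes. The sketch below is an assumption about how that could be done with numactl, not something the original poster ran. `bench_cmd` is a hypothetical helper that only prints the command line, and it uses the non-deprecated `sysbench memory` form of the same test.

```shell
# Hypothetical helper: print a sysbench memory run pinned to one NUMA node.
# --cpunodebind restricts the CPUs, --membind restricts the memory allocation.
bench_cmd() {
  printf 'numactl --cpunodebind=%s --membind=%s sysbench memory --memory-block-size=4G --memory-total-size=32G run\n' "$1" "$1"
}

# Print one command per node reported by `numactl -H` (nodes 0 and 1 here).
for node in 0 1; do
  bench_cmd "$node"
done
```

Piping the output to `sh` on the host runs both commands. If the pinned runs are consistently fast while unpinned runs vary, placement across sockets is a likely contributor.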

931a34822b6ab495aaeb9c01f63e7a8a.png


After some research, this sounds like what happens when NUMA is not enabled, but as you can see, performance is sometimes poor even with it enabled. The Proxmox node itself also shows bad results; I am not sure whether that matters.


9331fc03f47c80711f855472ce660e15.png


root@proxmoxhost3:~# numastat
Code:
                           node0           node1
numa_hit            195958341659    214547960812
numa_miss                      0               0
numa_foreign                   0               0
interleave_hit            154685          153642
local_node          195929675706    214517495564
other_node              28411951        29865698
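
To put the numastat counters above in proportion, the share of allocations on each node that were requested by a process running on the other node can be computed from `local_node` and `other_node`. This is a minimal sketch; `remote_share` is a hypothetical helper name, and it assumes the two-node column layout shown above.

```shell
# Hypothetical helper: read numastat output on stdin and print, per node,
# other_node / (local_node + other_node) as a percentage.
remote_share() {
  awk '
    $1 == "local_node" { for (i = 2; i <= NF; i++) loc[i] = $i; n = NF }
    $1 == "other_node" { for (i = 2; i <= NF; i++) oth[i] = $i }
    END {
      for (i = 2; i <= n; i++)
        printf "node%d remote share: %.4f%%\n", i - 2, 100 * oth[i] / (loc[i] + oth[i])
    }'
}

# Counters copied from the output above; on a live host use: numastat | remote_share
remote_share <<'EOF'
local_node          195929675706    214517495564
other_node              28411951        29865698
EOF
```

For these counters the share is far below one percent on both nodes, which suggests that host-wide allocation placement is mostly local. It says nothing about where an individual VM's memory sits.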
root@proxmoxhost3:~# numactl -H
Code:
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143
node 0 size: 515867 MB
node 0 free: 266897 MB
node 1 cpus: 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191
node 1 size: 516015 MB
node 1 free: 262410 MB
node distances:
node   0   1
0:  10  32
1:  32  10

In BIOS the NUMA mode is set to "Auto"
 
If you only need 4 cores in the VM, why not use 1 socket with 4 cores?

2 sockets are only necessary if you need more cores or memory in a VM than one hardware CPU/socket provides.
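
The suggestion above could be applied from the host shell roughly as follows. This is only a sketch: the VMID 100 and the helper name `single_socket` are placeholders, and the `QM` variable exists solely so the command can be previewed without touching a VM.

```shell
# QM defaults to the Proxmox CLI; set QM=echo for a dry run that prints the arguments.
QM="${QM:-qm}"

# Hypothetical helper: give a VM one virtual socket with the requested number of cores.
single_socket() {
  vmid="$1"
  cores="$2"
  "$QM" set "$vmid" --sockets 1 --cores "$cores"
}

# Example, to be run on a Proxmox VE host (VMID 100 is a placeholder):
# single_socket 100 4
```

A topology change like this typically only takes effect after the VM has been fully stopped and started again.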
 
True, but that is not the issue: other VMs perform badly too, and so does the Proxmox host itself.
 
Run sysbench on your Proxmox host with hwloc-bind --single; then it will run on one socket only and should perform better.


Code:
sysbench --test=memory --memory-block-size=4G --memory-total-size=32G run
WARNING: the --test option is deprecated. You can pass a script name or path on the command line without any options.
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time

Running memory speed test with the following options:
  block size: 4194304KiB
  total size: 32768MiB
  operation: write
  scope: global

Initializing worker threads...

Threads started!

Total operations: 8 ( 1.24 per second)

32768.00 MiB transferred (5085.20 MiB/sec)

General statistics:
    total time:                          6.4419s
    total number of events:              8

Latency (ms):
         min:                                  573.65
         avg:                                  805.21
         max:                                 1276.23
         95th percentile:                     1280.93
         sum:                                 6441.69

Threads fairness:
    events (avg/stddev):           8.0000/0.00
    execution time (avg/stddev):   6.4417/0.00

Code:
hwloc-bind --single sysbench --test=memory --memory-block-size=4G --memory-total-size=32G run
WARNING: the --test option is deprecated. You can pass a script name or path on the command line without any options.
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time

Running memory speed test with the following options:
  block size: 4194304KiB
  total size: 32768MiB
  operation: write
  scope: global

Initializing worker threads...

Threads started!

Total operations: 8 ( 2.18 per second)

32768.00 MiB transferred (8934.28 MiB/sec)

General statistics:
    total time:                          3.6658s
    total number of events:              8

Latency (ms):
         min:                                  434.13
         avg:                                  458.20
         max:                                  497.19
         95th percentile:                      493.24
         sum:                                 3665.59

Threads fairness:
    events (avg/stddev):           8.0000/0.00
    execution time (avg/stddev):   3.6656/0.00
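
To compare the two runs above without reading the full output, the throughput figure can be pulled out of each and turned into a ratio. This is a small sketch; `throughput` is a hypothetical helper, and the two input lines are copied from the outputs above.

```shell
# Hypothetical helper: print the MiB/sec figure from sysbench memory output on stdin.
throughput() {
  sed -n 's/.*(\([0-9.]*\) MiB\/sec).*/\1/p'
}

# Summary lines from the unbound and the hwloc-bind run, respectively.
unbound=$(echo '32768.00 MiB transferred (5085.20 MiB/sec)' | throughput)
bound=$(echo '32768.00 MiB transferred (8934.28 MiB/sec)' | throughput)

# Ratio of bound to unbound throughput.
awk -v a="$unbound" -v b="$bound" 'BEGIN { printf "speedup: %.2fx\n", b / a }'
```

With these two single runs the bound run is roughly 1.76 times faster. One run each is a thin basis, so repeating both several times would show whether the gap is stable.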
 
It's not just bad performance on that test; the system overall feels very slow.
 
Hello, have you solved the performance problem? We have the same, or a very similar, issue. When we change the CPU type from host (EPYCv4) to EPYCv3, the VM performs well.
Is there a problem with EPYC v4?
 
