[SOLVED] Large Variation in VM Performance

jena

Member
Jul 9, 2020
Update:
I was chasing a ghost here.
I had set High Performance in the new Windows 10 Settings menu in VM412, but the old-style power plan (now found under "Additional power settings") was still on Balanced. Duh.
Now VM412's benchmarks are similar to VM500's.
VM500 had good benchmarks from the beginning because the GPU driver forced the power plan to High Performance.


Hardware
CPU: AMD Threadripper 3970x (32C/64T)
GPU: NVIDIA 2080 Ti Founders Edition Blower Style
MB: MSI TRX40 Pro Wifi
RAM: 256GB DDR4 3200

Please help me diagnose a large variation in VM performance when running the MATLAB bench benchmark.

Both VMs achieve similar CPU-Z and Cinebench scores.

But for my work, I need to use MATLAB R2020b.
PS: The MATLAB benchmark does not use a lot of RAM. 16GB is plenty; my RAM usage is about 15-20% of 16GB.
https://www.mathworks.com/help/matlab/ref/bench.html

- LU: Perform LU matrix factorization of a full matrix (floating-point, regular memory access)
- FFT: Perform fft of a full vector (floating-point, irregular memory access)
- ODE: Solve van der Pol equation with ode45 (data structures and MATLAB function files)
- Sparse: Solve a symmetric sparse linear system (mixed integer and floating-point)

The .conf files of both VMs are attached.

VM500, 16VCPU + 2080Ti Passthrough (access via RDP)
The screenshot shows 10 runs (lower is better). Variation between runs was small.
| Run # | LU | FFT | ODE | Sparse | 2D | 3D |
|---|---|---|---|---|---|---|
| 1st 10-run avg | 0.21 | 0.26 | 0.36 | 0.43 | 0.74 | 1.12 |
MATLAB_Bench_VM500.JPG
This is close to native Win10 Threadripper 3970x performance. These benchmarks don't scale much beyond a certain core count.

VM412, 16VCPU + SPICE, latest drivers from virtio-win-0.1.189.iso, Power Plan: High Performance
I expected 2D and 3D to be terrible due to the lack of a GPU.
But LU, ODE, and Sparse are also terrible.
There is also large variation between runs.
Some runs come close to VM500's performance, but others are terrible.
| Run # | LU | FFT | ODE | Sparse | 2D | 3D |
|---|---|---|---|---|---|---|
| 1st 10-run avg | 0.21 | 0.26 | 0.36 | 0.43 | 0.74 | 1.12 |
| 2nd 10-run avg | 0.73 | 0.48 | 0.37 | 0.73 | 0.74 | 1.12 |
 

What hardware are you running on?
Also, are multiple VMs running at the same time when you do your tests?
 
Thanks for the reply.
No, they are not running at the same time.
The only other thing running is a TurnKey Samba server in an LXC container, using 1 VCPU and 512MB RAM.

Hardware
CPU: AMD Threadripper 3970x (32C/64T)
GPU: NVIDIA 2080 Ti Founders Edition Blower Style
MB: MSI TRX40 Pro Wifi
RAM: 256GB DDR4 3200
 
Well, the hardware itself shouldn't matter anyway: you have enough cores and threads available that the hypervisor should have no trouble scheduling the vCPUs.
Several things could explain your variation:
- NUMA node issues (AFAIK Ryzen CPUs consist of multiple dies and can expose multiple NUMA nodes); crossing nodes is expensive in terms of execution time.
- Cores that get parked or clocked down by SpeedStep-style frequency scaling, power saving, or other technologies (Turbo Boost, for instance).
What are your BIOS settings for these? Have you already tried 8-core VMs? Any changes there?
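
Both suspicions can be checked from the host before touching the BIOS. The following is a rough Python sketch, assuming a Linux host that exposes the usual sysfs entries (`/sys/devices/system/node` for NUMA nodes and a per-CPU `cpufreq/scaling_governor` file). The helper names are hypothetical, and a host without a cpufreq driver will simply report no governors.

```python
import glob
import os
from collections import Counter


def count_numa_nodes(node_root="/sys/devices/system/node"):
    """Count nodeN directories; returns 0 if the path does not exist."""
    pattern = os.path.join(node_root, "node[0-9]*")
    return len([p for p in glob.glob(pattern) if os.path.isdir(p)])


def governor_summary(cpu_root="/sys/devices/system/cpu"):
    """Return a Counter mapping each cpufreq governor to the number of CPUs using it."""
    pattern = os.path.join(cpu_root, "cpu[0-9]*", "cpufreq", "scaling_governor")
    counts = Counter()
    for path in glob.glob(pattern):
        try:
            with open(path) as handle:
                counts[handle.read().strip()] += 1
        except OSError:
            # Skip CPUs whose governor file cannot be read (offline CPU, sandbox).
            continue
    return counts


if __name__ == "__main__":
    print("NUMA nodes:", count_numa_nodes())
    for governor, cpus in sorted(governor_summary().items()):
        print("%s: %d CPUs" % (governor, cpus))
```

A single NUMA node and a uniform governor across all CPUs would make the host side a less likely culprit, which points the search back at the guest.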
 
But the bare-metal Win10 benchmark that MATLAB provides as a Threadripper 3970x example doesn't seem to suffer from this performance issue.
Also, VM500 (the one with GPU passthrough) is fine.

So what could be different about VM500? In the VM configs, I have set the CPU arguments the same.
1. Does VM412 have to use the CPU to render SPICE graphics?
2. Can VM500 lock on to physical cores better because of the passthrough?
3. RAM mapping? Threadripper uses quad-channel RAM, and VM500 may be able to lock on to the same RAM space.

Occasionally, right after the VM has started, VM412 gets a few runs out of 10 with LU = 0.21-ish and FFT = 0.26-ish.
After that it consistently runs poorly, with LU = 0.73 and FFT = 0.48.

Here is the testing summary.
| PC Name | BigRun# (10-run average) | Config | LU | FFT | ODE | Sparse | 2D | 3D |
|---|---|---|---|---|---|---|---|---|
| VM 500 | 1 | 16VCPU + 2080Ti, NUMA=1 | 0.21 | 0.23 | 0.37 | 0.44 | 0.58 | 0.32 |
| VM 412 | 2 | 16VCPU NUMA=1 | 0.21 | 0.26 | 0.36 | 0.43 | 0.73 | 1.12 |
| VM 412 | 3 | 16VCPU NUMA=1 | 0.73 | 0.48 | 0.37 | 0.73 | 0.74 | 1.12 |
| VM 412 | 4 | 16VCPU NUMA=0 | 0.70 | 0.49 | 0.37 | 0.72 | 0.74 | 1.13 |
| VM 412 | 5 | 8VCPU NUMA=1 | 0.43 | 0.54 | 0.37 | 0.51 | 0.73 | 1.12 |
| VM 412 | 6 | 8VCPU NUMA=0 | 0.43 | 0.61 | 0.37 | 0.51 | 0.89 | 1.13 |
| VM 412 | 7 | 4VCPU NUMA=1 | 0.53 | 0.71 | 0.36 | 0.41 | 0.75 | 1.15 |
| VM 412 | 8 | 4VCPU NUMA=0 | 0.52 | 0.67 | 0.36 | 0.41 | 0.76 | 1.15 |
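
To make the gap easier to see, the averages can be expressed as slowdown factors relative to VM500. This is a small Python sketch that uses only the numbers from the summary above; the variable and function names are my own.

```python
# 10-run averages copied from the testing summary (lower is better).
TESTS = ["LU", "FFT", "ODE", "Sparse", "2D", "3D"]

BASELINE = [0.21, 0.23, 0.37, 0.44, 0.58, 0.32]  # VM 500, BigRun 1

VM412_RUNS = {
    "BigRun 2, 16VCPU NUMA=1": [0.21, 0.26, 0.36, 0.43, 0.73, 1.12],
    "BigRun 3, 16VCPU NUMA=1": [0.73, 0.48, 0.37, 0.73, 0.74, 1.12],
    "BigRun 4, 16VCPU NUMA=0": [0.70, 0.49, 0.37, 0.72, 0.74, 1.13],
    "BigRun 5, 8VCPU NUMA=1": [0.43, 0.54, 0.37, 0.51, 0.73, 1.12],
    "BigRun 6, 8VCPU NUMA=0": [0.43, 0.61, 0.37, 0.51, 0.89, 1.13],
    "BigRun 7, 4VCPU NUMA=1": [0.53, 0.71, 0.36, 0.41, 0.75, 1.15],
    "BigRun 8, 4VCPU NUMA=0": [0.52, 0.67, 0.36, 0.41, 0.76, 1.15],
}


def slowdown(times, baseline=BASELINE):
    """Return each time divided by the VM500 time, rounded to 2 decimals."""
    return [round(t / b, 2) for t, b in zip(times, baseline)]


if __name__ == "__main__":
    for label, times in VM412_RUNS.items():
        factors = slowdown(times)
        cells = ", ".join("%s x%.2f" % (name, f) for name, f in zip(TESTS, factors))
        print("%s: %s" % (label, cells))
```

For BigRun 3 this gives roughly 3.5x for LU and 2.1x for FFT while ODE stays at 1.0x, so the slowdown hits the multithreaded floating-point tests rather than everything equally. The 2D and 3D factors are expected to be poor on VM412 because it has no GPU.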

Update:
I was chasing a ghost here.
I had set High Performance in the new Windows 10 Settings menu in VM412, but the old-style power plan (now found under "Additional power settings") was still on Balanced. Duh.
Now VM412's benchmarks are similar to VM500's.
VM500 had good benchmarks from the beginning because the GPU driver forced the power plan to High Performance.
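
One way to avoid being fooled by the two Windows 10 settings menus is to ask Windows directly which classic power plan is active. The sketch below is a minimal Python helper, assuming Python is installed in the guest: it runs `powercfg /getactivescheme` and parses the GUID and plan name from the line it prints. The function names are my own, not part of any tool.

```python
import re
import subprocess
import sys

# Matches the GUID and the plan name in parentheses, e.g.
#   Power Scheme GUID: 381b4222-f694-41f0-9685-ff5bb260df2e  (Balanced)
_SCHEME_RE = re.compile(
    r"([0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})\s*\((.*)\)"
)


def parse_active_scheme(output):
    """Return (guid, name) parsed from `powercfg /getactivescheme` output."""
    match = _SCHEME_RE.search(output)
    if match is None:
        raise ValueError("unrecognized powercfg output: %r" % output)
    return match.group(1).lower(), match.group(2).strip()


def get_active_scheme():
    """Query the active power plan; only meaningful inside Windows."""
    result = subprocess.run(
        ["powercfg", "/getactivescheme"],
        capture_output=True, text=True, check=True,
    )
    return parse_active_scheme(result.stdout)


if __name__ == "__main__" and sys.platform == "win32":
    guid, name = get_active_scheme()
    print("Active power plan: %s (%s)" % (name, guid))
```

If this reports Balanced even though the Settings slider says otherwise, running `powercfg /setactive SCHEME_MIN` from an elevated prompt should switch to the High performance plan (`SCHEME_MIN` is the built-in alias listed by `powercfg /aliases`).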
 
Oh, that is something I hadn't thought of.
It doesn't surprise me, though, because I have had similar issues in the past...