Performance drop due to microcode update?

udo · Apr 18, 2018

Hi,
today I saw a strange effect. I updated the BIOS (Intel microcode update) on a Dell R620, and since then a high-I/O DB VM shows a much higher load than before (current pve-enterprise).

A short test with pveperf shows much lower regex values:

Code:

# pveperf
CPU BOGOMIPS:      139190.64
REGEX/SECOND:      611333
HD SIZE:           327.16 GB (rpool/ROOT/pve-1)
FSYNCS/SECOND:     3172.26
DNS EXT:           11.71 ms
DNS INT:           1.19 ms

# pveversion
pve-manager/5.1-46/ae8241d4 (running kernel: 4.13.16-1-pve)
# uname -a
Linux pve01 4.13.16-1-pve #1 SMP PVE 4.13.16-43 (Fri, 16 Mar 2018 19:41:43 +0100) x86_64 GNU/Linux

An unpatched system (PVE also updated, but still running an older kernel) shows better (normal) values:

Code:

# pveperf
CPU BOGOMIPS:      139206.00
REGEX/SECOND:      1765695
HD SIZE:           325.49 GB (rpool/ROOT/pve-1)
FSYNCS/SECOND:     3999.74
DNS EXT:           12.07 ms
DNS INT:           0.91 ms

# pveversion
pve-manager/5.1-46/ae8241d4 (running kernel: 4.13.13-3-pve)
# uname -a
Linux pve02 4.13.13-3-pve #1 SMP PVE 4.13.13-34 (Sun, 7 Jan 2018 13:19:58 +0100) x86_64 GNU/Linux
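
To put the two REGEX/SECOND figures above side by side, here is a minimal Python sketch that parses pveperf-style "KEY: value" lines and computes the relative drop. The helper names (parse_pveperf, relative_drop) are made up for illustration and are not part of any Proxmox tool; the numbers are copied from the two outputs above.

```python
def parse_pveperf(text):
    """Parse pveperf-style 'KEY: value' lines into a dict of raw strings."""
    result = {}
    for line in text.splitlines():
        # Skip shell prompt lines such as '# pveperf'.
        if line.lstrip().startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            result[key.strip()] = value.strip()
    return result


def relative_drop(baseline, measured):
    """Fraction by which 'measured' is lower than 'baseline'."""
    return (baseline - measured) / baseline


# Excerpts from the two pveperf runs quoted in this post.
patched = "CPU BOGOMIPS:      139190.64\nREGEX/SECOND:      611333\n"
unpatched = "CPU BOGOMIPS:      139206.00\nREGEX/SECOND:      1765695\n"

slow = int(parse_pveperf(patched)["REGEX/SECOND"])
fast = int(parse_pveperf(unpatched)["REGEX/SECOND"])
drop = relative_drop(fast, slow)
print(f"regex throughput drop: {drop:.1%}")  # prints: regex throughput drop: 65.4%
```

So the patched host reaches only about a third of the regex throughput of the unpatched one, while BOGOMIPS is nearly identical.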

On a second system that was updated last week I see the same drop in the regex value (only noticed now), but on the test system the regex values are normal (although it uses pve-no-subscription):

Code:

# pveperf
CPU BOGOMIPS:      120001.68
REGEX/SECOND:      1752079
HD SIZE:           24.24 GB (/dev/mapper/pve-root)
BUFFERED READS:    443.77 MB/sec
AVERAGE SEEK TIME: 4.41 ms
FSYNCS/SECOND:     3317.69
DNS EXT:           11.79 ms
DNS INT:           0.88 ms

# pveversion
pve-manager/5.1-46/ae8241d4 (running kernel: 4.13.16-1-pve)
# uname -a
Linux pvetest 4.13.16-1-pve #1 SMP PVE 4.13.16-45 (Wed, 28 Mar 2018 15:47:11 +0200) x86_64 GNU/Linux

I still have to check whether the power settings were changed during the BIOS update, but the drop does not happen on the test system, and Phoronix gives the same output on both systems (the test system and the live system updated last week):

Code:

Phoronix Test Suite v7.8.0
System Information

  PROCESSOR:          2 x Intel Xeon E5-2640 0 @ 3.00GHz
    Core Count:       12
    Thread Count:     24
    Extensions:       SSE 4.2 + AVX
    Cache Size:       15360 KB
    Microcode:        0x713
    Scaling Driver:   intel_pstate performance
...
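
The microcode revision that Phoronix reports can also be cross-checked directly: on x86 Linux, /proc/cpuinfo carries a "microcode" field per logical CPU. The following is a minimal sketch, assuming the usual "key : value" layout of that file; the function name microcode_revisions is hypothetical, and the sample text is a shortened stand-in built around the 0x713 value shown above, not real output from these hosts.

```python
def microcode_revisions(cpuinfo_text):
    """Return the set of distinct 'microcode' values found in cpuinfo text."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            revisions.add(value.strip())
    return revisions


# Shortened, illustrative stand-in for /proc/cpuinfo (two logical CPUs).
sample_cpuinfo = (
    "processor\t: 0\n"
    "microcode\t: 0x713\n"
    "\n"
    "processor\t: 1\n"
    "microcode\t: 0x713\n"
)

print(sorted(microcode_revisions(sample_cpuinfo)))  # prints: ['0x713']
```

On a real host one would pass the contents of /proc/cpuinfo instead of the sample string. More than one value in the result would mean the logical CPUs do not all report the same revision.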

Any hints?

Udo

Alwin · Apr 18, 2018

Depends on what the microcode update contains, but I assume it provides fixes for the Spectre and Meltdown bugs. The same goes for the different kernel versions.

udo · Apr 18, 2018

Hi Alwin,
right. AFAIK the microcode update contains Spectre/Meltdown fixes only.

The strange thing is that the test server in the lab doesn't show this effect (same hardware).
It doesn't look kernel-related: even with the kernel from the enterprise node, the regex values are fine on the test node.
I will now try to configure the test node via Puppet; perhaps some setting there causes this effect.

Udo

tom · Apr 18, 2018

If possible, please test also with the 4.15 kernel.

udo · Apr 18, 2018

tom said:
If possible, please test also with the 4.15 kernel.

Hi Tom,
is pve-kernel-4.15.15-1-pve/stable 4.15.15-6 amd64 the same kernel as the one that was in pvetest before?
https://forum.proxmox.com/threads/4...or-pve-5-x-available.42097/page-2#post-205516

If so, I don't have a good feeling about it...

Udo

udo · Apr 18, 2018

Hi,
status update: the reason I couldn't reproduce this in the lab was the missing VMs...

After a reboot with today's new kernel (pve-kernel-4.13.16-2-pve: 4.13.16-47) I got good regex values with pveperf (around 1700000).
I was happy... until some VMs were started, and the regex value dropped to about 500000.
All 18 VMs have the pcid + spec-ctrl flags enabled (this was not the case before).
Then I shut down all VMs except one, and the results alternated between good and bad:

Code:

# pveperf
CPU BOGOMIPS:      119999.76
REGEX/SECOND:      1699506
HD SIZE:           1452.74 GB (rpool/ROOT/pve-1)
FSYNCS/SECOND:     2679.17
DNS EXT:           243.83 ms
DNS INT:           1.20 ms
# pveperf
CPU BOGOMIPS:      119999.76
REGEX/SECOND:      523716
HD SIZE:           1452.74 GB (rpool/ROOT/pve-1)
FSYNCS/SECOND:     2720.58
DNS EXT:           11.81 ms
DNS INT:           1.21 ms
# pveperf
CPU BOGOMIPS:      119999.76
REGEX/SECOND:      1720468
HD SIZE:           1452.74 GB (rpool/ROOT/pve-1)
FSYNCS/SECOND:     2379.57
DNS EXT:           11.76 ms
DNS INT:           1.26 ms
# pveperf
CPU BOGOMIPS:      119999.76
REGEX/SECOND:      525149
HD SIZE:           1452.74 GB (rpool/ROOT/pve-1)
FSYNCS/SECOND:     2413.19
DNS EXT:           11.72 ms
DNS INT:           1.23 ms
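
The good/bad pattern in the four runs above can be made explicit with a small Python sketch. The threshold of 1000000 regex/second is an arbitrary assumption chosen to separate the two groups, and the function names are hypothetical; the values are the REGEX/SECOND results from the four runs.

```python
def classify(values, threshold):
    """Label each REGEX/SECOND value as 'good' or 'bad' against a threshold."""
    return ["good" if v >= threshold else "bad" for v in values]


def alternates(labels):
    """True if no two consecutive labels are equal."""
    return all(a != b for a, b in zip(labels, labels[1:]))


# REGEX/SECOND from the four consecutive pveperf runs above.
regex_per_second = [1699506, 523716, 1720468, 525149]

# Assumed cut-off between 'normal' and 'degraded' (not from pveperf itself).
THRESHOLD = 1_000_000

labels = classify(regex_per_second, THRESHOLD)
print(labels)              # prints: ['good', 'bad', 'good', 'bad']
print(alternates(labels))  # prints: True
```

Four runs are a small sample, so a strict alternation here is only a hint; a longer series would show whether the pattern really holds.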

After that I disabled pcid/spec-ctrl on all VMs and installed "pve-kernel-4.15.10-1-pve: 4.15.10-4"; after the reboot the regex values are good.
Now the pcid/spec-ctrl flags are active on 6 VMs and it looks good.
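
Whether a flag such as pcid actually arrives inside a Linux guest can be checked from the "flags" line of the guest's /proc/cpuinfo. Below is a minimal sketch under stated assumptions: the helper names are hypothetical, the sample text is illustrative rather than taken from one of these VMs, and the name under which spec-ctrl appears in the guest (written here as spec_ctrl) depends on the guest kernel, so treat that name as an assumption.

```python
def cpu_flags(cpuinfo_text):
    """Return the CPU flags of the first 'flags' line in cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text, wanted):
    """Return the wanted flags that are absent, sorted by name."""
    return sorted(set(wanted) - cpu_flags(cpuinfo_text))


# Illustrative guest cpuinfo excerpt: pcid is present, spec_ctrl is not.
sample_guest_cpuinfo = (
    "processor\t: 0\n"
    "flags\t\t: fpu vme sse4_2 avx pcid\n"
)

print(missing_flags(sample_guest_cpuinfo, ["pcid", "spec_ctrl"]))  # prints: ['spec_ctrl']
```

Run inside each guest with the real file contents, this gives a quick list of which VMs see the flags and which do not.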

Udo

udo · Apr 18, 2018

udo said:
Hi Tom,
is pve-kernel-4.15.15-1-pve/stable 4.15.15-6 amd64 the same kernel as the one that was in pvetest before?
https://forum.proxmox.com/threads/4...or-pve-5-x-available.42097/page-2#post-205516

If so, I don't have a good feeling about it...

Udo

Hi,
to answer my own question: it's not the same kernel.

Code:

old: pve-kernel-4.15.15-1-pve_4.15.15-5_amd64.deb
new: pve-kernel-4.15.15-1-pve_4.15.15-6_amd64.deb

But unfortunately it has the same issue: it does not boot with an LSI PERC H710.

Edit: I have filed a bug report: https://bugzilla.proxmox.com/show_bug.cgi?id=1733

Udo

udo · Apr 20, 2018

Hi,
status update:
with kernel 4.15.10 the DB ran with normal load (approx. 50% with 16 vCPUs) for one day, but we switched back to the second DB host because the failover didn't work well (and we switched off the failover software).
Failover is handled by keepalived (VRRP) with a higher priority on one host; after the upgrade, keepalived moved the failover IP to the backup node for a short moment and back again, multiple times...

Udo

LnxBil · Apr 21, 2018

How much is your performance drop in numbers? We have at least 10% for database loads in I/O alone after all spectre/meltdown patching.

udo · Apr 22, 2018

LnxBil said:
How much is your performance drop in numbers? We have at least 10% for database loads in I/O alone after all spectre/meltdown patching.

Hi,
I can't give exact numbers, but I did some measurements earlier (with a single VM on one node). With the Phoronix redis-lpop test (because it shows the largest changes due to Spectre/Meltdown) I lost approx. 11% due to the patched Ubuntu kernel in the VM (and the patched host kernel).
With a current host kernel and the pcid/spec-ctrl flags passed to the VM I got approx. 10% back...
Fine in theory, but in real life the DB VM, which normally runs at 40-60% CPU load, went up to 80-100% and the customer noticed much slower access, so we moved the master IP to the DB running on an older host.

It looks to me as if I run into trouble with 4.13 when more than one VM uses the pcid + spec-ctrl flags...
With 4.15 it looks better, but we still have a network issue...

Udo
