Performance drop due to microcode update?

udo

Hi,
today I saw a strange effect. I updated the BIOS (Intel microcode update) on a Dell R620, and afterwards a high-I/O DB VM shows a much higher load than before (current pve-enterprise).

A short test with pveperf shows much lower regex values:
Code:
# pveperf
CPU BOGOMIPS:      139190.64
REGEX/SECOND:      611333
HD SIZE:           327.16 GB (rpool/ROOT/pve-1)
FSYNCS/SECOND:     3172.26
DNS EXT:           11.71 ms
DNS INT:           1.19 ms

# pveversion
pve-manager/5.1-46/ae8241d4 (running kernel: 4.13.16-1-pve)
# uname -a
Linux pve01 4.13.16-1-pve #1 SMP PVE 4.13.16-43 (Fri, 16 Mar 2018 19:41:43 +0100) x86_64 GNU/Linux
An unpatched system (pve also updated, but still running the older kernel) shows better (normal) values:
Code:
# pveperf
CPU BOGOMIPS:      139206.00
REGEX/SECOND:      1765695
HD SIZE:           325.49 GB (rpool/ROOT/pve-1)
FSYNCS/SECOND:     3999.74
DNS EXT:           12.07 ms
DNS INT:           0.91 ms

# pveversion
pve-manager/5.1-46/ae8241d4 (running kernel: 4.13.13-3-pve)
# uname -a
Linux pve02 4.13.13-3-pve #1 SMP PVE 4.13.13-34 (Sun, 7 Jan 2018 13:19:58 +0100) x86_64 GNU/Linux
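To put a number on the difference between the two hosts above, a small shell sketch can pull the REGEX/SECOND line out of pveperf output and compute the relative drop. The function names `regex_per_sec` and `percent_drop` are made up for this illustration; the two values are the ones posted above.

```shell
#!/bin/sh
# Hypothetical helper: print the REGEX/SECOND value from pveperf output on stdin.
regex_per_sec() {
    awk '$1 == "REGEX/SECOND:" { print $2 }'
}

# Hypothetical helper: integer percentage drop from a baseline to a current value.
percent_drop() {
    baseline=$1
    current=$2
    echo $(( (baseline - current) * 100 / baseline ))
}

# Values posted above: unpatched host (1765695) vs. host with new microcode (611333).
percent_drop 1765695 611333   # prints 65
```

On a live host the first helper could be fed directly, for example `pveperf | regex_per_sec`.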
On a second system, updated last week, I had the same drop in the regex value (only noticed now), but on the test system the regex values are normal (although it uses pve-no-subscription):
Code:
# pveperf
CPU BOGOMIPS:      120001.68
REGEX/SECOND:      1752079
HD SIZE:           24.24 GB (/dev/mapper/pve-root)
BUFFERED READS:    443.77 MB/sec
AVERAGE SEEK TIME: 4.41 ms
FSYNCS/SECOND:     3317.69
DNS EXT:           11.79 ms
DNS INT:           0.88 ms

# pveversion
pve-manager/5.1-46/ae8241d4 (running kernel: 4.13.16-1-pve)
# uname -a
Linux pvetest 4.13.16-1-pve #1 SMP PVE 4.13.16-45 (Wed, 28 Mar 2018 15:47:11 +0200) x86_64 GNU/Linux
I must check whether the power settings were changed during the BIOS update, but this doesn't happen on the test system, and Phoronix gives the same output on both systems (test system + live system updated last week):
Code:
Phoronix Test Suite v7.8.0
System Information

  PROCESSOR:          2 x Intel Xeon E5-2640 0 @ 3.00GHz
    Core Count:       12
    Thread Count:     24
    Extensions:       SSE 4.2 + AVX
    Cache Size:       15360 KB
    Microcode:        0x713
    Scaling Driver:   intel_pstate performance
...
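The microcode revision and the scaling governor can also be read directly from the kernel, without Phoronix, which makes it easy to compare several nodes. This is only a sketch: `microcode_revisions` and `governors` are hypothetical helper names, and the optional path arguments exist only so the functions can be pointed at copies of the files.

```shell
#!/bin/sh
# Hypothetical helper: distinct microcode revisions listed in a cpuinfo-style file.
microcode_revisions() {
    awk -F': *' '/^microcode/ { print $2 }' "${1:-/proc/cpuinfo}" | sort -u
}

# Hypothetical helper: distinct cpufreq governors in use below a sysfs cpu directory.
governors() {
    cat "${1:-/sys/devices/system/cpu}"/cpu[0-9]*/cpufreq/scaling_governor 2>/dev/null | sort -u
}

echo "microcode: $(microcode_revisions | tr '\n' ' ')"
echo "governor:  $(governors | tr '\n' ' ')"
```

If two nodes with the same hardware report different values here, a BIOS update that reset the power profile would be one possible explanation.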
Any hints?

Udo
 
Depends on what the microcode update contains, but I assume they provide fixes around the Spectre + Meltdown bugs. Same for the different kernel versions.
 
Hi Alwin,
right. AFAIK the microcode updates are for Spectre/Meltdown fixes only.

The strange thing is that the test server in the lab doesn't show this effect (same hardware).
It doesn't look kernel related - even with the kernel from the enterprise node the regex values are fine on the test node.
I will now try to configure the test node via puppet - perhaps there are settings which produce this effect.

Udo
 
If possible, please test also with the 4.15 kernel.
 
Hi,
status update: the reason why I couldn't reproduce this in the lab was the missing VMs...

After a reboot with the new kernel from today (pve-kernel-4.13.16-2-pve: 4.13.16-47) I got good regex values with pveperf (around 1700000).
I was happy... until some VMs were started - then the regex value dropped to 500000.
All 18 VMs have the pcid + spec-ctrl flags enabled (this was not the case before).
Then I switched off all VMs except one - and I got alternating good and bad results:
Code:
# pveperf
CPU BOGOMIPS:      119999.76
REGEX/SECOND:      1699506
HD SIZE:           1452.74 GB (rpool/ROOT/pve-1)
FSYNCS/SECOND:     2679.17
DNS EXT:           243.83 ms
DNS INT:           1.20 ms
# pveperf
CPU BOGOMIPS:      119999.76
REGEX/SECOND:      523716
HD SIZE:           1452.74 GB (rpool/ROOT/pve-1)
FSYNCS/SECOND:     2720.58
DNS EXT:           11.81 ms
DNS INT:           1.21 ms
# pveperf
CPU BOGOMIPS:      119999.76
REGEX/SECOND:      1720468
HD SIZE:           1452.74 GB (rpool/ROOT/pve-1)
FSYNCS/SECOND:     2379.57
DNS EXT:           11.76 ms
DNS INT:           1.26 ms
# pveperf
CPU BOGOMIPS:      119999.76
REGEX/SECOND:      525149
HD SIZE:           1452.74 GB (rpool/ROOT/pve-1)
FSYNCS/SECOND:     2413.19
DNS EXT:           11.72 ms
DNS INT:           1.23 ms
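Because the result flips between runs, a single pveperf call can be misleading. A sketch like the following summarizes several runs at once; `regex_summary` is a hypothetical helper name, and the sample input is just the four REGEX/SECOND lines from the runs above.

```shell
#!/bin/sh
# Hypothetical helper: count runs and report min/max REGEX/SECOND from
# concatenated pveperf output on stdin.
regex_summary() {
    awk '$1 == "REGEX/SECOND:" {
             v = $2 + 0
             n++
             if (n == 1 || v < min) min = v
             if (n == 1 || v > max) max = v
         }
         END { if (n) printf "runs=%d min=%d max=%d\n", n, min, max }'
}

# The four results posted above.
regex_summary <<'EOF'
REGEX/SECOND:      1699506
REGEX/SECOND:      523716
REGEX/SECOND:      1720468
REGEX/SECOND:      525149
EOF
# prints: runs=4 min=523716 max=1720468
```

On the host itself this could be used as `for i in 1 2 3 4; do pveperf; done | regex_summary`.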
After that I switched pcid/spec-ctrl off on all VMs, installed "pve-kernel-4.15.10-1-pve: 4.15.10-4", and after a reboot the regex values are good.
Now I have the pcid/spec-ctrl flags active on 6 VMs and it looks good.
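To keep track of which VMs currently have the flags set, the VM config files can be searched. This sketch assumes the configs live in /etc/pve/qemu-server/ and that the flags appear on the `cpu:` line in the form `flags=+pcid;+spec-ctrl`; both are assumptions about the setup, and `vms_with_flag` is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: list VM config files whose "cpu:" line enables a flag.
# Assumes lines like "cpu: host,flags=+pcid;+spec-ctrl".
vms_with_flag() {
    flag=$1
    dir=${2:-/etc/pve/qemu-server}
    grep -lE "^cpu:.*flags=[^,]*\+${flag}(;|,|\$)" "$dir"/*.conf 2>/dev/null
}

for flag in pcid spec-ctrl; do
    echo "VM configs with +$flag:"
    vms_with_flag "$flag" || echo "  (none found)"
done
```

The optional second argument only exists so the function can be tried against a copy of the config directory.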

Udo
 
Hi Tom,
is pve-kernel-4.15.15-1-pve/stable 4.15.15-6 amd64 the same as the one that was in pvetest before?
https://forum.proxmox.com/threads/4...or-pve-5-x-available.42097/page-2#post-205516

In that case I don't have a good feeling about it...

Udo
Hi,
to answer myself - it's not the same kernel:
Code:
old: pve-kernel-4.15.15-1-pve_4.15.15-5_amd64.deb
new: pve-kernel-4.15.15-1-pve_4.15.15-6_amd64.deb
But unfortunately with the same issue - not bootable with an LSI PERC H710

Edit: I have filed a bug report: https://bugzilla.proxmox.com/show_bug.cgi?id=1733

Udo
 

Attachment: kernel_4.15.15_megacli.png
Hi,
status update:
with kernel 4.15.10 the DB ran with normal load (approx. 50% with 16 vCPUs) for one day, but we switched to the second DB host again, because the failover didn't work well (and we switched off the failover software).
keepalived (VRRP) is used for failover switching, with a higher priority on one host, and after the upgrade keepalived moved the failover IP to the backup node and back again multiple times, each time for a short moment...
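To see how often the failover IP actually flapped, the keepalived log can be searched for its state-change messages. This is a rough sketch assuming keepalived logs lines containing "Entering MASTER STATE" / "Entering BACKUP STATE"; `vrrp_transitions` is a hypothetical helper name, and where the log ends up (syslog or journal) depends on the setup.

```shell
#!/bin/sh
# Hypothetical helper: count VRRP state transitions per state from log lines on stdin.
vrrp_transitions() {
    awk 'match($0, /Entering (MASTER|BACKUP|FAULT) STATE/) {
             # Strip the leading "Entering " (9 chars) and trailing " STATE" (6 chars).
             c[substr($0, RSTART + 9, RLENGTH - 15)]++
         }
         END { for (s in c) print s, c[s] }' | sort
}

# Example call, assuming keepalived runs as a systemd unit of that name:
#   journalctl -u keepalived | vrrp_transitions
```

Many MASTER/BACKUP pairs within a short time on the backup node would match the short switches described above.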

Udo
 
How much is your performance drop in numbers? We have at least 10% for database loads in I/O alone after all spectre/meltdown patching.
 
Hi,
I can't give exact numbers, but I did measurements before (with a single VM on one node). With the Phoronix redis-lpop test (because it shows the biggest changes due to Spectre/Meltdown) I lost approx. 11% with the patched Ubuntu kernel in the VM (and patched host kernel).
With a current host kernel and the pcid/spec-ctrl flags passed to the VM I got approx. 10% back...
Fine in theory, but in real life the DB VM, which normally takes 40-60% CPU load, goes up to 80-100%, and the customer noticed much slower access, so we moved the master IP to the DB that was running on an older host.

It looks to me as if, with 4.13, I run into trouble if more than one VM uses the pcid + spec-ctrl flags...
With 4.15 it looks better, but we still have a network issue...

Udo