Performance drop due to microcode update?

udo

Hi,
today I saw a strange effect. I updated the BIOS (Intel microcode update) on a Dell R620, and since then a high-I/O DB VM shows a much higher load than before (current pve-enterprise).

A short test with pveperf shows much lower regex values:
Code:
# pveperf
CPU BOGOMIPS:      139190.64
REGEX/SECOND:      611333
HD SIZE:           327.16 GB (rpool/ROOT/pve-1)
FSYNCS/SECOND:     3172.26
DNS EXT:           11.71 ms
DNS INT:           1.19 ms

# pveversion
pve-manager/5.1-46/ae8241d4 (running kernel: 4.13.16-1-pve)
# uname -a
Linux pve01 4.13.16-1-pve #1 SMP PVE 4.13.16-43 (Fri, 16 Mar 2018 19:41:43 +0100) x86_64 GNU/Linux
An unpatched system (PVE also updated, but still running the older kernel) shows better (normal) values:
Code:
# pveperf
CPU BOGOMIPS:      139206.00
REGEX/SECOND:      1765695
HD SIZE:           325.49 GB (rpool/ROOT/pve-1)
FSYNCS/SECOND:     3999.74
DNS EXT:           12.07 ms
DNS INT:           0.91 ms

# pveversion
pve-manager/5.1-46/ae8241d4 (running kernel: 4.13.13-3-pve)
# uname -a
Linux pve02 4.13.13-3-pve #1 SMP PVE 4.13.13-34 (Sun, 7 Jan 2018 13:19:58 +0100) x86_64 GNU/Linux
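To put the two pveperf runs above side by side, a small script can pull the metrics out of each output and compute the relative drop. This is only a sketch: the helper names `parse_pveperf` and `relative_drop` are made up for illustration, and the parser assumes the `NAME: value` layout shown in the outputs above.

```python
import re

def parse_pveperf(text):
    """Return {metric name: float} for every 'NAME:   number ...' line."""
    metrics = {}
    for line in text.splitlines():
        match = re.match(r"^([A-Z][A-Z /]*[A-Z]):\s+([0-9.]+)", line)
        if match:
            metrics[match.group(1)] = float(match.group(2))
    return metrics

def relative_drop(before, after):
    """Fraction lost when going from 'before' to 'after'."""
    return (before - after) / before

# Lines copied from the two outputs above.
patched_host = """CPU BOGOMIPS:      139190.64
REGEX/SECOND:      611333
FSYNCS/SECOND:     3172.26
"""
unpatched_host = """CPU BOGOMIPS:      139206.00
REGEX/SECOND:      1765695
FSYNCS/SECOND:     3999.74
"""

patched = parse_pveperf(patched_host)
unpatched = parse_pveperf(unpatched_host)
regex_drop = relative_drop(unpatched["REGEX/SECOND"], patched["REGEX/SECOND"])
fsync_drop = relative_drop(unpatched["FSYNCS/SECOND"], patched["FSYNCS/SECOND"])

print(f"REGEX/SECOND drop:  {regex_drop:.1%}")
print(f"FSYNCS/SECOND drop: {fsync_drop:.1%}")
```

The BOGOMIPS values are practically identical on both hosts, so the difference shows up only in the actual workload numbers, not in the nominal CPU rating.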
On a second system, updated last week, I had the same drop in the regex value (noticed only now), but on the test system the regex values are normal (although that one uses pve-no-subscription):
Code:
# pveperf
CPU BOGOMIPS:      120001.68
REGEX/SECOND:      1752079
HD SIZE:           24.24 GB (/dev/mapper/pve-root)
BUFFERED READS:    443.77 MB/sec
AVERAGE SEEK TIME: 4.41 ms
FSYNCS/SECOND:     3317.69
DNS EXT:           11.79 ms
DNS INT:           0.88 ms

# pveversion
pve-manager/5.1-46/ae8241d4 (running kernel: 4.13.16-1-pve)
# uname -a
Linux pvetest 4.13.16-1-pve #1 SMP PVE 4.13.16-45 (Wed, 28 Mar 2018 15:47:11 +0200) x86_64 GNU/Linux
I still have to check whether the power settings were changed during the BIOS update, but that did not happen on the test system, and Phoronix gives the same output on both systems (the test system and the live system updated last week):
Code:
Phoronix Test Suite v7.8.0
System Information

  PROCESSOR:          2 x Intel Xeon E5-2640 0 @ 3.00GHz
    Core Count:       12
    Thread Count:     24
    Extensions:       SSE 4.2 + AVX
    Cache Size:       15360 KB
    Microcode:        0x713
    Scaling Driver:   intel_pstate performance
...
Any hints?

Udo
 
Depends on what the microcode update contains, but I assume they provide fixes around the Spectre + Meltdown bugs. Same for the different kernel versions.
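One way to see what a host actually reports is to read the microcode revision from `/proc/cpuinfo` and the mitigation state from sysfs. The sketch below assumes an x86 Linux host. The `vulnerabilities` directory only exists on kernels that carry the Spectre/Meltdown reporting patches, so on older kernels `mitigation_status` simply returns an empty dict. Both function names are illustrative.

```python
import glob
import os
import re

def microcode_revisions(cpuinfo_text):
    """Return the set of microcode revisions found in /proc/cpuinfo text."""
    return set(re.findall(r"^microcode\s*:\s*(\S+)", cpuinfo_text, re.MULTILINE))

def mitigation_status(sysfs_dir="/sys/devices/system/cpu/vulnerabilities"):
    """Return {vulnerability name: status line}; empty if the directory is missing."""
    status = {}
    for path in sorted(glob.glob(os.path.join(sysfs_dir, "*"))):
        try:
            with open(path) as handle:
                status[os.path.basename(path)] = handle.read().strip()
        except OSError:
            continue  # unreadable entry, skip it
    return status

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            print("microcode:", sorted(microcode_revisions(handle.read())))
    except OSError as err:
        print("cannot read /proc/cpuinfo:", err)
    for name, state in mitigation_status().items():
        print(f"{name}: {state}")
```

Running this on the patched and the unpatched node and comparing the output would show whether the two hosts differ in microcode revision, in active mitigations, or in both.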
 
Hi Alwin,
right. AFAIK the microcode update contains Spectre/Meltdown fixes only.

The strange thing is that the test server in the lab does not show this effect (same hardware).
It does not look kernel-related: even with the kernel from the enterprise node, the regex values are fine on the test node.
I will now try to configure the test node via Puppet; perhaps there are settings that produce this effect.

Udo
 
If possible, please also test with the 4.15 kernel.
 
Hi,
status update: the reason I couldn't reproduce this in the lab was the missing VMs...

After a reboot with today's new kernel (pve-kernel-4.13.16-2-pve: 4.13.16-47) I got good regex values with pveperf (around 1700000).
I was happy... until some VMs were started: the regex value dropped to 500000.
All 18 VMs have the pcid + spec-ctrl flags enabled (this was not the case before).
Then I switched off all VMs except one, and got alternating good and bad results:
Code:
# pveperf
CPU BOGOMIPS:      119999.76
REGEX/SECOND:      1699506
HD SIZE:           1452.74 GB (rpool/ROOT/pve-1)
FSYNCS/SECOND:     2679.17
DNS EXT:           243.83 ms
DNS INT:           1.20 ms
# pveperf
CPU BOGOMIPS:      119999.76
REGEX/SECOND:      523716
HD SIZE:           1452.74 GB (rpool/ROOT/pve-1)
FSYNCS/SECOND:     2720.58
DNS EXT:           11.81 ms
DNS INT:           1.21 ms
# pveperf
CPU BOGOMIPS:      119999.76
REGEX/SECOND:      1720468
HD SIZE:           1452.74 GB (rpool/ROOT/pve-1)
FSYNCS/SECOND:     2379.57
DNS EXT:           11.76 ms
DNS INT:           1.26 ms
# pveperf
CPU BOGOMIPS:      119999.76
REGEX/SECOND:      525149
HD SIZE:           1452.74 GB (rpool/ROOT/pve-1)
FSYNCS/SECOND:     2413.19
DNS EXT:           11.72 ms
DNS INT:           1.23 ms
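Because the results alternate between good and bad, a single pveperf run can be misleading. The following sketch runs pveperf several times and reports the spread of the regex values. It assumes `pveperf` is on the PATH of a Proxmox VE host and prints the `REGEX/SECOND:` line as shown above; the function names are illustrative.

```python
import re
import subprocess

REGEX_LINE = re.compile(r"^REGEX/SECOND:\s+(\d+)", re.MULTILINE)

def regex_per_second(pveperf_output):
    """Extract the REGEX/SECOND value from one pveperf output, or None."""
    match = REGEX_LINE.search(pveperf_output)
    return int(match.group(1)) if match else None

def run_pveperf_series(runs=4):
    """Run pveperf 'runs' times and collect the regex values (needs a PVE host)."""
    values = []
    for _ in range(runs):
        result = subprocess.run(["pveperf"], capture_output=True, text=True, check=True)
        value = regex_per_second(result.stdout)
        if value is not None:
            values.append(value)
    return values

def spread(values):
    """Ratio of best to worst run; close to 1.0 means stable results."""
    return max(values) / min(values)

# The four REGEX/SECOND values from the runs above, in order.
observed = [1699506, 523716, 1720468, 525149]
observed_spread = spread(observed)
print(f"best/worst ratio: {observed_spread:.2f}")
```

With the four values above, the best run is more than three times faster than the worst, which is far outside normal run-to-run noise.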
After that I switched pcid/spec-ctrl off on all VMs and installed "pve-kernel-4.15.10-1-pve: 4.15.10-4"; after the reboot the regex values are good.
Now I have the pcid/spec-ctrl flags active on 6 VMs and it looks good.
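To keep track of which VMs have the flags enabled, the VM configs can be scanned. The sketch below assumes the configs live in `/etc/pve/qemu-server/<vmid>.conf` and that the flags appear on the `cpu:` line in the form `flags=+pcid;+spec-ctrl`. The sample config text is invented for illustration and not taken from the hosts in this thread.

```python
import glob
import os
import re

def cpu_flags(conf_text):
    """Return the set of enabled ('+flag') CPU flags from a VM config text."""
    for line in conf_text.splitlines():
        if line.startswith("["):
            break  # snapshot sections follow the live config, ignore them
        if line.startswith("cpu:"):
            match = re.search(r"flags=([^,\s]+)", line)
            if match:
                return {f[1:] for f in match.group(1).split(";") if f.startswith("+")}
    return set()

def vms_with_flags(conf_dir="/etc/pve/qemu-server"):
    """Return {vmid: set of enabled flags} for all VMs that set any flag."""
    result = {}
    for path in sorted(glob.glob(os.path.join(conf_dir, "*.conf"))):
        with open(path) as handle:
            flags = cpu_flags(handle.read())
        if flags:
            result[os.path.basename(path)[:-len(".conf")]] = flags
    return result

# Hypothetical config content, for demonstration only.
sample_conf = "cores: 16\ncpu: host,flags=+pcid;+spec-ctrl\nmemory: 32768\n"
sample_flags = cpu_flags(sample_conf)
print(sorted(sample_flags))

if __name__ == "__main__":
    for vmid, flags in vms_with_flags().items():
        print(vmid, sorted(flags))
```

Counting the entries returned by `vms_with_flags()` makes it easy to correlate the number of flagged VMs with the pveperf results while enabling the flags step by step.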

Udo
 
Hi Tom,
is pve-kernel-4.15.15-1-pve/stable 4.15.15-6 amd64 the same kernel that was in pvetest before?
https://forum.proxmox.com/threads/4...or-pve-5-x-available.42097/page-2#post-205516

If so, I don't have a good feeling about it...

Udo
Hi,
to answer my own question: it's not the same kernel:
Code:
old: pve-kernel-4.15.15-1-pve_4.15.15-5_amd64.deb
new: pve-kernel-4.15.15-1-pve_4.15.15-6_amd64.deb
But unfortunately it has the same issue: it does not boot with an LSI PERC 710H.

Edit: I have filed a bug report: https://bugzilla.proxmox.com/show_bug.cgi?id=1733

Udo
 

Attachment: kernel_4.15.15_megacli.png
Hi,
status update:
with kernel 4.15.10 the DB ran with normal load (approx. 50% with 16 vCPUs) for one day, but we switched to the second DB host again because the failover did not work well (and we switched off the failover software).
Failover switching is done by keepalived (VRRP) with a higher priority on one host. After the upgrade, keepalived moved the failover IP to the backup node and back again several times, each time only for a short moment...
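To quantify how often the failover IP flaps, the VRRP state changes in the keepalived log can be counted. This sketch assumes the log contains messages of the form `Entering MASTER STATE` / `Entering BACKUP STATE`; the exact wording differs between keepalived versions, so the pattern may need adjusting. The sample lines are invented and not taken from the affected hosts.

```python
import re

STATE_CHANGE = re.compile(r"Entering (MASTER|BACKUP|FAULT) STATE")

def count_transitions(log_lines):
    """Count VRRP state changes per target state in an iterable of log lines."""
    counts = {}
    for line in log_lines:
        match = STATE_CHANGE.search(line)
        if match:
            state = match.group(1)
            counts[state] = counts.get(state, 0) + 1
    return counts

# Invented example lines, only to demonstrate the parser.
sample_log = [
    "Keepalived_vrrp[1234]: VRRP_Instance(VI_1) Entering MASTER STATE",
    "Keepalived_vrrp[1234]: VRRP_Instance(VI_1) Entering BACKUP STATE",
    "Keepalived_vrrp[1234]: VRRP_Instance(VI_1) Entering MASTER STATE",
    "Keepalived_vrrp[1234]: some unrelated message",
]
sample_counts = count_transitions(sample_log)
print(sample_counts)
```

On the backup node, more than a handful of MASTER entries in a period without real outages would confirm the short takeovers described above.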

Udo
 
How much is your performance drop in numbers? We have at least 10% for database loads in I/O alone after all spectre/meltdown patching.
 
Hi,
I can't give exact numbers for this case, but I did measurements earlier (with a single VM on one node). With the Phoronix redis-lpop test (because it shows the biggest change due to Spectre/Meltdown) I lost approx. 11% with the patched Ubuntu kernel in the VM (and a patched host kernel).
With a current host kernel and the pcid/spec-ctrl flags passed to the VM I got approx. 10% back...
Fine in theory, but in real life the DB VM, which normally takes 40-60% CPU load, goes up to 80-100%, and the customer noticed much slower access, so we moved the master IP to the DB running on an older host.
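The benchmark numbers above (approx. 11% lost, approx. 10% regained) can be read in two ways, and the post does not say which one is meant. The short calculation below works through both readings with the unpatched result normalized to 1.0; it is plain arithmetic, not a measurement.

```python
baseline = 1.00                      # unpatched result, normalized
patched = baseline * (1 - 0.11)      # approx. 11% lost after patching

# Reading A: 10% gained relative to the patched result.
recovered_relative = patched * (1 + 0.10)

# Reading B: 10 percentage points of the baseline gained back.
recovered_points = patched + 0.10 * baseline

print(f"patched:            {patched:.3f}")
print(f"reading A (x1.10):  {recovered_relative:.3f}")
print(f"reading B (+0.10):  {recovered_points:.3f}")
```

Under either reading the synthetic benchmark ends up within a few percent of the unpatched result, which is why the much larger load increase on the production DB VM is surprising.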

It looks to me as if with 4.13 I run into trouble when more than one VM uses the pcid + spec-ctrl flags...
With 4.15 it looks better, but we still have a network issue...

Udo
 
