New Proxmox Cluster - high context switches

centron

New Member
Feb 22, 2016
Hello community,

we set up a new two-node cluster without any shared resources (just for the convenience of a shared GUI).

Everything worked out and the new cluster works as expected - however, we are seeing a noticeably higher overall system load than on the old, to-be-replaced cluster. The cause seems to be a rather high number of context switches, but we are not entirely sure what is causing that load, or the context switches in general.

We tried different guest OSes, switched the CPU type (host -> kvm64) and disabled RAM ballooning (the commands are sketched after the listings below). KSM is off, as per default. This affects BOTH nodes! Every single KVM process seems to create some sort of background noise, as you can see here - just a reminder, these are new VMs with nothing but a fresh Debian 9 OS ::

The NEW cluster ::


Code:
root@NODE1:~# ps faux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAN
root     11019  2.6  1.0 1709964 349816 ?      Sl   Dec12  57:30 /usr/bin/kvm -id 40024
root     11161  2.6  1.2 1752112 422588 ?      Sl   Dec12  57:44 /usr/bin/kvm -id 40001
root     11303  2.6  1.0 1691428 354152 ?      Sl   Dec12  57:25 /usr/bin/kvm -id 40002
root     11433  2.6  1.0 1714076 347048 ?      Sl   Dec12  57:43 /usr/bin/kvm -id 40003
root     11568  2.6  1.0 1728484 357224 ?      Sl   Dec12  57:38 /usr/bin/kvm -id 40004
root     11699  3.5  2.4 1781868 814780 ?      Sl   Dec12  77:45 /usr/bin/kvm -id 40005
root     11835  2.6  1.0 1678112 355524 ?      Sl   Dec12  57:12 /usr/bin/kvm -id 40006
root     11967  2.6  1.1 1715120 374344 ?      Sl   Dec12  57:31 /usr/bin/kvm -id 40007
root     12097  2.6  1.0 1670916 337536 ?      Sl   Dec12  57:24 /usr/bin/kvm -id 40008
root     12232  2.6  2.4 1757284 814664 ?      Sl   Dec12  58:16 /usr/bin/kvm -id 40009
root     12365  2.7  2.4 1761380 814452 ?      Sl   Dec12  58:37 /usr/bin/kvm -id 40010
root     12500  2.6  2.4 1757284 814372 ?      Sl   Dec12  58:20 /usr/bin/kvm -id 40011
root     12630  2.7  2.4 1761380 814332 ?      Sl   Dec12  58:49 /usr/bin/kvm -id 40012
root     12760  2.6  1.0 1740788 352952 ?      Sl   Dec12  57:14 /usr/bin/kvm -id 40013
root     12933  2.7  2.4 1757284 814168 ?      Sl   Dec12  59:05 /usr/bin/kvm -id 40014
root     13064  2.6  1.0 1732596 357916 ?      Sl   Dec12  57:23 /usr/bin/kvm -id 40015
root     13202  2.6  1.1 1676056 364016 ?      Sl   Dec12  57:19 /usr/bin/kvm -id 40016
root     13334  2.7  0.9 1924836 310984 ?      Sl   Dec12  60:25 /usr/bin/kvm -id 40017
root     13468  2.6  1.0 1670916 359644 ?      Sl   Dec12  56:44 /usr/bin/kvm -id 40018
root     13599  2.6  1.1 1693532 365644 ?      Sl   Dec12  56:34 /usr/bin/kvm -id 40019
root     13737  2.6  1.0 1692488 359644 ?      Sl   Dec12  56:21 /usr/bin/kvm -id 40020
root     13871  2.6  1.0 1707908 343280 ?      Sl   Dec12  56:33 /usr/bin/kvm -id 40021
root     14009  2.6  1.0 1755212 344320 ?      Sl   Dec12  56:41 /usr/bin/kvm -id 40022
root     14144  2.6  1.0 1706940 358652 ?      Sl   Dec12  57:05 /usr/bin/kvm -id 40023
.... and so on ......

Code:
root@NODE1:~# pidstat -d -w 10 | grep kvm
Average:      UID       PID   cswch/s nvcswch/s  Command
10:05:00 AM     0      6708   1024.68      1.20  kvm
10:05:00 AM     0     10851   1218.38      2.40  kvm
10:05:00 AM     0     11019    988.41      0.90  kvm
10:05:00 AM     0     11161    987.01      0.40  kvm
10:05:00 AM     0     11303    988.71      0.80  kvm
10:05:00 AM     0     11433    986.41      0.30  kvm
10:05:00 AM     0     11568    987.41      0.80  kvm
10:05:00 AM     0     11699    987.11      0.50  kvm
10:05:00 AM     0     11835    986.61      0.30  kvm
10:05:00 AM     0     11967    986.01      0.60  kvm
10:05:00 AM     0     12097    988.31      0.60  kvm
10:05:00 AM     0     12232    987.41      0.80  kvm
10:05:00 AM     0     12365    987.51      0.50  kvm
10:05:00 AM     0     12500    987.61      0.20  kvm
10:05:00 AM     0     12630    987.91      0.30  kvm
.... and so on ......

Here is the OLD cluster ::
Code:
root@OLD-NODE1:~# pidstat -d -w 10 | grep kvm
10:22:19 AM     0      2818      4.70      0.00  kvm
10:22:19 AM     0      3190    204.40      0.00  kvm
10:22:19 AM     0      3286      9.80      0.00  kvm
10:22:19 AM     0      3352      4.20      0.00  kvm
10:22:19 AM     0      3428      3.20      0.00  kvm
10:22:19 AM     0      3493      3.10      0.00  kvm
10:22:19 AM     0      3598      3.50      0.00  kvm
10:22:19 AM     0      3737      4.30      0.00  kvm
10:22:19 AM     0      3801      3.90      0.00  kvm
10:22:19 AM     0      3888      3.00      0.00  kvm
10:22:19 AM     0      4678      4.30      0.10  kvm
10:22:19 AM     0      5058      3.90      0.00  kvm
10:22:19 AM     0      5234      3.90      0.00  kvm
10:22:19 AM     0      5310      6.40      0.00  kvm
10:22:19 AM     0      5385      6.00      0.00  kvm
10:22:19 AM     0      5462      4.30      0.00  kvm
10:22:19 AM     0      5534      3.70      0.00  kvm
10:22:19 AM     0     12455    348.70      0.30  kvm
10:22:19 AM     0     17368      3.70      0.00  kvm
10:22:19 AM     0     20856      4.40      0.00  kvm
.... and so on ....

As you can see, there is far, far less context switching going on here.
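For completeness, these are roughly the commands behind the tweaks mentioned above - guest CPU type, ballooning, KSM state. VM ID 40001 is just an example; the per-VM settings take effect after a VM restart:

Code:
root@NODE1:~# qm set 40001 --cpu kvm64      # or --cpu host; the guest CPU models we switched between
root@NODE1:~# qm set 40001 --balloon 0      # disable RAM ballooning for this VM
root@NODE1:~# cat /sys/kernel/mm/ksm/run    # 0 = KSM off (the default here)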

To put everything into perspective, here are both setups in detail. However, I just noticed a small difference in the PVE version between the two nodes of the new cluster.

Old cluster specs (Both nodes identical)::
Intel Xeon E3-1231 v3 @ 3.40GHz
8x4 DDR3 Samsung M391B1G73QH0-YK0
Nodes running pve-manager/4.4-15/7599e35a (running kernel: 4.4.67-1-pve)
QEMU Images as files stored on mountpoint (ext4) on SSDs
Crossover network connection
Both nodes run 30 productive Debian 8 VMs (MySQL/PostgreSQL/nginx and more)
per node context switching : ~ 4k in total
load average: 0.17, 0.16, 0.21

NEW cluster specs ::
Node 1:

pve-manager/5.2-12/ba196e4b (running kernel: 4.15.18-9-pve)
Intel Xeon E3-1230 V2 @ 3.30GHz
8x4 DDR3 Samsung M391B1G73QH0-YK0
Crossover network connection
LVM-Thin (4xSamsung SSD 860 PRO 512GB RAID 1)
Currently running 27 new/empty Debian 9 VMs, all idle
per node context switching : ~ 56k-100k in total
load average: 4.10, 2.72, 1.72

Node 2:
pve-manager/5.3-5/97ae681d (running kernel: 4.15.18-9-pve)
Intel Xeon E3-1231 v3 @ 3.40GHz
8x4 DDR3 Samsung M391B1G73QH0-YK0
Crossover network connection
LVM-Thin (4xSamsung SSD 860 PRO 512GB RAID 1)
Currently running 26 new/empty Debian 9 VMs, all idle
per node context switching : ~ 53k-110k in total
load average: 1.06, 1.79, 1.84
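The per-node context-switch totals above presumably come from the system-wide counters; for reference, either of the following shows them (sar needs the sysstat package, same as pidstat):

Code:
root@NODE1:~# vmstat 1 5     # "cs" column = context switches per second, system-wide
root@NODE1:~# sar -w 1 5     # cswch/s, system-wide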

Ideas on this one? Any help is greatly appreciated!

EDIT :: We did not set up any weird multiple internal vmbr bridges - even new VMs without any network interfaces create the same amount of context switching.
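For anyone who wants to double-check that, something along these lines shows whether a VM has any virtual NICs at all (40001 is just an example VMID):

Code:
root@NODE1:~# qm config 40001 | grep ^net                  # no output -> no NIC attached to this VM
root@NODE1:~# grep -H '^net' /etc/pve/qemu-server/*.conf   # NICs of all VMs on this node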

pveversion output of every single node ::
root@NODE1:~# pveversion -v
proxmox-ve: 5.2-3 (running kernel: 4.15.18-9-pve)
pve-manager: 5.2-12 (running version: 5.2-12/ba196e4b)
pve-kernel-4.15: 5.2-12
pve-kernel-4.15.18-9-pve: 4.15.18-30
pve-kernel-4.15.18-7-pve: 4.15.18-27
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: not correctly installed
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-2
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-42
libpve-guest-common-perl: 2.0-18
libpve-http-server-perl: 2.0-11
libpve-storage-perl: 5.0-32
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.0.2+pve1-5
lxcfs: 3.0.2-2
novnc-pve: 1.0.0-2
proxmox-widget-toolkit: 1.0-20
pve-cluster: 5.0-30
pve-container: 2.0-30
pve-docs: 5.2-10
pve-edk2-firmware: 1.20181023-1
pve-firewall: 3.0-14
pve-firmware: 2.0-6
pve-ha-manager: 2.0-5
pve-i18n: 1.0-6
pve-libspice-server1: 0.14.1-1
pve-qemu-kvm: 2.12.1-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-41
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3

root@NODE2:~# pveversion -v
proxmox-ve: 5.3-1 (running kernel: 4.15.18-9-pve)
pve-manager: 5.3-5 (running version: 5.3-5/97ae681d)
pve-kernel-4.15: 5.2-12
pve-kernel-4.15.18-9-pve: 4.15.18-30
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: not correctly installed
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-3
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-43
libpve-guest-common-perl: 2.0-18
libpve-http-server-perl: 2.0-11
libpve-storage-perl: 5.0-33
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.0.2+pve1-5
lxcfs: 3.0.2-2
novnc-pve: 1.0.0-2
proxmox-widget-toolkit: 1.0-22
pve-cluster: 5.0-31
pve-container: 2.0-31
pve-docs: 5.3-1
pve-edk2-firmware: 1.20181023-1
pve-firewall: 3.0-16
pve-firmware: 2.0-6
pve-ha-manager: 2.0-5
pve-i18n: 1.0-9
pve-libspice-server1: 0.14.1-1
pve-qemu-kvm: 2.12.1-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-43
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3

root@OLD-NODE1:~# pveversion -v
proxmox-ve: 4.4-92 (running kernel: 4.4.67-1-pve)
pve-manager: 4.4-15 (running version: 4.4-15/7599e35a)
pve-kernel-4.4.59-1-pve: 4.4.59-87
pve-kernel-4.4.67-1-pve: 4.4.67-92
lvm2: 2.02.116-pve3
corosync-pve: 2.4.2-2~pve4+1
libqb0: 1.0.1-1
pve-cluster: 4.0-52
qemu-server: 4.0-110
pve-firmware: 1.1-11
libpve-common-perl: 4.0-95
libpve-access-control: 4.0-23
libpve-storage-perl: 4.0-76
pve-libspice-server1: 0.12.8-2
vncterm: 1.3-2
pve-docs: 4.4-4
pve-qemu-kvm: 2.7.1-4
pve-container: 1.0-101
pve-firewall: 2.0-33
pve-ha-manager: 1.0-41
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u3
lxc-pve: 2.0.7-4
lxcfs: 2.0.6-pve1
criu: 1.6.0-1
novnc-pve: 0.5-9
smartmontools: 6.5+svn4324-1~pve80

root@OLD-NODE2:~# pveversion -v
proxmox-ve: 4.4-92 (running kernel: 4.4.67-1-pve)
pve-manager: 4.4-15 (running version: 4.4-15/7599e35a)
pve-kernel-4.4.59-1-pve: 4.4.59-87
pve-kernel-4.4.67-1-pve: 4.4.67-92
lvm2: 2.02.116-pve3
corosync-pve: 2.4.2-2~pve4+1
libqb0: 1.0.1-1
pve-cluster: 4.0-52
qemu-server: 4.0-110
pve-firmware: 1.1-11
libpve-common-perl: 4.0-95
libpve-access-control: 4.0-23
libpve-storage-perl: 4.0-76
pve-libspice-server1: 0.12.8-2
vncterm: 1.3-2
pve-docs: 4.4-4
pve-qemu-kvm: 2.7.1-4
pve-container: 1.0-101
pve-firewall: 2.0-33
pve-ha-manager: 1.0-41
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u3
lxc-pve: 2.0.7-4
lxcfs: 2.0.6-pve1
criu: 1.6.0-1
novnc-pve: 0.5-9
smartmontools: 6.5+svn4324-1~pve80
 
Hi,

high context switches happen if you have too many vCPUs running on a node.
Reduce the number of VMs running on the node and the context switches will return to normal.
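As a rough overcommit check, you can compare the total number of assigned cores against the host's hardware threads - a sketch, assuming the default "cores:" key in the VM configs (multiply by "sockets:" if you use more than one socket per VM):

Code:
root@NODE1:~# nproc                                                                          # hardware threads on the host
root@NODE1:~# grep -h '^cores:' /etc/pve/qemu-server/*.conf | awk '{s+=$2} END {print s}'    # total cores assigned to VMs on this node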
 
