New Proxmox Cluster - high context switches

centron

New Member
Feb 22, 2016
Hello community,

we set up a new two-node cluster, without any shared resources (just for the convenience of a shared GUI).

Everything worked out and the new cluster behaves as expected. However, we are seeing strangely high overall system load, definitely higher than on the old, to-be-replaced cluster. The cause seems to be a rather large number of context switches, but we are not entirely sure what is producing that load / those context switches in the first place.

We tried different guest OSes, switched the CPU virtualisation type (host -> kvm64) and disabled RAM ballooning. KSM is off, as per the default. This affects BOTH nodes! Every single KVM process seems to create some sort of background noise, as you can see below. Just a reminder: these are new VMs with nothing on them other than a fresh Debian 9 install ::
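For reference, these are roughly the commands involved in toggling those settings (VMID 40024 from the listing below is used purely as an example):

Code:
# switch the virtual CPU type of a guest (e.g. from "host" back to the default kvm64)
qm set 40024 --cpu kvm64
# disable memory ballooning for a guest (0 turns the balloon device off)
qm set 40024 --balloon 0
# confirm KSM is really off on the host (0 = not running)
cat /sys/kernel/mm/ksm/run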

The NEW cluster ::


Code:
root@NODE1:~# ps faux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root     11019  2.6  1.0 1709964 349816 ?      Sl   Dec12  57:30 /usr/bin/kvm -id 40024
root     11161  2.6  1.2 1752112 422588 ?      Sl   Dec12  57:44 /usr/bin/kvm -id 40001
root     11303  2.6  1.0 1691428 354152 ?      Sl   Dec12  57:25 /usr/bin/kvm -id 40002
root     11433  2.6  1.0 1714076 347048 ?      Sl   Dec12  57:43 /usr/bin/kvm -id 40003
root     11568  2.6  1.0 1728484 357224 ?      Sl   Dec12  57:38 /usr/bin/kvm -id 40004
root     11699  3.5  2.4 1781868 814780 ?      Sl   Dec12  77:45 /usr/bin/kvm -id 40005
root     11835  2.6  1.0 1678112 355524 ?      Sl   Dec12  57:12 /usr/bin/kvm -id 40006
root     11967  2.6  1.1 1715120 374344 ?      Sl   Dec12  57:31 /usr/bin/kvm -id 40007
root     12097  2.6  1.0 1670916 337536 ?      Sl   Dec12  57:24 /usr/bin/kvm -id 40008
root     12232  2.6  2.4 1757284 814664 ?      Sl   Dec12  58:16 /usr/bin/kvm -id 40009
root     12365  2.7  2.4 1761380 814452 ?      Sl   Dec12  58:37 /usr/bin/kvm -id 40010
root     12500  2.6  2.4 1757284 814372 ?      Sl   Dec12  58:20 /usr/bin/kvm -id 40011
root     12630  2.7  2.4 1761380 814332 ?      Sl   Dec12  58:49 /usr/bin/kvm -id 40012
root     12760  2.6  1.0 1740788 352952 ?      Sl   Dec12  57:14 /usr/bin/kvm -id 40013
root     12933  2.7  2.4 1757284 814168 ?      Sl   Dec12  59:05 /usr/bin/kvm -id 40014
root     13064  2.6  1.0 1732596 357916 ?      Sl   Dec12  57:23 /usr/bin/kvm -id 40015
root     13202  2.6  1.1 1676056 364016 ?      Sl   Dec12  57:19 /usr/bin/kvm -id 40016
root     13334  2.7  0.9 1924836 310984 ?      Sl   Dec12  60:25 /usr/bin/kvm -id 40017
root     13468  2.6  1.0 1670916 359644 ?      Sl   Dec12  56:44 /usr/bin/kvm -id 40018
root     13599  2.6  1.1 1693532 365644 ?      Sl   Dec12  56:34 /usr/bin/kvm -id 40019
root     13737  2.6  1.0 1692488 359644 ?      Sl   Dec12  56:21 /usr/bin/kvm -id 40020
root     13871  2.6  1.0 1707908 343280 ?      Sl   Dec12  56:33 /usr/bin/kvm -id 40021
root     14009  2.6  1.0 1755212 344320 ?      Sl   Dec12  56:41 /usr/bin/kvm -id 40022
root     14144  2.6  1.0 1706940 358652 ?      Sl   Dec12  57:05 /usr/bin/kvm -id 40023
.... and so on ......

Code:
root@NODE1:~# pidstat -d -w 10 | grep kvm
Average:      UID       PID   cswch/s nvcswch/s  Command
10:05:00 AM     0      6708   1024.68      1.20  kvm
10:05:00 AM     0     10851   1218.38      2.40  kvm
10:05:00 AM     0     11019    988.41      0.90  kvm
10:05:00 AM     0     11161    987.01      0.40  kvm
10:05:00 AM     0     11303    988.71      0.80  kvm
10:05:00 AM     0     11433    986.41      0.30  kvm
10:05:00 AM     0     11568    987.41      0.80  kvm
10:05:00 AM     0     11699    987.11      0.50  kvm
10:05:00 AM     0     11835    986.61      0.30  kvm
10:05:00 AM     0     11967    986.01      0.60  kvm
10:05:00 AM     0     12097    988.31      0.60  kvm
10:05:00 AM     0     12232    987.41      0.80  kvm
10:05:00 AM     0     12365    987.51      0.50  kvm
10:05:00 AM     0     12500    987.61      0.20  kvm
10:05:00 AM     0     12630    987.91      0.30  kvm
.... and so on ......

Here is the OLD cluster ::
Code:
root@OLD-NODE1:~# pidstat -d -w 10 | grep kvm
10:22:19 AM     0      2818      4.70      0.00  kvm
10:22:19 AM     0      3190    204.40      0.00  kvm
10:22:19 AM     0      3286      9.80      0.00  kvm
10:22:19 AM     0      3352      4.20      0.00  kvm
10:22:19 AM     0      3428      3.20      0.00  kvm
10:22:19 AM     0      3493      3.10      0.00  kvm
10:22:19 AM     0      3598      3.50      0.00  kvm
10:22:19 AM     0      3737      4.30      0.00  kvm
10:22:19 AM     0      3801      3.90      0.00  kvm
10:22:19 AM     0      3888      3.00      0.00  kvm
10:22:19 AM     0      4678      4.30      0.10  kvm
10:22:19 AM     0      5058      3.90      0.00  kvm
10:22:19 AM     0      5234      3.90      0.00  kvm
10:22:19 AM     0      5310      6.40      0.00  kvm
10:22:19 AM     0      5385      6.00      0.00  kvm
10:22:19 AM     0      5462      4.30      0.00  kvm
10:22:19 AM     0      5534      3.70      0.00  kvm
10:22:19 AM     0     12455    348.70      0.30  kvm
10:22:19 AM     0     17368      3.70      0.00  kvm
10:22:19 AM     0     20856      4.40      0.00  kvm
.... and so on ....

As you can see, there is way, way less context switching going on here.

To put everything into perspective, here are both setups in detail. I also just noticed a small difference in the PVE version between the two nodes of our new cluster.

Old cluster specs (Both nodes identical)::
Intel Xeon E3-1231 v3 @ 3.40GHz
8x4 DDR3 Samsung M391B1G73QH0-YK0
Nodes running pve-manager/4.4-15/7599e35a (running kernel: 4.4.67-1-pve)
QEMU Images as files stored on mountpoint (ext4) on SSDs
Crossover network connection
Both nodes run 30 productive Debian 8 VMs (mysql/postgresql/nginx and more)
per node context switching : ~ 4k in total
load average: 0.17, 0.16, 0.21

NEW cluster specs ::
Node 1:

pve-manager/5.2-12/ba196e4b (running kernel: 4.15.18-9-pve)
Intel Xeon E3-1230 V2 @ 3.30GHz
8x4 DDR3 Samsung M391B1G73QH0-YK0
Crossover network connection
LVM-Thin (4xSamsung SSD 860 PRO 512GB RAID 1)
Currently running 27 new/empty Debian 9 VMs, all idle
per node context switching : ~ 56k-100k in total
load average: 4.10, 2.72, 1.72

Node 2:
pve-manager/5.3-5/97ae681d (running kernel: 4.15.18-9-pve)
Intel Xeon E3-1231 v3 @ 3.40GHz
8x4 DDR3 Samsung M391B1G73QH0-YK0
Crossover network connection
LVM-Thin (4xSamsung SSD 860 PRO 512GB RAID 1)
Currently running 26 new/empty Debian 9 VMs, all idle
per node context switching : ~ 53k-110k in total
load average: 1.06, 1.79, 1.84
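For anyone who wants to compare on their own hosts: system-wide context switches per second can be watched with vmstat (the "cs" column) or read from /proc/stat, for example:

Code:
# system-wide context switches per second ("cs" column), sampled once per second
vmstat 1 5
# or sample the raw counter twice and take the difference
grep ctxt /proc/stat; sleep 1; grep ctxt /proc/stat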

Ideas on this one? Any help is greatly appreciated!

EDIT :: We did not set up any unusual multiple internal vmbridges - even new VMs without any network interface produce the same amount of context switching.
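For anyone who wants to dig into a single guest, the per-thread context switches of one KVM process can be inspected like this (PID 11019 from the ps listing above, purely as an example):

Code:
# voluntary/involuntary context switches per thread of one KVM process, one 5-second sample
pidstat -w -t -p 11019 5 1
# alternatively, count the context switches of that process over 10 seconds
# (on Debian, perf is provided by the linux-perf package)
perf stat -e context-switches -p 11019 -- sleep 10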

pveversion output of every single node ::

Code:
root@NODE1:~# pveversion -v
proxmox-ve: 5.2-3 (running kernel: 4.15.18-9-pve)
pve-manager: 5.2-12 (running version: 5.2-12/ba196e4b)
pve-kernel-4.15: 5.2-12
pve-kernel-4.15.18-9-pve: 4.15.18-30
pve-kernel-4.15.18-7-pve: 4.15.18-27
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: not correctly installed
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-2
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-42
libpve-guest-common-perl: 2.0-18
libpve-http-server-perl: 2.0-11
libpve-storage-perl: 5.0-32
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.0.2+pve1-5
lxcfs: 3.0.2-2
novnc-pve: 1.0.0-2
proxmox-widget-toolkit: 1.0-20
pve-cluster: 5.0-30
pve-container: 2.0-30
pve-docs: 5.2-10
pve-edk2-firmware: 1.20181023-1
pve-firewall: 3.0-14
pve-firmware: 2.0-6
pve-ha-manager: 2.0-5
pve-i18n: 1.0-6
pve-libspice-server1: 0.14.1-1
pve-qemu-kvm: 2.12.1-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-41
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3

Code:
root@NODE2:~# pveversion -v
proxmox-ve: 5.3-1 (running kernel: 4.15.18-9-pve)
pve-manager: 5.3-5 (running version: 5.3-5/97ae681d)
pve-kernel-4.15: 5.2-12
pve-kernel-4.15.18-9-pve: 4.15.18-30
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: not correctly installed
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-3
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-43
libpve-guest-common-perl: 2.0-18
libpve-http-server-perl: 2.0-11
libpve-storage-perl: 5.0-33
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.0.2+pve1-5
lxcfs: 3.0.2-2
novnc-pve: 1.0.0-2
proxmox-widget-toolkit: 1.0-22
pve-cluster: 5.0-31
pve-container: 2.0-31
pve-docs: 5.3-1
pve-edk2-firmware: 1.20181023-1
pve-firewall: 3.0-16
pve-firmware: 2.0-6
pve-ha-manager: 2.0-5
pve-i18n: 1.0-9
pve-libspice-server1: 0.14.1-1
pve-qemu-kvm: 2.12.1-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-43
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3

Code:
root@OLD-NODE1:~# pveversion -v
proxmox-ve: 4.4-92 (running kernel: 4.4.67-1-pve)
pve-manager: 4.4-15 (running version: 4.4-15/7599e35a)
pve-kernel-4.4.59-1-pve: 4.4.59-87
pve-kernel-4.4.67-1-pve: 4.4.67-92
lvm2: 2.02.116-pve3
corosync-pve: 2.4.2-2~pve4+1
libqb0: 1.0.1-1
pve-cluster: 4.0-52
qemu-server: 4.0-110
pve-firmware: 1.1-11
libpve-common-perl: 4.0-95
libpve-access-control: 4.0-23
libpve-storage-perl: 4.0-76
pve-libspice-server1: 0.12.8-2
vncterm: 1.3-2
pve-docs: 4.4-4
pve-qemu-kvm: 2.7.1-4
pve-container: 1.0-101
pve-firewall: 2.0-33
pve-ha-manager: 1.0-41
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u3
lxc-pve: 2.0.7-4
lxcfs: 2.0.6-pve1
criu: 1.6.0-1
novnc-pve: 0.5-9
smartmontools: 6.5+svn4324-1~pve80

Code:
root@OLD-NODE2:~# pveversion -v
proxmox-ve: 4.4-92 (running kernel: 4.4.67-1-pve)
pve-manager: 4.4-15 (running version: 4.4-15/7599e35a)
pve-kernel-4.4.59-1-pve: 4.4.59-87
pve-kernel-4.4.67-1-pve: 4.4.67-92
lvm2: 2.02.116-pve3
corosync-pve: 2.4.2-2~pve4+1
libqb0: 1.0.1-1
pve-cluster: 4.0-52
qemu-server: 4.0-110
pve-firmware: 1.1-11
libpve-common-perl: 4.0-95
libpve-access-control: 4.0-23
libpve-storage-perl: 4.0-76
pve-libspice-server1: 0.12.8-2
vncterm: 1.3-2
pve-docs: 4.4-4
pve-qemu-kvm: 2.7.1-4
pve-container: 1.0-101
pve-firewall: 2.0-33
pve-ha-manager: 1.0-41
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u3
lxc-pve: 2.0.7-4
lxcfs: 2.0.6-pve1
criu: 1.6.0-1
novnc-pve: 0.5-9
smartmontools: 6.5+svn4324-1~pve80
 
Hi,

high context switches happen if you have too many vcores running on a node.
Reduce the number of VMs running on the node and the context switches will return to normal.
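In case it helps to verify that, here is a rough way to compare the number of vCPUs allocated on a node against the physical threads of the host (this assumes the default of one socket per VM, so the "cores" value equals the vCPU count; guests without an explicit "cores" line default to 1 and are not counted here):

Code:
# sum up the "cores" setting of every VM defined on this node
for vm in $(qm list | awk 'NR>1 {print $1}'); do
    qm config $vm | awk -F': ' '/^cores/ {print $2}'
done | awk '{sum+=$1} END {print "allocated vCPUs:", sum}'
# physical threads available on the host
nproc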