Hello community,
we have set up a new two-node cluster, without any shared resources (just for the convenience of a shared GUI).
Everything worked out and the new cluster works as expected - however, we are seeing a strangely high overall system load, definitely higher than on the old, to-be-replaced cluster. The cause seems to be a rather high number of context switches, but we are not entirely sure what is generating that load / those context switches in the first place.
We tried different guest OS types, switched the CPU type (host -> kvm64) and disabled RAM ballooning; KSM is off, as per default. This affects BOTH nodes! Every single KVM process seems to create some sort of background noise, as you can see in the listings below - just a reminder, these are brand-new VMs with nothing on them but a fresh Debian 9 install.
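For reference, the changes above were applied along these lines (VMID 40001 is just one example, the exact invocations may have differed slightly):
Code:
# switch the virtual CPU type from host to the generic kvm64 model
qm set 40001 --cpu kvm64
# disable memory ballooning for this guest (takes effect after a stop/start)
qm set 40001 --balloon 0
# verify KSM is really disabled (0 = off)
cat /sys/kernel/mm/ksm/run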
The NEW cluster ::
Code:
root@NODE1:~# ps faux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAN
root 11019 2.6 1.0 1709964 349816 ? Sl Dec12 57:30 /usr/bin/kvm -id 40024
root 11161 2.6 1.2 1752112 422588 ? Sl Dec12 57:44 /usr/bin/kvm -id 40001
root 11303 2.6 1.0 1691428 354152 ? Sl Dec12 57:25 /usr/bin/kvm -id 40002
root 11433 2.6 1.0 1714076 347048 ? Sl Dec12 57:43 /usr/bin/kvm -id 40003
root 11568 2.6 1.0 1728484 357224 ? Sl Dec12 57:38 /usr/bin/kvm -id 40004
root 11699 3.5 2.4 1781868 814780 ? Sl Dec12 77:45 /usr/bin/kvm -id 40005
root 11835 2.6 1.0 1678112 355524 ? Sl Dec12 57:12 /usr/bin/kvm -id 40006
root 11967 2.6 1.1 1715120 374344 ? Sl Dec12 57:31 /usr/bin/kvm -id 40007
root 12097 2.6 1.0 1670916 337536 ? Sl Dec12 57:24 /usr/bin/kvm -id 40008
root 12232 2.6 2.4 1757284 814664 ? Sl Dec12 58:16 /usr/bin/kvm -id 40009
root 12365 2.7 2.4 1761380 814452 ? Sl Dec12 58:37 /usr/bin/kvm -id 40010
root 12500 2.6 2.4 1757284 814372 ? Sl Dec12 58:20 /usr/bin/kvm -id 40011
root 12630 2.7 2.4 1761380 814332 ? Sl Dec12 58:49 /usr/bin/kvm -id 40012
root 12760 2.6 1.0 1740788 352952 ? Sl Dec12 57:14 /usr/bin/kvm -id 40013
root 12933 2.7 2.4 1757284 814168 ? Sl Dec12 59:05 /usr/bin/kvm -id 40014
root 13064 2.6 1.0 1732596 357916 ? Sl Dec12 57:23 /usr/bin/kvm -id 40015
root 13202 2.6 1.1 1676056 364016 ? Sl Dec12 57:19 /usr/bin/kvm -id 40016
root 13334 2.7 0.9 1924836 310984 ? Sl Dec12 60:25 /usr/bin/kvm -id 40017
root 13468 2.6 1.0 1670916 359644 ? Sl Dec12 56:44 /usr/bin/kvm -id 40018
root 13599 2.6 1.1 1693532 365644 ? Sl Dec12 56:34 /usr/bin/kvm -id 40019
root 13737 2.6 1.0 1692488 359644 ? Sl Dec12 56:21 /usr/bin/kvm -id 40020
root 13871 2.6 1.0 1707908 343280 ? Sl Dec12 56:33 /usr/bin/kvm -id 40021
root 14009 2.6 1.0 1755212 344320 ? Sl Dec12 56:41 /usr/bin/kvm -id 40022
root 14144 2.6 1.0 1706940 358652 ? Sl Dec12 57:05 /usr/bin/kvm -id 40023
.... and so on ......
Code:
root@NODE1:~# pidstat -d -w 10 | grep kvm
Average: UID PID cswch/s nvcswch/s Command
10:05:00 AM 0 6708 1024.68 1.20 kvm
10:05:00 AM 0 10851 1218.38 2.40 kvm
10:05:00 AM 0 11019 988.41 0.90 kvm
10:05:00 AM 0 11161 987.01 0.40 kvm
10:05:00 AM 0 11303 988.71 0.80 kvm
10:05:00 AM 0 11433 986.41 0.30 kvm
10:05:00 AM 0 11568 987.41 0.80 kvm
10:05:00 AM 0 11699 987.11 0.50 kvm
10:05:00 AM 0 11835 986.61 0.30 kvm
10:05:00 AM 0 11967 986.01 0.60 kvm
10:05:00 AM 0 12097 988.31 0.60 kvm
10:05:00 AM 0 12232 987.41 0.80 kvm
10:05:00 AM 0 12365 987.51 0.50 kvm
10:05:00 AM 0 12500 987.61 0.20 kvm
10:05:00 AM 0 12630 987.91 0.30 kvm
.... and so on ......
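To narrow down which threads inside a single KVM process are doing the switching, a per-thread pidstat run like the following should help (PID 11019 is taken from the ps output above, purely as an example):
Code:
# -w = context switches, -t = per-thread breakdown, one 5-second sample
pidstat -w -t -p 11019 5 1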
And here is the OLD cluster ::
Code:
root@OLD-NODE1:~# pidstat -d -w 10 | grep kvm
10:22:19 AM 0 2818 4.70 0.00 kvm
10:22:19 AM 0 3190 204.40 0.00 kvm
10:22:19 AM 0 3286 9.80 0.00 kvm
10:22:19 AM 0 3352 4.20 0.00 kvm
10:22:19 AM 0 3428 3.20 0.00 kvm
10:22:19 AM 0 3493 3.10 0.00 kvm
10:22:19 AM 0 3598 3.50 0.00 kvm
10:22:19 AM 0 3737 4.30 0.00 kvm
10:22:19 AM 0 3801 3.90 0.00 kvm
10:22:19 AM 0 3888 3.00 0.00 kvm
10:22:19 AM 0 4678 4.30 0.10 kvm
10:22:19 AM 0 5058 3.90 0.00 kvm
10:22:19 AM 0 5234 3.90 0.00 kvm
10:22:19 AM 0 5310 6.40 0.00 kvm
10:22:19 AM 0 5385 6.00 0.00 kvm
10:22:19 AM 0 5462 4.30 0.00 kvm
10:22:19 AM 0 5534 3.70 0.00 kvm
10:22:19 AM 0 12455 348.70 0.30 kvm
10:22:19 AM 0 17368 3.70 0.00 kvm
10:22:19 AM 0 20856 4.40 0.00 kvm
.... and so on ....
As you can see - way, way less context switching going on here.
To put everything in perspective, here are both our setups in detail. (I just noticed a small difference in the PVE version between the two nodes of our new cluster.)
Old cluster specs (Both nodes identical)::
Intel Xeon E3-1231 v3 @ 3.40GHz
8x4 DDR3 Samsung M391B1G73QH0-YK0
Nodes running pve-manager/4.4-15/7599e35a (running kernel: 4.4.67-1-pve)
QEMU images stored as files on an ext4 mountpoint on SSDs
Crossover network connection
Both nodes run 30 productive Debian 8 VMs (MySQL/PostgreSQL/nginx and more)
per node context switching : ~ 4k in total
load average: 0.17, 0.16, 0.21
NEW cluster specs ::
Node 1:
pve-manager/5.2-12/ba196e4b (running kernel: 4.15.18-9-pve)
Intel Xeon E3-1230 V2 @ 3.30GHz
8x4 DDR3 Samsung M391B1G73QH0-YK0
Crossover network connection
LVM-Thin (4xSamsung SSD 860 PRO 512GB RAID 1)
Currently running 27 new/empty Debian 9 VMs, all idle
per node context switching : ~ 56k-100k in total
load average: 4.10, 2.72, 1.72
Node 2:
pve-manager/5.3-5/97ae681d (running kernel: 4.15.18-9-pve)
Intel Xeon E3-1231 v3 @ 3.40GHz
8x4 DDR3 Samsung M391B1G73QH0-YK0
Crossover network connection
LVM-Thin (4xSamsung SSD 860 PRO 512GB RAID 1)
Currently running 26 new/empty Debian 9 VMs, all idle
per node context switching : ~ 53k-110k in total
load average: 1.06, 1.79, 1.84
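For reference, the per-node context-switch totals and load averages listed above can be reproduced with something like this (interval and count chosen arbitrarily):
Code:
# node-wide context switches per second ("cs" column) plus load
vmstat 5 5
uptime
# cumulative context-switch counter since boot (diff two readings for a rate)
grep ctxt /proc/stat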
Ideas on this one? Any help is greatly appreciated!
EDIT :: We did not set up multiple internal vmbridges or anything unusual - even new VMs without any network interface create the same amount of context switching (see the quick check below).
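For anyone who wants to verify the same thing, a check along these lines should do (VMID 40001 again only as an example):
Code:
# confirm the test VM really has no network device configured
qm config 40001 | grep -i ^net
# list all bridges present on the host
ip link show type bridge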
pveversion output of every single node ::
Code:
root@NODE1:~# pveversion -v
proxmox-ve: 5.2-3 (running kernel: 4.15.18-9-pve)
pve-manager: 5.2-12 (running version: 5.2-12/ba196e4b)
pve-kernel-4.15: 5.2-12
pve-kernel-4.15.18-9-pve: 4.15.18-30
pve-kernel-4.15.18-7-pve: 4.15.18-27
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: not correctly installed
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-2
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-42
libpve-guest-common-perl: 2.0-18
libpve-http-server-perl: 2.0-11
libpve-storage-perl: 5.0-32
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.0.2+pve1-5
lxcfs: 3.0.2-2
novnc-pve: 1.0.0-2
proxmox-widget-toolkit: 1.0-20
pve-cluster: 5.0-30
pve-container: 2.0-30
pve-docs: 5.2-10
pve-edk2-firmware: 1.20181023-1
pve-firewall: 3.0-14
pve-firmware: 2.0-6
pve-ha-manager: 2.0-5
pve-i18n: 1.0-6
pve-libspice-server1: 0.14.1-1
pve-qemu-kvm: 2.12.1-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-41
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
Code:
root@NODE2:~# pveversion -v
proxmox-ve: 5.3-1 (running kernel: 4.15.18-9-pve)
pve-manager: 5.3-5 (running version: 5.3-5/97ae681d)
pve-kernel-4.15: 5.2-12
pve-kernel-4.15.18-9-pve: 4.15.18-30
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: not correctly installed
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-3
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-43
libpve-guest-common-perl: 2.0-18
libpve-http-server-perl: 2.0-11
libpve-storage-perl: 5.0-33
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.0.2+pve1-5
lxcfs: 3.0.2-2
novnc-pve: 1.0.0-2
proxmox-widget-toolkit: 1.0-22
pve-cluster: 5.0-31
pve-container: 2.0-31
pve-docs: 5.3-1
pve-edk2-firmware: 1.20181023-1
pve-firewall: 3.0-16
pve-firmware: 2.0-6
pve-ha-manager: 2.0-5
pve-i18n: 1.0-9
pve-libspice-server1: 0.14.1-1
pve-qemu-kvm: 2.12.1-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-43
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
Code:
root@OLD-NODE1:~# pveversion -v
proxmox-ve: 4.4-92 (running kernel: 4.4.67-1-pve)
pve-manager: 4.4-15 (running version: 4.4-15/7599e35a)
pve-kernel-4.4.59-1-pve: 4.4.59-87
pve-kernel-4.4.67-1-pve: 4.4.67-92
lvm2: 2.02.116-pve3
corosync-pve: 2.4.2-2~pve4+1
libqb0: 1.0.1-1
pve-cluster: 4.0-52
qemu-server: 4.0-110
pve-firmware: 1.1-11
libpve-common-perl: 4.0-95
libpve-access-control: 4.0-23
libpve-storage-perl: 4.0-76
pve-libspice-server1: 0.12.8-2
vncterm: 1.3-2
pve-docs: 4.4-4
pve-qemu-kvm: 2.7.1-4
pve-container: 1.0-101
pve-firewall: 2.0-33
pve-ha-manager: 1.0-41
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u3
lxc-pve: 2.0.7-4
lxcfs: 2.0.6-pve1
criu: 1.6.0-1
novnc-pve: 0.5-9
smartmontools: 6.5+svn4324-1~pve80
Code:
root@OLD-NODE2:~# pveversion -v
proxmox-ve: 4.4-92 (running kernel: 4.4.67-1-pve)
pve-manager: 4.4-15 (running version: 4.4-15/7599e35a)
pve-kernel-4.4.59-1-pve: 4.4.59-87
pve-kernel-4.4.67-1-pve: 4.4.67-92
lvm2: 2.02.116-pve3
corosync-pve: 2.4.2-2~pve4+1
libqb0: 1.0.1-1
pve-cluster: 4.0-52
qemu-server: 4.0-110
pve-firmware: 1.1-11
libpve-common-perl: 4.0-95
libpve-access-control: 4.0-23
libpve-storage-perl: 4.0-76
pve-libspice-server1: 0.12.8-2
vncterm: 1.3-2
pve-docs: 4.4-4
pve-qemu-kvm: 2.7.1-4
pve-container: 1.0-101
pve-firewall: 2.0-33
pve-ha-manager: 1.0-41
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u3
lxc-pve: 2.0.7-4
lxcfs: 2.0.6-pve1
criu: 1.6.0-1
novnc-pve: 0.5-9
smartmontools: 6.5+svn4324-1~pve80