Random, momentary 0-100% core utilization

hanzfritz85

Member
Dec 15, 2018
30
0
6
30
Dear All,
Please help. I have two identical servers: Dell R630, 2x Intel Xeon E5-2650 v4 (48 threads), 128 GB RAM.
Both servers run Proxmox 5.4-4 and show the same problem.
Each machine runs 2 containers with 16 threads each.
htop inside the CTs shows some of the cores at 100% utilization, while the host shows far less than that.
I can't monitor CPU utilization this way, and it may be worse than just a wrong CPU statistic: maybe the containers won't cope under heavy load.
Some GIFs are attached.
Any ideas what could be wrong?
 


hanzfritz85

Has anyone else faced a similar problem?
It looks like incorrect CPU process queues, probably because of the dual-CPU machine configuration.
5.4-6 on a single-CPU machine shows none of this.
 

Alwin

Proxmox Staff Member
Staff member
Aug 1, 2017
2,672
232
63
Can you please check how the cores of the containers are distributed, using the following command?
Code:
pct cpusets
Background: the cores of a CT are distributed across the available CPU cores of the host. With many CTs or VMs, it becomes likely that some other CT or VM is also using the same core. The usage displayed for the core(s) inside the CT is the actual usage of those cores on the host, so any other service on the host that ramps up the usage of a particular core will influence the usage shown inside the CT.
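Whether two CTs share a core can be checked mechanically rather than by eye. The following is a minimal Python sketch, assuming `pct cpusets` prints one `<vmid>: <core> <core> ...` line per container between dashed separator lines, as in the output posted in this thread; `parse_cpusets` and `shared_cores` are hypothetical helper names, not part of any Proxmox tooling. It only sees CT assignments: VMs and host services are not listed, so disjoint sets do not rule out contention from them.

```python
from itertools import combinations


def parse_cpusets(text):
    """Map container ID -> set of host core numbers from `pct cpusets`-style output."""
    sets = {}
    for line in text.splitlines():
        vmid, sep, cores = line.partition(":")
        # Skip the dashed separator lines and anything that is not "<vmid>: ...".
        if not sep or not vmid.strip().isdigit():
            continue
        sets[int(vmid)] = {int(core) for core in cores.split()}
    return sets


def shared_cores(sets):
    """Return {(ct_a, ct_b): sorted list of cores both CTs use} for every overlapping pair."""
    overlaps = {}
    for (ct_a, cores_a), (ct_b, cores_b) in combinations(sorted(sets.items()), 2):
        common = cores_a & cores_b
        if common:
            overlaps[(ct_a, ct_b)] = sorted(common)
    return overlaps
```

On the node, the input text could come from `subprocess.run(["pct", "cpusets"], capture_output=True, text=True).stdout`; an empty result from `shared_cores` means no two CTs are assigned the same core.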
 

hanzfritz85

-------------------------------------------------------------------------------------------------------------------------------------
100: 2 5 6 7 10 12 14 15 20 22 28 34 38 41 44 45
101: 0 1 3 4 8 9 11 13 16 17 18 21 25 27 31 32
-------------------------------------------------------------------------------------------------------------------------------------

----------------------------------------------------------------------------------------------------------------------------------------
100: 1 2 3 4 5 10 11 12 13 14 15 16 17 18 19 21
101: 0 6 7 8 9 20 22 23 25 27 28 29 34 36 41 46
----------------------------------------------------------------------------------------------------------------------------------------
 

hanzfritz85

Thank you for your attention. htop on the host shows that no process is using 100% of any core, and only some of the cores fluctuate between 0 and 100% while doing some work, as expected. The GIF attached to the first message shows it more clearly.
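To compare the host and CT views with numbers instead of htop bars, one option is to sample `/proc/stat` twice and compute per-core busy percentages from the jiffy deltas, which is roughly what top-like tools do (htop's exact accounting of iowait, steal and similar columns may differ). This is a sketch under those assumptions, and the helper names are hypothetical. Inside a CT, `/proc/stat` is typically provided by lxcfs, so the per-core lines may be limited to the CT's cpuset and renumbered from `cpu0`; map them back to host cores with `pct cpusets` before comparing.

```python
import time


def parse_cpu_times(text):
    """Map each per-core line of /proc/stat (cpu0, cpu1, ...) to (busy, total) jiffies."""
    times = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the aggregate "cpu" line and non-CPU lines such as "intr" or "ctxt".
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        values = [int(v) for v in fields[1:]]
        # Columns: user nice system idle iowait irq softirq steal [guest guest_nice].
        # guest time is already counted in user/nice, so only the first 8 are summed.
        total = sum(values[:8])
        idle = sum(values[3:5])  # idle + iowait
        times[fields[0]] = (total - idle, total)
    return times


def utilization(before, after):
    """Per-core busy percentage between two parse_cpu_times() snapshots."""
    result = {}
    for cpu, (busy1, total1) in after.items():
        if cpu not in before:
            continue
        busy0, total0 = before[cpu]
        elapsed = total1 - total0
        result[cpu] = 100.0 * (busy1 - busy0) / elapsed if elapsed > 0 else 0.0
    return result


def sample(path="/proc/stat", interval=1.0):
    """Take two snapshots `interval` seconds apart and return per-core utilization."""
    with open(path) as f:
        before = parse_cpu_times(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_cpu_times(f.read())
    return utilization(before, after)
```

Calling `print(sample())` on the node and inside the CT at about the same time gives two sets of per-core figures that can be lined up once the core numbering is mapped.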
 

Alwin

What is the config of the containers in question?
 

Alwin

I meant the config as shown by 'pct config <vmid>'.
 

hanzfritz85

pct config 100
arch: amd64
cores: 16
hostname: X
memory: 8192
net0: name=eth0,bridge=vmbr0,firewall=1,gw=,hwaddr=,ip=,type=veth
net1: name=eth1,bridge=vmbr1,firewall=1,hwaddr=,ip=,type=veth
onboot: 1
ostype: debian
rootfs: thinpool:vm-100-disk-0,size=16G
swap: 1024

sudo pct config 101
arch: amd64
cores: 16
hostname: X
memory: 65536
net0: name=eth0,bridge=vmbr0,firewall=1,gw=,hwaddr=,ip=,type=veth
net1: name=eth1,bridge=vmbr1,firewall=1,hwaddr=,ip=,type=veth
onboot: 1
ostype: debian
parent: final
rootfs: thinpool:vm-101-disk-0,size=240G
swap: 1024
 

Alwin

What else is running on the server? You could use atop to record the node and then compare the results with the container.

Is there anything related in the log files (/var/log/)?
 

hanzfritz85

Will do. Meanwhile, I've found another server with 2x Intel(R) Xeon(R) CPU E5-2630 v3 running Proxmox 5.3-9, and it doesn't have this problem.
 

Alwin

Could you please post a 'pveversion -v' of the 5.3-9 and 5.4-4 nodes?
 


hanzfritz85

Nothing in the logs. atop shows mysql at 68%.
 


hanzfritz85

pveversion -v
proxmox-ve: 5.4-1 (running kernel: 4.15.18-13-pve)
pve-manager: 5.4-4 (running version: 5.4-4/97a96833)
pve-kernel-4.15: 5.4-1
pve-kernel-4.15.18-13-pve: 4.15.18-37
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: not correctly installed
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-8
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-51
libpve-guest-common-perl: 2.0-20
libpve-http-server-perl: 2.0-13
libpve-storage-perl: 5.0-41
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.1.0-3
lxcfs: 3.0.3-pve1
novnc-pve: 1.0.0-3
proxmox-widget-toolkit: 1.0-26
pve-cluster: 5.0-36
pve-container: 2.0-37
pve-docs: 5.4-2
pve-edk2-firmware: 1.20190312-1
pve-firewall: 3.0-20
pve-firmware: 2.0-6
pve-ha-manager: 2.0-9
pve-i18n: 1.1-4
pve-libspice-server1: 0.14.1-2
pve-qemu-kvm: 2.12.1-3
pve-xtermjs: 3.12.0-1
qemu-server: 5.0-50
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3


pveversion -v
proxmox-ve: 5.3-1 (running kernel: 4.15.18-11-pve)
pve-manager: 5.3-9 (running version: 5.3-9/ba817b29)
pve-kernel-4.15: 5.3-2
pve-kernel-4.15.18-11-pve: 4.15.18-33
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: not correctly installed
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-3
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-45
libpve-guest-common-perl: 2.0-20
libpve-http-server-perl: 2.0-11
libpve-storage-perl: 5.0-37
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.1.0-2
lxcfs: 3.0.2-2
novnc-pve: 1.0.0-2
proxmox-widget-toolkit: 1.0-22
pve-cluster: 5.0-33
pve-container: 2.0-34
pve-docs: 5.3-2
pve-edk2-firmware: 1.20181023-1
pve-firewall: 3.0-17
pve-firmware: 2.0-6
pve-ha-manager: 2.0-6
pve-i18n: 1.0-9
pve-libspice-server1: 0.14.1-2
pve-qemu-kvm: 2.12.1-1
pve-xtermjs: 3.10.1-1
qemu-server: 5.0-46
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
 

Alwin

1600% utilization of all the cores in CT exactly every 20 seconds.
I only see 16 processes, 15x mysql and 1x htop, which together account for the 1600% (16x100%). From the screenshot I don't see anything wrong with the node; the container simply has a good amount of work to do.
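The arithmetic behind the 1600% figure: htop-style per-process CPU% is measured relative to one core, so processes keeping 16 cores fully busy add up to 16 x 100% = 1600%. A minimal Python sketch of that per-process calculation, assuming the usual Linux `/proc/<pid>/stat` layout in which utime and stime are fields 14 and 15, counted in clock ticks; the helper names are hypothetical:

```python
import os


def proc_cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # Field 2 (comm) is wrapped in parentheses and may contain spaces,
    # so split only what follows the last ')'. rest[0] is then field 3 (state).
    rest = stat_line.rsplit(")", 1)[1].split()
    utime, stime = int(rest[11]), int(rest[12])  # fields 14 and 15
    return utime + stime


def cpu_percent(ticks_before, ticks_after, elapsed_s, hz=None):
    """CPU% over an interval, where 100% equals one fully busy core."""
    hz = hz or os.sysconf("SC_CLK_TCK")  # ticks per second, usually 100 on Linux
    return 100.0 * (ticks_after - ticks_before) / hz / elapsed_s
```

Reading `/proc/<pid>/stat` twice, one second apart, and passing both tick counts to `cpu_percent` gives approximately the figure htop shows for that process; summing over the mysql processes inside the CT shows how much of the 1600% they account for.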
 

hanzfritz85

Really, guys, please help. It jumps from 0 to 100% momentarily, and some cores sit at 100% constantly.
Do you need a GIF?
 
