0-100% instant random core utilization

hanzfritz85 · Jun 13, 2019

Dear all,
Please help. I have two identical servers: Dell R630, 2x Intel Xeon E5-2650 v4 (48 threads), 128 GB RAM.
Both run Proxmox 5.4-4 and both show the same problem.
Each machine runs 2 containers with 16 threads each.
htop inside the CTs shows some of the cores at 100% utilization, while the host shows far less than that.
I can't monitor CPU utilization reliably, and it may be worse than just a wrong CPU stat: I'm not sure the containers will cope under heavy load.
Some GIFs are attached.
Any ideas what could be wrong?

hanzfritz85 · Jun 14, 2019

Has anyone else faced a similar problem?
It looks like incorrect CPU process queues, probably because of the dual-CPU machine configuration.
5.4-6 on a single-CPU machine shows none of that.

Alwin · Jun 14, 2019

Can you please check with the following command how the cores of the containers are distributed?

Code:

pct cpusets

Some background: CTs have their core(s) distributed across the available CPU cores. With many CTs or VMs, it is likely that some other CT or VM is also using the same core. The usage displayed for a core inside the CT is the actual usage of that core on the host, so any other service on the host that ramps up the usage on that particular core will influence the displayed usage.

hanzfritz85 · Jun 14, 2019

And 1600% CPU appears sometimes.

hanzfritz85 · Jun 14, 2019

-------------------------------------------------------------------------------------------------------------------------------------
100: 2 5 6 7 10 12 14 15 20 22 28 34 38 41 44 45
101: 0 1 3 4 8 9 11 13 16 17 18 21 25 27 31 32
-------------------------------------------------------------------------------------------------------------------------------------

----------------------------------------------------------------------------------------------------------------------------------------
100: 1 2 3 4 5 10 11 12 13 14 15 16 17 18 19 21
101: 0 6 7 8 9 20 22 23 25 27 28 29 34 36 41 46
----------------------------------------------------------------------------------------------------------------------------------------
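To check the shared-core explanation against the two listings above, here is a minimal sketch that parses `pct cpusets`-style output and reports any cores assigned to more than one container. The function names and the "first"/"second" labels are illustrative assumptions; the core numbers are copied from the listings in this post.

```python
def parse_cpusets(text):
    """Parse lines of the form "<vmid>: <core> <core> ..." into {vmid: set of cores}."""
    cpusets = {}
    for line in text.splitlines():
        vmid, sep, cores = line.partition(":")
        if not sep or not vmid.strip().isdigit():
            continue  # skip separator lines and anything that is not "<vmid>: ..."
        cpusets[int(vmid)] = {int(c) for c in cores.split()}
    return cpusets


def shared_cores(cpusets):
    """Return {(vmid_a, vmid_b): sorted shared cores} for every overlapping pair."""
    overlaps = {}
    ids = sorted(cpusets)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            common = cpusets[a] & cpusets[b]
            if common:
                overlaps[(a, b)] = sorted(common)
    return overlaps


# The two listings from this post; the labels are arbitrary.
first_listing = """\
100: 2 5 6 7 10 12 14 15 20 22 28 34 38 41 44 45
101: 0 1 3 4 8 9 11 13 16 17 18 21 25 27 31 32
"""
second_listing = """\
100: 1 2 3 4 5 10 11 12 13 14 15 16 17 18 19 21
101: 0 6 7 8 9 20 22 23 25 27 28 29 34 36 41 46
"""

for label, output in (("first", first_listing), ("second", second_listing)):
    sets = parse_cpusets(output)
    print(label, "shared cores:", shared_cores(sets))
```

For both listings the result is an empty mapping, i.e. containers 100 and 101 do not share any core number with each other. This says nothing about other host processes that may run on the same cores.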

hanzfritz85 · Jun 14, 2019

Alwin said:
Can you please check with the following command how the cores of the containers are distributed?

Code:

pct cpusets

Background to this, CTs have their core(s) distributed along the available CPU cores. With many CT or VMs it makes it likely that some other CT or VM is also using the same core. The usage displayed of the core(s) inside the CT is the actual usage of the core on the host. Any other service on the host that ramps up the usage on that particular core will influence the displayed usage.

Thank you for your attention. htop on the host shows no processes that utilize 100% of any core. Only some of the cores jump between 0 and 100%; the others work as expected. The GIF attached to the first message shows it more clearly.

Alwin · Jun 14, 2019

What is the config of those VMs in question?

hanzfritz85 · Jun 14, 2019

These are containers, both with 16 cores. 100 has 8 GB, 101 has 64 GB.

Alwin · Jun 14, 2019

I meant the config as shown by 'pct config <vmid>'.

hanzfritz85 · Jun 14, 2019

pct config 100
arch: amd64
cores: 16
hostname:X
memory: 8192
net0: name=eth0,bridge=vmbr0,firewall=1,gw=,hwaddr=,ip=,type=veth
net1: name=eth1,bridge=vmbr1,firewall=1,hwaddr=,ip=,type=veth
onboot: 1
ostype: debian
rootfs: thinpool:vm-100-disk-0,size=16G
swap: 1024

sudo pct config 101
arch: amd64
cores: 16
hostname:X
memory: 65536
net0: name=eth0,bridge=vmbr0,firewall=1,gw=,hwaddr=,ip=,type=veth
net1: name=eth1,bridge=vmbr1,firewall=1,hwaddr=,ip=,type=veth
onboot: 1
ostype: debian
parent: final
rootfs: thinpool:vm-101-disk-0,size=240G
swap: 1024

Alwin · Jun 14, 2019

What else is running on the server? You could use atop to record the node and then compare the results with the container.

Is there anything related in the log files (/var/log/)?
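As a complement to atop, the comparison between node and container can also be made from two snapshots of /proc/stat. The sketch below is a minimal example under stated assumptions: the function names are made up for illustration, busy time is taken as everything except idle and iowait, and inside a container /proc/stat is assumed to be the view provided by lxcfs rather than the host's own file.

```python
import time


def read_cpu_times(stat_text):
    """Map 'cpuN' -> (busy_jiffies, total_jiffies) from /proc/stat content."""
    times = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # Keep only per-core lines ("cpu0 ...", "cpu1 ..."), not the aggregate "cpu" line.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        # Columns: user nice system idle iowait irq softirq steal
        values = [int(v) for v in fields[1:9]]
        idle = values[3] + (values[4] if len(values) > 4 else 0)
        total = sum(values)
        times[fields[0]] = (total - idle, total)
    return times


def utilization(before, after):
    """Per-core busy percentage between two read_cpu_times() results."""
    result = {}
    for cpu, (busy1, total1) in after.items():
        if cpu not in before:
            continue
        busy0, total0 = before[cpu]
        delta = total1 - total0
        result[cpu] = 100.0 * (busy1 - busy0) / delta if delta > 0 else 0.0
    return result


def sample(interval=1.0, path="/proc/stat"):
    """Take two snapshots `interval` seconds apart and return per-core utilization."""
    with open(path) as f:
        first = read_cpu_times(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = read_cpu_times(f.read())
    return utilization(first, second)
```

Running `sample()` at the same moment on the node and inside the CT gives two per-core tables that can be compared directly. Note that the core names inside the CT need not match the host's core numbers.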

hanzfritz85 · Jun 14, 2019

Alwin said:
What else is running on the server? You could use atop to record the node and then compare the results with the container.

Is there anything related in the log files (/var/log/)?

Will do. Meanwhile I've found another server, with 2x Intel(R) Xeon(R) CPU E5-2630 v3 and Proxmox 5.3-9, that has no such problem.

Alwin · Jun 14, 2019

Could you please post a 'pveversion -v' of the 5.3-9 and 5.4-4 nodes?

hanzfritz85 · Jun 14, 2019

Alwin said:
What else is running on the server? You could use atop to record the node and then compare the results with the container.

Is there anything related in the log files (/var/log/)?

Nothing in the logs. atop shows mysql at 68%.

hanzfritz85 · Jun 14, 2019

pveversion -v
proxmox-ve: 5.4-1 (running kernel: 4.15.18-13-pve)
pve-manager: 5.4-4 (running version: 5.4-4/97a96833)
pve-kernel-4.15: 5.4-1
pve-kernel-4.15.18-13-pve: 4.15.18-37
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: not correctly installed
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-8
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-51
libpve-guest-common-perl: 2.0-20
libpve-http-server-perl: 2.0-13
libpve-storage-perl: 5.0-41
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.1.0-3
lxcfs: 3.0.3-pve1
novnc-pve: 1.0.0-3
proxmox-widget-toolkit: 1.0-26
pve-cluster: 5.0-36
pve-container: 2.0-37
pve-docs: 5.4-2
pve-edk2-firmware: 1.20190312-1
pve-firewall: 3.0-20
pve-firmware: 2.0-6
pve-ha-manager: 2.0-9
pve-i18n: 1.1-4
pve-libspice-server1: 0.14.1-2
pve-qemu-kvm: 2.12.1-3
pve-xtermjs: 3.12.0-1
qemu-server: 5.0-50
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3

pveversion -v
proxmox-ve: 5.3-1 (running kernel: 4.15.18-11-pve)
pve-manager: 5.3-9 (running version: 5.3-9/ba817b29)
pve-kernel-4.15: 5.3-2
pve-kernel-4.15.18-11-pve: 4.15.18-33
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: not correctly installed
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-3
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-45
libpve-guest-common-perl: 2.0-20
libpve-http-server-perl: 2.0-11
libpve-storage-perl: 5.0-37
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.1.0-2
lxcfs: 3.0.2-2
novnc-pve: 1.0.0-2
proxmox-widget-toolkit: 1.0-22
pve-cluster: 5.0-33
pve-container: 2.0-34
pve-docs: 5.3-2
pve-edk2-firmware: 1.20181023-1
pve-firewall: 3.0-17
pve-firmware: 2.0-6
pve-ha-manager: 2.0-6
pve-i18n: 1.0-9
pve-libspice-server1: 0.14.1-2
pve-qemu-kvm: 2.12.1-1
pve-xtermjs: 3.10.1-1
qemu-server: 5.0-46
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
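The two listings above are long, so here is a minimal sketch for picking out the packages whose versions differ between the nodes. The function names are illustrative assumptions, and the sample input is only a four-line excerpt of the output above (5.4-4 node first, 5.3-9 node second); paste the full listings to compare everything.

```python
def parse_versions(text):
    """Parse "name: version" lines of `pveversion -v` output into a dict."""
    versions = {}
    for line in text.splitlines():
        name, sep, version = line.partition(":")
        if sep and name.strip():
            versions[name.strip()] = version.strip()
    return versions


def diff_versions(a, b):
    """Return {package: (version_in_a, version_in_b)} for entries that differ."""
    changed = {}
    for name in sorted(set(a) | set(b)):
        if a.get(name) != b.get(name):
            changed[name] = (a.get(name), b.get(name))
    return changed


# Excerpt only, copied from the two listings above.
node_5_4 = """\
lxc-pve: 3.1.0-3
lxcfs: 3.0.3-pve1
pve-container: 2.0-37
pve-firmware: 2.0-6
"""
node_5_3 = """\
lxc-pve: 3.1.0-2
lxcfs: 3.0.2-2
pve-container: 2.0-34
pve-firmware: 2.0-6
"""

differences = diff_versions(parse_versions(node_5_4), parse_versions(node_5_3))
for package, (new, old) in differences.items():
    print(f"{package}: {new} (5.4-4 node) vs {old} (5.3-9 node)")
```

In this excerpt lxc-pve, lxcfs and pve-container differ while pve-firmware is identical. A version difference alone does not show which package, if any, is responsible for the behaviour.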

hanzfritz85 · Jun 14, 2019

Alwin said:
Could you please post a 'pveversion -v' of the 5.3-9 and 5.4-4 nodes?

Answered. Please take a look.

hanzfritz85 · Jun 17, 2019

1600% utilization of all the cores in the CT exactly every 20 seconds.

Alwin · Jun 17, 2019

hanzfritz85 said:
1600% utilization of all the cores in CT exactly every 20seconds.

I only see 16 processes, 15x mysql and 1x htop that display the 1600% (16x100). From the screenshot I don't see anything wrong with the node, just that the container has a good load of work.

hanzfritz85 · Jun 17, 2019

Alwin said:
I only see 16 processes, 15x mysql and 1x htop that display the 1600% (16x100). From the screenshot I don't see anything wrong with the node, just that the container has a good load of work.

Really, guys, please help. It jumps from 0 to 100% momentarily, and it's at 100% on some cores constantly.
Need a GIF?
