I upgraded a Proxmox 6.2 cluster to the new Proxmox 6.3, and after rebooting the nodes I am seeing some really strange load issues. I am mostly running LXC containers on the hosts, and once they all boot up, the load on each box just keeps climbing to insane numbers. I have to stop the LXC containers to get the load back to normal.
The LXC containers use both Ceph (Octopus) storage and local disks. I am not sure whether this is related to Ceph also being upgraded to Octopus, a bug in LXC itself, or something else entirely.
The strange part is that CPU usage stays low while the load just keeps climbing. Any ideas on what I should look at to troubleshoot this? It is affecting all the hosts in the cluster since the reboot; they were fine before.
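Is checking for tasks stuck in uninterruptible sleep (D state) the right place to start? Roughly something like this:

# Load with low CPU usually points at tasks stuck in D state,
# typically waiting on disk or network storage I/O
ps -eo state,pid,ppid,wchan:32,cmd | awk '$1 == "D"'

# Per-device I/O latency and iowait, sampled a few times
iostat -x 5 3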
See the attached screenshots for the version numbers and the interface configuration.
NOTE: I did notice a temporary MTU issue on two of the hosts. I have corrected it, but I am still seeing some strangeness. I have corosync configured to run its heartbeat over two networks. Could it be a problem that one network uses an LACP bond and the other an active-passive bond?
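Would verifying the MTU end-to-end on both corosync networks be worthwhile? A rough sketch (bond0/bond1 are placeholders for my actual bond names):

# Check the configured MTU on both bonds on every node
for host in pve01 pve02 pve03 pve04; do
  echo "== $host =="
  ssh "$host" "ip -o link show bond0; ip -o link show bond1" | awk '{print $2, $4, $5}'
done

# Test the actual path MTU with the don't-fragment bit set
# (8972 = 9000-byte MTU minus IP/ICMP headers; use 1472 for a 1500 MTU)
ping -M do -c 3 -s 8972 pve02-int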
My corosync config is like this:
nodelist {
  node {
    name: pve01
    nodeid: 1
    quorum_votes: 1
    ring0_addr: pve01
    ring1_addr: pve01-int
  }
  node {
    name: pve02
    nodeid: 2
    quorum_votes: 1
    ring0_addr: pve02
    ring1_addr: pve02-int
  }
  node {
    name: pve03
    nodeid: 3
    quorum_votes: 1
    ring0_addr: pve03
    ring1_addr: pve03-int
  }
  node {
    name: pve04
    nodeid: 4
    quorum_votes: 1
    ring0_addr: pve04
    ring1_addr: pve04-int
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: cluster1
  config_version: 25
  interface {
    ringnumber: 0
  }
  interface {
    ringnumber: 1
  }
  ip_version: ipv4
  secauth: on
  transport: knet
  version: 2
}
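Would checking the knet link status on each node help narrow this down? e.g.:

# Show the status of both corosync/knet links on this node
corosync-cfgtool -s

# Quorum and membership overview from the Proxmox side
pvecm status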
Would there be a way to see why starting 20 LXC instances causes the load to go crazy?
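For example, would starting them one at a time and watching where the load jumps be a sensible way to bisect it? A rough sketch (this assumes every container listed by pct list on the node is involved):

# Start containers one at a time and check the load after each
for ct in $(pct list | awk 'NR>1 {print $1}'); do
  pct start "$ct"
  sleep 30
  uptime
done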