Proxmox 6.3 / Upgrade Strangeness after reboot of node / Load High / CPU OK Issue?

devinacosta

I upgraded a Proxmox 6.2 cluster to the new Proxmox 6.3, and after rebooting the nodes I am seeing some really strange load issues. I am mostly running LXC containers on the hosts, and once they have all booted, the load on the box just keeps climbing to insane numbers. I have to stop the LXC containers to bring the load back to normal.

The LXC containers use both Ceph (Octopus) storage and local disks. I am not sure whether this is related to Octopus being upgraded as well, or to some strange software bug in LXC or something related.

The strange part is that CPU usage stays low while the load just keeps climbing. Any ideas on what I should look at to troubleshoot this? It seems to affect every host in the cluster that I have rebooted. They were fine until the reboot.

See the attached picture of the version numbers and the interface screenshot.

NOTE: I noticed a temporary MTU issue on 2 of the hosts. I have corrected it, but I am still seeing some strangeness. I do have corosync configured to run its health checks over 2 networks. Would it be a problem that one of the networks uses LACP while the other is just an active-passive bond?

My corosync config is like this:
nodelist {
  node {
    name: pve01
    nodeid: 1
    quorum_votes: 1
    ring0_addr: pve01
    ring1_addr: pve01-int
  }
  node {
    name: pve02
    nodeid: 2
    quorum_votes: 1
    ring0_addr: pve02
    ring1_addr: pve02-int
  }
  node {
    name: pve03
    nodeid: 3
    quorum_votes: 1
    ring0_addr: pve03
    ring1_addr: pve03-int
  }
  node {
    name: pve04
    nodeid: 4
    quorum_votes: 1
    ring0_addr: pve04
    ring1_addr: pve04-int
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: cluster1
  config_version: 25
  interface {
    ringnumber: 0
  }
  interface {
    ringnumber: 1
  }
  ip_version: ipv4
  secauth: on
  transport: knet
  version: 2
}

Would there be a way to see why starting 20 LXC instances causes the load to go crazy?
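
One general starting point for a "high load, low CPU" picture: on Linux the load average counts tasks that are runnable (state R) as well as tasks in uninterruptible sleep (state D, usually waiting on I/O). Counting tasks per state can hint at whether the load comes from CPU contention or from blocked I/O. The following is only a minimal sketch; the function names are made up for illustration, and it assumes the standard /proc/<pid>/task/<tid>/stat layout.

```python
import glob
from collections import Counter


def task_state(stat_line):
    """Return the one-letter state from a /proc/<pid>/stat style line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST ')' before reading the state.
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def count_states(stat_lines):
    """Count tasks per state letter, skipping empty or malformed lines."""
    return Counter(task_state(line) for line in stat_lines if ")" in line)


def read_all_task_stats():
    """Read the stat line of every thread currently visible in /proc."""
    lines = []
    for path in glob.glob("/proc/[0-9]*/task/[0-9]*/stat"):
        try:
            with open(path) as f:
                lines.append(f.read())
        except OSError:
            # The task exited between listing and reading; ignore it.
            continue
    return lines


if __name__ == "__main__":
    states = count_states(read_all_task_stats())
    print("runnable (R):", states.get("R", 0))
    print("uninterruptible sleep (D):", states.get("D", 0))
```

Many tasks in R with mostly idle cores would point toward tasks competing for a small set of CPUs, while many tasks in D would point toward storage or network waits.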
 

How are the CPU cores distributed among the containers? You can check with pct cpuset.

If the containers all share the same cores while other cores are not used by containers at all, you have likely run into a bug for which a fix has already been posted to the devel mailing list [0], and a new version of pve-manager (pve-manager_6.3-3) is available in the test repository.

EDIT: link to the devel mailing list:
[0] https://lists.proxmox.com/pipermail/pve-devel/2020-December/046378.html
 
So one of my hosts, which has only half of its containers running and a rather high load, shows:

root@pve02:~# pct cpuset
----------------------------------------------------
105: 10 18
112: 10 18
115: 10 18
119: 10 18
134: 10 18
135: 10 18
140: 10 18
145: 10 18
158: 10 18
159: 10 18
168: 10 18
169: 10 18
176: 10 18
184: 10 18
188: 10 18
189: 10 18
201: 10 18
207: 10 18
209: 10 18
212: 10 18
----------------------------------------------------

root@pve02:~# uptime
04:46:33 up 6:07, 1 user, load average: 31.54, 31.66, 30.00

Normally the load is around 5.x to 7.x at most. Do I seem to be hitting that issue?
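
To summarize output like the above, one option is to count how many containers are pinned to each core. This is a minimal sketch with a hypothetical helper name; it assumes the "VMID: core core ..." layout exactly as pasted here and ignores any other lines.

```python
from collections import Counter

# A few lines copied from the pct cpuset output in this post.
SAMPLE = """\
----------------------------------------------------
105: 10 18
112: 10 18
115: 10 18
----------------------------------------------------
"""


def containers_per_core(cpuset_output):
    """Map each core number to the number of containers pinned to it."""
    counts = Counter()
    for line in cpuset_output.splitlines():
        vmid, sep, cores = line.partition(":")
        if not sep or not vmid.strip().isdigit():
            continue  # separator, prompt, or blank line
        for core in cores.split():
            if core.isdigit():
                counts[int(core)] += 1
    return counts


if __name__ == "__main__":
    for core, n in sorted(containers_per_core(SAMPLE).items()):
        print(f"core {core}: {n} container(s)")
```

Run against the full listing, this would report all 20 containers on cores 10 and 18 and none on any other core.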
 
I'm having the exact same issue after upgrading from 6.2 to the latest version just now:

proxmox-ve: 6.4-1 (running kernel: 5.4.140-1-pve)
pve-manager: 6.4-13 (running version: 6.4-13/9f411e79)

Is there another patch I have to install now? Or is this a different issue?

pct cpuset returns: no running containers

But there are definitely KVM VMs running.

What else can I check?

I also cannot even access the GUI.

It looks like the priorities of all the KVM VMs are set to 20.

I only have 3 KVM VMs running, though, and CPU usage is nearly 100%.
 
Hi,

> I'm having the exact same issue after upgrading from 6.2 to the latest version just now:
>
> proxmox-ve: 6.4-1 (running kernel: 5.4.140-1-pve)
> pve-manager: 6.4-13 (running version: 6.4-13/9f411e79)
>
> Is there another patch I have to install now? Or is this a different issue?
>
> pct cpuset returns: no running containers
>
> But there are definitely KVM VMs running.

Containers and VMs are different things, and pct is only for containers. If you want to pin a VM to certain CPU cores, see this thread.

> What else can I check?
>
> I also cannot even access the GUI.

Can you access the GUI when the VMs are not running?

> It looks like the priorities of all the KVM VMs are set to 20.
>
> I only have 3 KVM VMs running, though, and CPU usage is nearly 100%.

What kind of CPU do you have? How many CPUs have you assigned to your VMs? What about RAM?

EDIT: just saw your other thread ;) Please continue the discussion there.
 