Cluster frozen after one node / VM behaved badly

Glowsome

This just happened:

During a backup, one node in my cluster (Debian Bookworm, Proxmox VE 8.0.4, latest release) showed weird behavior that effectively blocked my whole cluster:
Code:
Message from syslogd@node01 at Aug 14 01:11:59 ...
 kernel:[1233000.842855] watchdog: BUG: soft lockup - CPU#33 stuck for 1863s! [task UPIDnode0:3490731]

I have not seen this before, so it is new to me.

In essence, from a GUI point of view, it broke the complete cluster: whichever node I used to access the GUI, it told me that all nodes were broken. Background/running tasks, however, seemed to continue (at least in part).

Only after I forcefully rebooted the mentioned node did everything go green again in the GUI.
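
For anyone who wants to check their own logs for the same message, here is a minimal Python sketch that picks the CPU number, the stuck duration, and the task name/PID out of such a line. The pattern is an assumption based only on the single line quoted above, and the function name `parse_soft_lockup` is made up for illustration; other kernel versions may format the message differently.

```python
import re
from typing import Optional

# Pattern modeled on the one log line quoted in this post (assumption:
# the format is "CPU#<n> stuck for <n>s! [<task>:<pid>]").
SOFT_LOCKUP = re.compile(
    r"watchdog: BUG: soft lockup - CPU#(?P<cpu>\d+) "
    r"stuck for (?P<seconds>\d+)s! \[(?P<task>.+):(?P<pid>\d+)\]"
)


def parse_soft_lockup(line: str) -> Optional[dict]:
    """Return the fields of a soft-lockup message, or None if no match."""
    match = SOFT_LOCKUP.search(line)
    if match is None:
        return None
    return {
        "cpu": int(match["cpu"]),
        "seconds": int(match["seconds"]),
        "task": match["task"],
        "pid": int(match["pid"]),
    }


# The line from this post, used as sample input.
line = (
    " kernel:[1233000.842855] watchdog: BUG: soft lockup - "
    "CPU#33 stuck for 1863s! [task UPIDnode0:3490731]"
)
print(parse_soft_lockup(line))
```

Feeding it each line of a saved syslog and keeping the non-None results should give a quick overview of which CPUs and tasks were affected, and for how long.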

Some research led me to https://forum.proxmox.com/threads/vm-cpu-issues-watchdog-bug-soft-lockup-cpu-7-stuck-for-22s.107379/

But of the solution provided there, the only part that was not honored was: async-io: threads
So are we dealing with a re-introduced issue, or is this the new default going forward?

- Glowsome
 
