Problem with one node - sporadically not available pve6

Romsch

Well-Known Member
Feb 14, 2019
Erlangen, Germany
Hello together!
We have a five-node cluster with Ceph, pve1 to pve5. The problem is that pve1 is sometimes not "available": the node's OSDs keep working and pve1 still answers ping, but it cannot be reached via SSH. Here is a screenshot from the problem node. Does anyone know why it is sporadically unavailable? Everything is up to date and otherwise working, but this is a no-go for a production environment. It would be nice if someone recognizes this message.
upload_2019-8-19_10-14-55.jpeg
Thanks in advance!
 
The hung_task message for a kworker is a generic error: it means that a kernel background thread has been blocked for 2 minutes without making progress.

This can have various causes, and it's impossible to give a closer diagnosis based on that information alone.
(The causes range from slow disk access, broken hardware, outdated firmware, and bugs in particular hardware to bugs in the kernel.)
Check the node's `dmesg` and `journalctl -r` output for further hints that might explain what causes the issue.
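
To narrow down that output, something like the following might help. It is only a rough sketch: `filter_hung_tasks` is a hypothetical helper name, and the pattern assumes the usual kernel wording "blocked for more than N seconds", which may not match exactly what your screenshot shows.

```shell
# Hypothetical helper: keep only hung-task reports from kernel log text on stdin.
# Assumes the common kernel wording "blocked for more than <N> seconds".
filter_hung_tasks() {
    grep -E 'blocked for more than [0-9]+ seconds'
}
```

On the affected node you could then run `dmesg -T | filter_hung_tasks` or `journalctl -k -r | filter_hung_tasks` and look at the lines logged just before each match, since those often point to the disk, driver, or device involved.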

Hope this helps!