Problem with one cluster node (NRPE flapping, NFS not available)

fireon

Hello,

I have a problem with one cluster node. A few days ago the nagios-nrpe service started flapping (some checks, such as disk usage...). Yesterday I detached the NFS backup storage from the cluster (because of service work on the QNAP server). After re-attaching it to the QNAP, this one cluster node is in an unusable state in the web interface. On the command line it shows me no errors:
Code:
Node  Sts   Inc   Joined               Name
   1   M   2924   2014-07-08 15:54:35  gleivs04
   2   M   2920   2014-07-08 15:54:35  gleivs05
   3   M   2932   2014-07-08 16:07:41  gleivs06
   4   M   2924   2014-07-08 15:54:35  gleivs07
   5   M   2924   2014-07-08 15:54:35  gleivs08

Attachment (screenshot): Bildschirmfoto von »2014-07-25 10:36:10«.png

SSH connections to the host are sometimes not possible (connection timeout). Sometimes I can mount NFS on the command line, and sometimes I also get a connection timeout.
Can I restart something to rule out a software problem? The VMs on the node are running fine. Nagios reports no packet loss.

Best Regards and Thanks
Fireon
 
I rebooted the node, but I had to reset the machine via HP iLO because it hung at "Unmounting configfs...". I waited about 20 minutes. The machine did not freeze, but the shutdown stalled at this point. After the reboot everything works fine. What could have caused this problem?