Problem with one cluster node (NRPE flapping, NFS not available)

fireon

Hello,

I have a problem with one cluster node. A few days ago the nagios-nrpe service started flapping (some checks, such as disk usage). Yesterday I detached the NFS backup storage from the cluster (because of servicing on the QNAP server). After re-attaching it to the QNAP, this one cluster node is in an unusable state in the web interface. On the command line it shows me no errors:
Code:
Node  Sts   Inc   Joined               Name
   1   M   2924   2014-07-08 15:54:35  gleivs04
   2   M   2920   2014-07-08 15:54:35  gleivs05
   3   M   2932   2014-07-08 16:07:41  gleivs06
   4   M   2924   2014-07-08 15:54:35  gleivs07
   5   M   2924   2014-07-08 15:54:35  gleivs08

Bildschirmfoto von »2014-07-25 10:36:10«.png

SSH connections to the host sometimes fail with a connection timeout. Sometimes I can mount NFS from the command line, and sometimes I also get a connection timeout there.
Can I restart something to rule out a software problem? The VMs on the node are running fine. Nagios shows no packet loss.

Best Regards and Thanks
Fireon
 
I rebooted the node, but I had to reset the machine via HP iLO because it hung at "Unmounting configfs...". I waited about 20 minutes. The machine did not freeze, but the shutdown stalled at this point. After the reboot everything works fine. What could have caused this problem?
 
