[SOLVED] HP ProLiant random reboots

debi@n

Active Member
Nov 12, 2015
Málaga, Spain
Hello guys! I have a 3-node cluster: 2 HP ProLiant servers and 1 more machine, with HA enabled and NFS for shared storage. The HP servers are being reset (by Proxmox) for no apparent reason, and I don't know why. I checked the logs and found nothing. I have also blacklisted hpwdt, and I don't have any network problems.

Version:
Code:
proxmox-ve: 4.0-22 (running kernel: 4.2.3-2-pve)
pve-manager: 4.0-57 (running version: 4.0-57/cc7c2b53)
pve-kernel-4.2.3-2-pve: 4.2.3-22
lvm2: 2.02.116-pve1
corosync-pve: 2.3.5-1
libqb0: 0.17.2-1
pve-cluster: 4.0-24
qemu-server: 4.0-35
pve-firmware: 1.1-7
libpve-common-perl: 4.0-36
libpve-access-control: 4.0-9
libpve-storage-perl: 4.0-29
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.4-12
pve-container: 1.0-21
pve-firewall: 2.0-13
pve-ha-manager: 1.0-13
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.4-3
lxcfs: 0.10-pve2
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: 0.6.5-pve6~jessie

The resets happen more or less every 24 hours. Any help? Thank you! :D And sorry for my English!
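Blacklisting hpwdt is usually done with a modprobe.d drop-in file. The sketch below assumes a Debian-based Proxmox host; the helper name `blacklist_module` and the file name pattern `MODULE-blacklist.conf` are hypothetical, and the config directory is a parameter so the helper can be tried on a scratch directory first.

```shell
#!/bin/sh
# Hypothetical helper: add "blacklist MODULE" to a modprobe.d drop-in file.
# Usage: blacklist_module MODULE [CONF_DIR]
# CONF_DIR defaults to /etc/modprobe.d (writing there requires root).
blacklist_module() {
    module="$1"
    conf_dir="${2:-/etc/modprobe.d}"
    conf_file="$conf_dir/$module-blacklist.conf"
    # Append the line only if it is not already there, so reruns are harmless.
    if ! grep -qx "blacklist $module" "$conf_file" 2>/dev/null; then
        echo "blacklist $module" >> "$conf_file"
    fi
    echo "$conf_file"
}

# Typical follow-up on the host itself, as root:
#   blacklist_module hpwdt
#   update-initramfs -u     # rebuild the initramfs so the blacklist applies early too
#   reboot, then: lsmod | grep hpwdt   (no output means the module is not loaded)
```

Keep in mind that a `blacklist` line only stops the module from being loaded automatically by alias; an explicit `modprobe hpwdt` would still load it.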
 
Re: HP ProLiant random reboots

Hi,

Maybe it is a DL385 G8?

I ended up going back to 3.4, where everything works perfectly on all kinds of HP ProLiants (DL320 G6, G8 and DL385 G8), with no hpwdt issues and no 420i controller issues either :(
 
Re: HP ProLiant random reboots

More info from the log (/var/log/syslog):

Dec 2 08:31:41 test corosync[2838]: [TOTEM ] FAILED TO RECEIVE
Dec 2 08:31:43 test corosync[2838]: [TOTEM ] A new membership (192.168.1.10:17944) was formed. Members left: 1 2
Dec 2 08:31:43 test corosync[2838]: [TOTEM ] Failed to receive the leave message. failed: 1 2
Dec 2 08:31:43 test pmxcfs[1165]: [dcdb] notice: members: 4/1165
Dec 2 08:31:43 test pmxcfs[1165]: [status] notice: members: 4/1165
Dec 2 08:31:43 test pmxcfs[1165]: [status] notice: node lost quorum
Dec 2 08:31:43 test corosync[2838]: [QUORUM] This node is within the non-primary component and will NOT provide any services.
Dec 2 08:31:43 test corosync[2838]: [QUORUM] Members[1]: 4
Dec 2 08:31:43 test corosync[2838]: [MAIN ] Completed service synchronization, ready to provide service.
Dec 2 08:31:43 test pmxcfs[1165]: [dcdb] crit: received write while not quorate - trigger resync
Dec 2 08:31:43 test pmxcfs[1165]: [dcdb] crit: leaving CPG group
Dec 2 08:31:43 test pve-ha-crm[2845]: status change slave => wait_for_quorum
Dec 2 08:31:43 test pmxcfs[1165]: [dcdb] notice: start cluster connection
Dec 2 08:31:43 test pmxcfs[1165]: [dcdb] notice: members: 4/1165
Dec 2 08:31:43 test pmxcfs[1165]: [dcdb] notice: all data is up to date
Dec 2 08:31:48 test pve-ha-lrm[2847]: status change active => lost_agent_lock
Dec 2 08:32:34 test watchdog-mux[1390]: client watchdog expired - disable watchdog updates
Dec 2 08:36:27 test rsyslogd: [origin software="rsyslogd" swVersion="8.4.2" x-pid="1421" x-info="http://www.rsyslog.com"] start
Thanks!!
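The sequence in that excerpt is consistent with the HA stack fencing the node itself: it loses quorum at 08:31:43, pve-ha-lrm drops to lost_agent_lock, watchdog-mux reports that the client watchdog expired at 08:32:34, and the next entry is rsyslogd starting after the reboot. Here is a rough way to pull that sequence out of a syslog file and measure the gap. It assumes the default syslog timestamp format and that both events fall on the same day; the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the lines that mark the fencing sequence
# (quorum loss, LRM losing its agent lock, watchdog expiry) plus rsyslogd
# start lines, which mark the following boot.
# Usage: fence_trace [LOGFILE]   (default: /var/log/syslog)
fence_trace() {
    grep -E 'node lost quorum|lost_agent_lock|client watchdog expired|rsyslogd: .*start' \
        "${1:-/var/log/syslog}"
}

# Hypothetical helper: seconds from the first "node lost quorum" to the next
# "client watchdog expired". Field 3 is the HH:MM:SS timestamp.
fence_delay() {
    awk '
        function secs(t,  a) { split(t, a, ":"); return a[1]*3600 + a[2]*60 + a[3] }
        /node lost quorum/ && !seen { q = secs($3); seen = 1 }
        /client watchdog expired/ && seen { print secs($3) - q; exit }
    ' "${1:-/var/log/syslog}"
}
```

On the excerpt above, `fence_delay` reports 51 seconds between quorum loss and watchdog expiry. If every reset shows the same pattern, the quorum loss is the event worth chasing, rather than the reboot itself.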
 
Re: HP ProLiant random reboots

More info, from pvecm status:

Quorum information
------------------
Date: Wed Dec 2 13:19:12 2015
Quorum provider: corosync_votequorum
Nodes: 3
Node ID: 0x00000004
Ring ID: 18128
Quorate: Yes

Votequorum information
----------------------
Expected votes: 3
Highest expected: 3
Total votes: 3
Quorum: 2
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000001 1 192.168.1.9
0x00000004 1 192.168.1.10 (local)
0x00000002 1 192.168.1.8
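The numbers above follow votequorum's simple-majority rule. This sketch assumes one vote per node and no special votequorum options (such as two_node or last_man_standing). With 3 expected votes, quorum is 2, so a node that is cut off on its own (1 vote, as in the syslog excerpt) is non-quorate. With a 4th node, quorum rises to 3, so a 2+2 split leaves neither half quorate. The helper names are hypothetical.

```shell
#!/bin/sh
# Simple-majority quorum, as shown on the "Quorum:" line of pvecm status:
# floor(expected_votes / 2) + 1, assuming one vote per node.
quorum_needed() {
    echo $(( $1 / 2 + 1 ))
}

# Hypothetical helper: does a partition holding $1 votes keep quorum
# out of $2 expected votes?
is_quorate() {
    [ "$1" -ge "$(quorum_needed "$2")" ]
}
```

For example, `quorum_needed 3` prints 2, matching the `Quorum: 2` line above, and `is_quorate 1 3` fails.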
 
Re: HP ProLiant random reboots

On Proxmox 3.4, we had one machine and it was working well.

So you never had a cluster set up with 3.4? Based on the logs, I would definitely be questioning the cluster network. Maybe it's something that only occurs at specific times? You said it happens every 24 hours; is it at the same hour? Just grasping at straws!
 
Re: HP ProLiant random reboots

We tested a 3-node cluster on Proxmox 4 with zero problems, but the cluster crashed when we added a 4th machine. We cleaned the cluster configuration on all nodes and removed the packages with "apt-get purge", added the "Proxmox VE No-Subscription Repository", and installed proxmox-ve and these packages again. And now we have problems :S
Code:
 apt-get install -s proxmox-ve=4.0-22 pve-manager=4.0-57 pve-kernel-4.2.3-2-pve lvm2=2.02.116-pve1 corosync-pve=2.3.5-1  pve-cluster=4.0-24 qemu-server=4.0-35 pve-firmware=1.1-7 libpve-access-control=4.0-9 libpve-storage-perl=4.0-29 pve-libspice-server1=0.12.5-2 pve-qemu-kvm=2.4-12 pve-container=1.0-21 pve-firewall=2.0-13 pve-ha-manager=1.0-13 ksm-control-daemon=1.2-1 glusterfs-client=3.5.2-2+deb8u1 lxc-pve=1.1.4-3 lxcfs=0.10-pve2 cgmanager=0.39-pve1 libqb0=0.17.2-1 criu=1.6.0-1 vncterm=1.2-1
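The pins in that command mirror the pveversion -v output from the first post. One way to generate them mechanically, so a version cannot drift through a typo, is sketched below. The function name is hypothetical, and the input is assumed to be `package: version` lines in the format shown at the top of the thread. Note that `apt-get install -s` only simulates the installation; without `-s` the packages are actually installed.

```shell
#!/bin/sh
# Hypothetical helper: turn "package: version ..." lines (pveversion -v format)
# read on stdin into "package=version" pins for apt-get install.
# Trailing text such as "(running kernel: ...)" is dropped, and lines whose
# second field does not start with a digit (i.e. no real version) are skipped.
to_apt_pins() {
    sed -nE 's/^([A-Za-z0-9.+-]+): ([0-9][^ ]*).*/\1=\2/p'
}

# Example (simulation only):
#   pveversion -v | to_apt_pins | xargs apt-get install -s
```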
 
Re: HP ProLiant random reboots

Not at the same hour, but more or less one reboot every 24 hours.
 
Re: HP ProLiant random reboots


Definitely an interesting one. Let me get this straight.

- Set up a 3-node Proxmox 4 cluster; it had no issues
- Added a 4th node and it broke the entire cluster
- Removed the 4th node from the cluster

Why did you have to reinstall packages after removing the 4th node?
 
Re: HP ProLiant random reboots

We cleaned the cluster configuration on all nodes, and when we created a new cluster with a new configuration, the cluster was OK again. But when we rebooted one of the nodes, it couldn't rejoin the cluster. Because of this, we removed the packages and also updated the version (4.0-57). Thanks for your interest :)
 
Re: HP ProLiant random reboots


My gut is still telling me it's a network issue of some sort. Further troubleshooting on the problem node would have been more useful than updating packages. So you actually have three issues?

1. Random Reboots due to loss of cluster communication
2. Unable to add 4th node without breaking cluster
3. After reboots sometimes node doesn't rejoin cluster

I see #3 in my environment with Proxmox 4 every so often. I typically just restart "pve-cluster" and the problem node rejoins with no issues. I would say it happens on roughly 10-15% of reboots.
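The workaround for #3 could be scripted roughly as follows. The helper names and the 5-second pause are hypothetical choices. The check only reads the `Quorate:` line that pvecm status prints, as in the output posted earlier in the thread, and the restart uses the pve-cluster systemd service.

```shell
#!/bin/sh
# Hypothetical helper: read `pvecm status` output on stdin and succeed
# only if the "Quorate:" line says Yes.
node_is_quorate() {
    grep -qE '^Quorate:[[:space:]]+Yes'
}

# Hypothetical helper for the workaround described above: if the node did not
# rejoin after boot, restart pve-cluster once and show the new status.
# Must run as root on the Proxmox node.
rejoin_if_needed() {
    if ! pvecm status | node_is_quorate; then
        systemctl restart pve-cluster
        sleep 5   # arbitrary grace period before checking again
        pvecm status
    fi
}
```

If one restart is not enough, this just surfaces the same symptom. It does not replace looking at corosync's own logs on the affected node.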
 
Re: HP ProLiant random reboots

Hi! Actually, the random reboots are the remaining problem; upgrading Proxmox solved the other two. Thanks! :D
 