Many unexpected reboots of all nodes within a short time span

de Thysebaert

Active Member
Mar 12, 2017
Hi,
We have 3 physical servers at OVH running as nodes (PVE 5.4-13).
The 3 nodes are in the same cluster. One NIC is connected to the public network (internet) and a second NIC to the OVH vRack virtual infrastructure, which carries several VLANs for private networks. One VLAN is dedicated to cluster traffic (corosync) and uses a private IP address.
The infrastructure has been running without any problems for 3 years, and the uptime of each node was around 1 year at that point.
There are 22 VMs running (Windows and Linux).
The cluster has 44 CPUs and 314 GB of memory in total, and resource usage is low.

Yesterday, all 3 nodes suddenly rebooted within a span of 3 minutes. This happened 5 times. After each reboot, the whole infrastructure came back online and the cluster became operational again.
There are no updates, maintenance tasks, or scheduled jobs that would trigger a reboot.
Everything is running well again now, but I have no explanation for these reboots. It is strange that all nodes rebooted within such a short time span.

What may have happened? Brief network errors on the VLAN used by the cluster? Why would a short network interruption cause these reboots? How can I track down and investigate this incident?

Do you have any ideas?
 

fabian

Proxmox Staff Member
Jan 7, 2016
if you have HA enabled, a (long enough) loss of corosync connectivity will result in fencing of non-quorate nodes. this should be visible in the logs though (corosync, pve-cluster, pve-ha-crm and pve-ha-lrm units).
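One way to pull the relevant entries for the units listed above is sketched here in Python. It assumes `journalctl` is available on the node and that the journal still covers the incident window. If the journal is not persistent across reboots, the same messages may only be in the syslog files. The keyword list and the time window passed in are illustrative assumptions, not actual log messages or incident times.

```python
import subprocess

# Units named in the reply above.
UNITS = ["corosync", "pve-cluster", "pve-ha-crm", "pve-ha-lrm"]

# Assumed search terms for illustration; these are not quoted log messages.
KEYWORDS = ("quorum", "fence", "watchdog")


def journalctl_command(since, until):
    """Build a journalctl invocation covering the listed units for a time window."""
    cmd = ["journalctl", "--no-pager", "--since", since, "--until", until]
    for unit in UNITS:
        cmd += ["-u", unit]
    return cmd


def interesting_lines(log_text, keywords=KEYWORDS):
    """Return the log lines that contain any keyword, case-insensitively."""
    return [
        line
        for line in log_text.splitlines()
        if any(k in line.lower() for k in keywords)
    ]


def collect(since, until):
    """Run journalctl on this node and keep only lines matching KEYWORDS."""
    result = subprocess.run(
        journalctl_command(since, until),
        capture_output=True,
        text=True,
        check=False,
    )
    return interesting_lines(result.stdout)
```

Calling `collect("<start time>", "<end time>")` on each node with a window around one of the reboots gives a short list of lines to compare across the three nodes.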
 

de Thysebaert

Thanks, it seems that this was the issue. After the 5 reboots of all nodes I removed the HA configuration, and indeed no further reboots have occurred.
But why does fencing of non-quorate nodes cause a reboot of all nodes?
I am now also investigating with the provider why connectivity was lost or degraded at that time.
Thanks
 

fabian

fencing requires the node to go down..
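To illustrate why every node can end up fenced at once, here is a minimal Python sketch of the quorum arithmetic. It assumes one vote per node and that a node needs a strict majority of all votes to be quorate. The function names and node names are hypothetical and not part of any Proxmox API.

```python
def quorum_threshold(total_votes):
    """Votes needed for a strict majority of the cluster."""
    return total_votes // 2 + 1


def is_quorate(reachable_votes, total_votes):
    """True if a partition sees enough votes to hold quorum."""
    return reachable_votes >= quorum_threshold(total_votes)


def nodes_to_fence(partitions, total_votes):
    """Return the nodes sitting in partitions that do not hold quorum.

    `partitions` is a list of node-name lists; each node counts as one vote.
    """
    fenced = []
    for group in partitions:
        if not is_quorate(len(group), total_votes):
            fenced.extend(group)
    return fenced


# The cluster VLAN fails for everyone: each node only sees itself.
print(nodes_to_fence([["node1"], ["node2"], ["node3"]], 3))

# Only one node is cut off: the other two keep quorum.
print(nodes_to_fence([["node1", "node2"], ["node3"]], 3))
```

In the first case each node holds 1 of 3 votes, below the threshold of 2, so all three are non-quorate and all three go down. In the second case only the isolated node is affected.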
 

fabian

what parameter? if you want HA, you need fencing to ensure guests are not running on multiple nodes.. this is a hard requirement.
 
