Hi,
We have 3 nodes running as physical servers at OVH (PVE 5.4-13).
The 3 nodes are in the same cluster. One NIC is connected to the public network (internet) and a second NIC is connected to OVH's vRack virtual infrastructure, which carries several VLANs for private networks. One VLAN is dedicated to cluster traffic (corosync) and uses a private IP address.
The infrastructure has been running without any problems for 3 years, and the uptime of each node was around 1 year at the time of the incident.
There are 22 VMs running (Windows and Linux).
The cluster has 44 CPUs and 314 GB of memory in total, and resource usage is low.
Yesterday, all 3 nodes suddenly rebooted within a span of 3 minutes. This happened 5 times. After each reboot, the whole infrastructure came back online and the cluster became operational again.
No updates, maintenance, or jobs that would trigger a reboot were scheduled.
Everything is running well again now, but I don't have any explanation for these reboots. It is strange that all nodes rebooted within such a short time span.
What could have happened? Brief network errors on the VLAN used by the cluster? Why would a short network interruption cause these reboots? How can I track down this incident and investigate it?
Do you have any ideas?