Sudden Reboot Issue

elhashif

Oct 31, 2023
I have a Proxmox HCI cluster with Ceph. The Ceph network is a 10G full mesh using the routed (with fallback) method. The cluster network uses a separate 1G switch. I have two questions:
  1. When I tested by taking down one of the nodes, the other 2 nodes also rebooted. Is this normal behavior? If not, what should I check?
  2. A few days ago, one node suddenly rebooted by itself while the other nodes stayed up. Today all of the nodes suddenly rebooted by themselves. What should I check?
 
Sounds like the network is not reliable enough (or not configured right) for corosync to maintain quorum.
Also note that Ceph requires at least 3 nodes, so you don't have any redundancy in your cluster at the moment.
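
To make the quorum point concrete, here is a minimal sketch of the majority arithmetic, assuming one vote per node and a plain strict-majority rule (no QDevice or special corosync options). The function names `quorum_threshold` and `is_quorate` are made up for illustration and are not part of any Proxmox or corosync API.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Votes needed for a strict majority of the expected votes."""
    if expected_votes < 1:
        raise ValueError("expected_votes must be at least 1")
    return expected_votes // 2 + 1


def is_quorate(expected_votes: int, reachable_votes: int) -> bool:
    """True if the votes a node can currently see form a majority."""
    return reachable_votes >= quorum_threshold(expected_votes)


if __name__ == "__main__":
    # Assumption: a 3-node cluster with one vote per node.
    expected = 3
    for reachable in (3, 2, 1):
        state = "quorate" if is_quorate(expected, reachable) else "NOT quorate"
        print(f"{reachable}/{expected} votes visible: {state}")
```

With 3 nodes the threshold is 2, so taking one node down should leave the other two quorate as long as they can still reach each other over the corosync link. If they reboot anyway, each survivor probably saw fewer than 2 votes, which points back at the cluster network. If HA is enabled, a node that loses quorum is fenced by its watchdog, and that looks like a sudden reboot. On a running node, `pvecm status` shows the expected votes, total votes and quorum, so you can compare them with the numbers above.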