pve-manager/7.0-11/63d82f4e (running kernel: 5.11.22-5-pve), 5-node cluster, full HA setup, Ceph filesystem
How do I prevent HA from rebooting the entire cluster?
Code:
20:05:39 up 22 min, 2 users, load average: 6.58, 6.91, 5.18
20:05:39 up 22 min, 1 user, load average: 4.34, 6.79, 6.23
20:05:39 up 11 min, 1 user, load average: 7.18, 6.22, 3.44
20:05:39 up 22 min, 1 user, load average: 3.16, 3.54, 3.20
20:05:39 up 22 min, 2 users, load average: 1.18, 1.77, 1.93
The 3rd node went offline (I will be replacing its motherboard later this week). When it came back, corosync was grouchy: it kept saying "blocked" and only saw a single node. About 2 minutes later, the entire rest of the cluster rebooted.
Code:
1: reboot system boot 5.11.22-5-pve Sat Dec 4 19:44 still running
2: reboot system boot 5.11.22-5-pve Sat Dec 4 19:44 still running
3: reboot system boot 5.11.22-5-pve Sat Dec 4 19:54 still running
4: reboot system boot 5.11.22-5-pve Sat Dec 4 19:43 still running
5: reboot system boot 5.11.22-5-pve Sat Dec 4 19:44 still running
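As a rough sanity check that the two outputs above tell the same story, here is a minimal Python sketch that turns the `uptime` snapshots into approximate boot times. It assumes the `uptime` lines are listed in the same node order as the `last reboot` lines; the function names are just illustrative.

```python
from datetime import datetime, timedelta

# (node, snapshot time, minutes up), copied from the uptime output above.
# Assumption: the lines are in node order 1..5, matching the reboot listing.
UPTIMES = [
    (1, "20:05:39", 22),
    (2, "20:05:39", 22),
    (3, "20:05:39", 11),
    (4, "20:05:39", 22),
    (5, "20:05:39", 22),
]


def approx_boot_time(snapshot, minutes_up):
    """Subtract the reported uptime from the snapshot time (HH:MM result)."""
    taken_at = datetime.strptime(snapshot, "%H:%M:%S")
    return (taken_at - timedelta(minutes=minutes_up)).strftime("%H:%M")


def boot_times(uptimes=UPTIMES):
    """Map each node number to its approximate boot time."""
    return {node: approx_boot_time(snap, mins) for node, snap, mins in uptimes}


if __name__ == "__main__":
    for node, boot in boot_times().items():
        print(f"node {node}: booted around {boot}")
```

Nodes 1, 2, 4 and 5 all come out at about the same minute, and node 3 about ten minutes later, which lines up with the `last reboot` timestamps to within a minute or so (`uptime` only reports whole minutes).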
I ended up having to reboot the 3rd node again to make it happy, but this is NOT the first time PMX has decided the entire cluster needed rebooting. Never, under any circumstance I can think of, should it reboot the entire cluster. Ever.
How do I stop this in the future?