Other nodes in a cluster reset/reboot when one is shut down?

Pyromancer

Member
Jan 25, 2021
An odd issue came up today: shutting down one machine in a 3-node, 4-vote cluster caused both other nodes to reboot unexpectedly.

We have a cluster consisting of two identical high-capacity Supermicro servers running Proxmox 7, one less powerful Dell blade server running Proxmox 6.2 (planned to upgrade to 7, but not done yet), and a Qdevice VM on a separate VMware hypervisor.

Originally the cluster was just the two high-capacity Supermicro machines and the Qdevice, but we later added the Dell blade as an experiment to see how well Proxmox would run on the hardware we usually use for VMware; the answer appears to be very well. All three machines use local storage: 43TB RaidZ1 with spare disks on the two large machines, and a 10TB 2-disk ZFS mirror on the Dell.

Replication is enabled, HA is enabled, mostly between the two Supermicro servers, as the Dell doesn't have the disk space for all the VMs.

We want to migrate one of the two Supermicro machines to a different cabinet in the same datacentre, for power resilience.

So this morning I migrated the last of the VMs off the machine to be moved, thus:

1. GUI, More, Manage HA, set to disabled.
2. Wait for VM to shut down.
3. GUI, Migrate, send to the other Supermicro.
4. Once migration completed (a matter of seconds) GUI, More, Manage HA, set to enabled.
5. Wait for VM to boot.

This worked successfully. I then logged into the host via SSH, and also into the machine's IPMI port so I could keep an eye on the console, and in the SSH terminal issued "shutdown -h now".

The selected node shut down gracefully as expected.
Both the other nodes then rebooted, which was completely unexpected.

Questions:
1. Why did the other two hosts reboot?
2. How do we prevent this reboot from happening in future if we need to shut a node down for any reason?

I've seen the threads where this happens with two-node clusters due to quorum issues, but with three nodes and a Qdevice I thought we shouldn't have that issue. I also noted someone posting that it doesn't happen if the network ports are unplugged. Presumably we could shut down the ports from the switch and then use the IPMI (which is on a separate connection) to shut the machine down, but we would like to resolve the issue by simpler methods if possible.
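
For reference, here is a minimal sketch of the vote arithmetic as I understand it. It assumes the usual majority rule (quorum = expected votes // 2 + 1) and one vote each for the three nodes and the Qdevice. The function names are just illustrative, and the second scenario (the Qdevice not contributing its vote at the time) is only a hypothesis, not something I have confirmed on our cluster.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Majority rule: more than half of the expected votes are needed."""
    return expected_votes // 2 + 1


def is_quorate(votes_present: int, expected_votes: int) -> bool:
    """True if the votes still present reach the quorum threshold."""
    return votes_present >= quorum_threshold(expected_votes)


# Our setup: 3 nodes + 1 Qdevice, one vote each (assumed).
EXPECTED_VOTES = 4

# Scenario A: Qdevice is voting, one node shut down -> 3 votes remain.
votes_with_qdevice = EXPECTED_VOTES - 1

# Scenario B (hypothetical): Qdevice vote was already missing, so only
# the 3 nodes were voting; one node shut down -> 2 votes remain.
votes_without_qdevice = (EXPECTED_VOTES - 1) - 1

print("threshold:", quorum_threshold(EXPECTED_VOTES))
print("quorate with Qdevice:", is_quorate(votes_with_qdevice, EXPECTED_VOTES))
print("quorate without Qdevice:", is_quorate(votes_without_qdevice, EXPECTED_VOTES))
```

If that arithmetic is right, shutting down one node should leave the cluster quorate as long as the Qdevice vote is actually being counted, which is why the reboots surprised me.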
 
