Shutdown applied to all nodes?!

magnayn

Oct 20, 2021
We have a 2-node cluster -- proxmox1, proxmox2.

Using the web UI to 'shut down' proxmox2 (in order to add more memory) caused *ALL* VMs to shut down, without warning.

Why did it do this?
 
So yes, it seems to be related to this.

But: we're not exactly using HA. We have a single VM (that is on proxmox1) set up as an HA resource as an experiment.

Everything else is VMs / CTs living on one or the other box using no HA features at all.

I can understand, perhaps, why that single resource might get offlined (but not really, given it was on the node that wasn't shut down), but it's very surprising to me that doing a "shutdown" on proxmox2 nukes every VM without warning, including ones that are just single VMs.

Why does it do that?
 
Thanks for the docs link. We'll be combing over that as the plan is to upgrade to a proper 3-node cluster as soon as the box arrives.

I guess the behaviours that are surprising are:

- that it took down VMs and CTs that are not configured to be HA at all.
- that there is no warning in the UI. The person expected that "Shutdown" clicked on a node would take down just that node. I follow why HA resources would be taken down due to quorum rules. It was very unexpected that it took out non-HA resources too.
 
What "goes down" is not the HA services; rather, the nodes get fenced (hard reset). This impacts everything on those nodes, including non-HA resources.
 
Good to know; I presume though after a 'hard reset' those non-HA VMs still seemed to be down.

Would it not be possible to implement this so that shutting down one node did not require taking down everything?

At the very least, a warning in the shutdown dialog might be in order. I seem not to be the first to have been bitten by this.
 
No. If you use HA, you must have at least 3 servers (or a quorum device), so that quorum can still be established when one node is down.
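To make the arithmetic behind that concrete, here is a minimal sketch of a strict-majority quorum check. It assumes one vote per node and one extra vote for a quorum device; the function names are made up for illustration and are not part of any Proxmox tool.

```python
# Minimal sketch of majority-based quorum, assuming one vote per node
# and one vote for an optional quorum device. Names are hypothetical.

def votes_needed(expected_votes: int) -> int:
    """Strict majority of all expected votes."""
    return expected_votes // 2 + 1


def has_quorum(expected_votes: int, votes_online: int) -> bool:
    """True if the members still online hold a strict majority."""
    return votes_online >= votes_needed(expected_votes)


# 2-node cluster, one node shut down: 1 of 2 votes, majority is 2.
two_node = has_quorum(expected_votes=2, votes_online=1)

# 3-node cluster, one node shut down: 2 of 3 votes, majority is 2.
three_node = has_quorum(expected_votes=3, votes_online=2)

# 2 nodes plus a quorum device, one node shut down: 2 of 3 votes.
two_node_qdevice = has_quorum(expected_votes=3, votes_online=2)

print(two_node, three_node, two_node_qdevice)
```

With two votes in total, losing either node leaves the survivor without a majority, which is why the remaining node in a 2-node cluster loses quorum as soon as the other one is shut down.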
 
Ohh, just wait till you use pveceph purge and realize it wipes the entire CEPH production cluster. You then check the documentation once again to see if it was anywhere noted that it wipes the entire 30TB storage cluster. Nope... nowhere mentioned!

Subsequent pondering ensued.
pvecm and qm, and other commands which are proxmox proprietary commands are now banned from use in our system. Simply because we fear the documentation forgets to mention that using them will nuke every PVE cluster within a 50 km radius!
 
"Ohh, just wait till you use pveceph purge and realize it wipes the entire CEPH production cluster. You then check the documentation once again to see if it was anywhere noted that it wipes the entire 30TB storage cluster. Nope... nowhere mentioned!"

Not that it has anything to do with the thread here, but a quote from the manpage ('man pveceph'):

pveceph purge [OPTIONS]

Destroy ceph related data and configuration files.

"pvecm and qm, and other commands which are proxmox proprietary commands are now banned from use in our system"

Just FYI, the source code is fully open and nothing is "proprietary": https://git.proxmox.com/
 
The man page is flawed and omits important data.
Destroy ceph related data and configuration files. (on the entire cluster)
 
Well, Ceph is always a cluster; you cannot really destroy 'local' Ceph data, since there is no such thing.
 
