VMs shutting down on their own

Red Squirrel

I was getting ready to add more RAM to one node, so I moved its VMs over to another node. Once that was done I shut down the node that now had no VMs running on it. When I got to the server room I realized I had done the wrong node, as there is another node physically sitting on it, so I can't move it to install the RAM. So I turned it back on so I could move the VMs over to the other node instead.

Well, when I got back to the console at my computer, I noticed that a bunch of VMs running on the node I had migrated to were shut down. And now I can't turn them back on. I get this error:

TASK ERROR: org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.

Why were the VMs shut down in the first place, and why can't I turn them back on? Also, the node that I turned back on is no longer showing up as online, and the node whose VMs randomly shut down is now showing as being in maintenance mode. What's going on?

I basically have 2 nodes offline out of 3 now, despite turning the other node back on. It should have come back by now.

The 3rd node (the one I didn't shut down but whose VMs randomly shut down) is now basically dead. All the VMs just show up as question marks with no name, and it won't even let me reboot the node. I get this message:

Failed to set wall message, ignoring: Transport endpoint is not connected
Call to Reboot failed: Transport endpoint is not connected
 
This seems to have settled itself over time just by waiting. I was able to get both nodes back up. That was still a really weird occurrence though... I don't like the idea of VMs just shutting down on their own. I don't think I ran out of RAM on the node I was migrating everything to, but if I did, shouldn't I get an error when I go to migrate?
 
So I had it happen again. I manually shut down a few VMs and migrated some to one host that I kept on, then turned off 2 hosts for maintenance. It shut down ALL VMs, including the ones on the host that was not shut down. It won't let me start them; it says no quorum. Isn't the whole point of HA to prevent this exact scenario? What is the point of having HA if it just shuts off the entire cluster when it loses a few hosts? Is there a way to prevent this from happening? I want to be able to do maintenance or lose a host or two without having my entire environment shut down on me.
 
Hi,

says no quorum
isn't the whole point of HA to prevent this exact scenario?
Yes, but only if the cluster has quorum. If there is no quorum, that means there are not enough nodes present in the cluster to maintain operation.

Quorum

Proxmox VE uses a quorum-based technique to provide a consistent state among all cluster nodes.
A quorum is the minimum number of votes that a distributed transaction has to obtain in order to be allowed to perform an operation in a distributed system.
Quorum (distributed computing)
— from Wikipedia
In case of network partitioning, state changes require that a majority of nodes are online. The cluster switches to read-only mode if it loses quorum.

Proxmox VE assigns a single vote to each node by default.
Cf. https://pve.proxmox.com/wiki/Cluster_Manager#_quorum
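
If you want to see this live, the vote count and quorum state can be inspected from any node, roughly like this (a minimal sketch; output fields vary a bit between versions):

# Show membership, total votes, expected votes and whether the
# cluster is currently quorate (run on any node):
pvecm status

# The same information straight from corosync:
corosync-quorumtool -s

# Quorum is a strict majority of the expected votes:
#   quorum = floor(expected_votes / 2) + 1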

I want to be able to do maintenance or lose a host or two without having my entire environment shutdown on me.
If you want to be able to take 2 nodes down, you need at least 5 nodes, or 4 nodes + a QDevice.
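
To make the arithmetic concrete (each node, and a QDevice, contributes one vote by default; quorum = floor(votes / 2) + 1):

3 nodes -> 3 votes, quorum = 2 -> survives 1 node down
5 nodes -> 5 votes, quorum = 3 -> survives 2 nodes down
4 nodes + QDevice -> 5 votes, quorum = 3 -> survives 2 nodes down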

Best regards
 
and migrated some to one host that I kept on, turned off 2 hosts for maintenance. It shutdown ALL vms including the ones on the host that is not shutdown. It won't let me start them, says no quorum.
Well... that's the expected - and correct - behavior. With three nodes, only one can fail, as more than 50% of the nodes are required for operation.

You have three nodes. You turned off two of them. One is left. This single survivor notices that it has lost connectivity to its neighbors. With HA enabled (and only then!) it is required to fence itself! This is done by a reboot.

After this automatic reboot, this single node still does not see any neighbors (as long as you have not restarted one of the other two). It cannot establish quorum, and it will not start any VM.

The manual workaround in this situation is to use "pvecm expected 1". This may have side effects, so be careful.
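
For reference, the emergency sequence on the lone surviving node would look roughly like this (a sketch - VMID 100 is just an example, and this should only ever be done when you are certain the missing nodes really are powered off, otherwise you risk a split-brain):

# Confirm the node is inquorate:
pvecm status

# Temporarily tell corosync that a single vote is sufficient:
pvecm expected 1

# VMs can then be started again, e.g.:
qm start 100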
 
It won't let me start them, says no quorum. isn't the whole point of HA to prevent this exact scenario?
turned off 2 hosts for maintenance
If you have 3 (or even 4) host nodes in your cluster, then your scenario of the remaining host(s) "fencing" themselves is exactly the desired behavior. This ensures that correct cluster decision-making is in place (avoiding the potential for inconsistent or even corrupt data). However, if you have 5 host nodes in your cluster, then even if you powered down 2 nodes, the other 3 would remain fully cluster-functional.

Search this forum (& the web) for "Proxmox cluster quorum" to get a handle on how this works.
 
I posted my reply before realizing other members had already done so.

The manual workaround in this situation is to utilize "pvecm expected 1". This may have side effects, so be careful.
I don't think the OP should be trying that just yet! He should first gain a basic understanding of how an HA cluster functions, what fencing is, & why it is necessary. Then & only then work out how to carry out the correct procedure for node maintenance.

Note: From the OP's post - this appears to be a cluster in a production environment.
 
I get it if the nodes failed unexpectedly, but if I'm manually moving VMs around and shutting down hosts with no VMs running on them, why can't it just keep things running as-is on the node that I'm not shutting down? It seems odd to me that it would actually shut them down and cause more of an outage than needed. It should simply stop HA recovery from happening in the event another node fails (or, in this case, the only node left).

I have been looking into adding 2 more nodes; I guess this will solve those issues? I know 3 is the bare minimum. Once I'm done with this physical maintenance (adding a rack shelf, which meant physically moving the nodes out of the way), I guess I won't need to do any major maintenance anymore, so it will be less of an issue, hopefully. Although I was hoping to have the ability to load-shed during a power outage and migrate everything to a single node, but I guess that won't be happening.
 
I have been looking into adding 2 more nodes, I guess this will solve those issues?
Yes, it will solve this problematic scenario.

If adding two nodes is a problem, you can achieve the same improvement to quorum by adding one single full node plus a small, separate "quorum device" (QDevice). This can be a small computer (down to a Raspberry Pi) or a VM on an independent system - like a NAS.

https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_corosync_external_vote_support
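
For what it's worth, the setup is fairly lightweight; roughly like this (the address 192.0.2.10 is just a placeholder for your QDevice machine, and a Debian-based system is assumed):

# On the external QDevice machine:
apt install corosync-qnetd

# On every cluster node:
apt install corosync-qdevice

# Then register the QDevice from one of the cluster nodes:
pvecm qdevice setup 192.0.2.10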
 
In your situation I would either only do maintenance one node at a time, or alternatively consider not using a cluster (you can still remote migrate VMs if necessary).

Although I was hoping to have the ability to load shed during a power outage and migrate everything to a single node but guess that won't be happening.
If you replace one of the nodes with a low-power QDevice (Raspberry Pi or similar), you could turn off one of the main server nodes to save power.
 
I might look into the QDevice. Is it possible to have more than one? Like, say, if I add 2, could I get away with only one host running? Of course I would still need to shut down some VMs if I don't have enough RAM on the single host. Although I might just fast-track getting 2 more hosts, as I've kind of been thinking about it anyway. Now that I've shuffled stuff around to clean up the physical layout, there is room to fit 2 more perfectly, and now I'm kind of itching to fill in that spot...

I should be done with major maintenance for the time being. I will just have to remember for the future that I can't shut down too many hosts, even if I'm moving VMs off them first. I had to shut both down to set up a shelf, and I shouldn't need to play with more than one host at a time from this point on.
 
