Feature request: 2-node cluster, NO fencing, start VMs via the GUI on the surviving node

mmenaz

Hi, I'm experimenting with clusters. I understand the importance of HA and fencing, and the paramount requirement that the same VM is never started on more than one node (destruction!), but a typical setup for redundancy is: 2 nodes, shared storage or DRBD; one node crashes, you see it and can even unplug its power cord, and you want to be able to easily run its VMs on the surviving node through the GUI. At the moment you can permanently set the quorum to 1, but you still can't select a VM on the dead node and move it to the running one, since the GUI says that the node is down! That's very frustrating :)
I would love Proxmox to check that the cluster has 2 nodes, the quorum is 1 and the other node seems down, and then allow "migrating" the VM, maybe with an additional warning: "Be sure node XYZ is turned off before proceeding".
When you repair the node and turn it on again, it should join the cluster, see that the config has changed and update its copy, see that the VMs are already running and not start them again (this part should already work this way, correct?)
Thanks a lot for your attention
 
...At the moment you can permanently set the quorum to 1, but you still can't select a VM on the dead node and move it to the running one, since the GUI says that the node is down! That's very frustrating :)

...maybe I don't understand, but... if the node is DOWN, how can you select _that_ VM from the node and transfer it elsewhere? If its disk was on shared storage the disk should have survived, but the VM is DOWN. So the VM's RAM is gone and its config is not reachable, at least not from the node it was running on before... right?

If you have a recent, valid backup you can restore it on another node (provided you restore quorum first), i.e. the same thing HA would do automatically, but you have to do it manually without HA.

You could also try to restore only the .conf file, make sure it points to the VM disk that is still there on shared storage, and try to boot the VM back from that disk (the disk could easily be inconsistent, but... you can try...).
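
For the backup route, a minimal sketch, assuming a qemu VM with ID 100 and a vzdump archive reachable from the surviving node (the archive path and storage name are made-up placeholders):

    # restore the whole VM from its latest vzdump backup onto this node
    qmrestore /mnt/backup/dump/vzdump-qemu-100-2013_01_01-00_00_01.vma.lzo 100 --storage local
    # then boot it; anything written since the backup was taken is lost
    qm start 100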

AFAIK, the best way to keep the services of a VM running even if its host PVE node crashes is to run a cluster (a classical one, not a PVE cluster) inside 2 or more VMs, each one running on a different node (or even on a different cluster), i.e. an old-fashioned redundant service, but inside VMs.

Maybe I didn't get something though...

Marco
 
Hi Marco,
the idea of mmenaz is not so bad - he wants the equivalent of "mv /etc/pve/nodes/deadnode/qemu-server/*.conf /etc/pve/nodes/livenode/qemu-server/" in the GUI.

This works, of course, only for VMs on shared storage.

But to regain quorum you must already use the CLI, so you can just as well do the move there?!

Udo
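
A minimal sketch of that manual procedure, run from a shell on the surviving node; the VMID 100, the node names "deadnode"/"livenode" and the /etc/pve/nodes/... layout are placeholders for a standard pmxcfs setup:

    # 1) give the single surviving node quorum again
    #    (only safe once you are SURE the other node is really powered off)
    pvecm expected 1

    # 2) move the VM config from the dead node's directory to the live node's;
    #    as far as the cluster is concerned, the VM now belongs to livenode
    mv /etc/pve/nodes/deadnode/qemu-server/100.conf /etc/pve/nodes/livenode/qemu-server/

    # 3) start the VM from its disk on shared storage
    qm start 100

Only the config moves; the disk stays where it is, which is why this only works when the disk is on shared storage rather than local to the dead node.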
 

Yes, that is what I suggested: just move the .conf (I have done it a few times, since I have a 2-node non-HA setup with shared disks), but the VM has to be restarted, and its disk could have problems since it was simply left there when the VM crashed together with the crashed node. As you said, you're already on the CLI to force quorum, so copying a .conf around isn't much extra hassle...

Well, I suppose that if each node holds the .conf files of the VMs on all nodes, the switch could be done in the GUI, but I guess not having HA would cause trouble when the crashed node restarts. It should first "resync" its expected VMs with the other nodes to see whether some VM has already been restarted elsewhere...

Is this like some of the HA automation, but manually and without fencing?

Marco
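
A rough sketch of the kind of "resync" check Marco describes, run on the repaired node after it has rejoined the cluster; the node names and VMID 100 are illustrative:

    pvecm status                              # quorum should be back, with both nodes listed as members
    ls /etc/pve/nodes/deadnode/qemu-server/   # 100.conf was moved away, so it is no longer in this node's directory
    qm list                                   # VM 100 is no longer listed here, so this node will not start it a second time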
 
Hi, you've got my idea. About the quorum, the GUI could do that job too (why not?), or the system could be configured from the start so that the required quorum is always 1, even with 2 nodes.
It's very frustrating, with 2 nodes and shared storage, if a node is not working and you have to manually SSH into the other one, "unlock" (force quorum) the config, and mv the VM config files one by one. It's easy to find people who can manage things visually but refuse to, or are unable to, use SSH and a bunch of commands from the CLI!
 
Is this like some of the HA automation, but manually and without fencing?
Marco

Exactly! If the "admin" sits in front of the two servers he can act as the "fencing" (unplug the crashed one), use the GUI to move the VMs, and call support for further steps, maybe the day after. Otherwise he just calls support saying "nothing is working here, even though I have a fully capable node still running", since the GUI is locked, and support has to move the VMs remotely without being there to see what is really plugged in and what is not, which is very risky ("But you told me it was unplugged!", "No, you misunderstood me! You have destroyed my VMs now!").
Best regards
 
About the quorum, the GUI could do that job too (why not?), or the system could be configured from the start so that the required quorum is always 1, even with 2 nodes.

Those actions are very dangerous, because you can end up running one VM on multiple nodes. This will effectively destroy the VM image on shared storage, and this is the reason why you need to do that manually.
 
It's very frustrating, with 2 nodes and shared storage, if a node is not working and you have to manually SSH into the other one, "unlock" (force quorum) the config, and mv the VM config files one by one.
Hi,
why one by one? My move example moves all the VMs from one node to another in a single command.
A second move for the CTs (if they are on external storage) and you are done.
It's easy to find people who can manage things visually but refuse to, or are unable to, use SSH and a bunch of commands from the CLI!
That's only 2-3 commands (with or without CTs): one for quorum and up to two moves.

But even on a 3-node cluster with quorum, you need the CLI instead of the GUI if one node fails - so there is room for improvement.

Udo
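
A sketch of the "one quorum command and up to two moves" Udo mentions, again with placeholder node names; the container config directory is an assumption (openvz/ on the OpenVZ-based releases, lxc/ on newer ones):

    pvecm expected 1
    # move all VM configs from the dead node to the live node in one go
    mv /etc/pve/nodes/deadnode/qemu-server/*.conf /etc/pve/nodes/livenode/qemu-server/
    # same for the container configs, if the CT data is on external/shared storage
    mv /etc/pve/nodes/deadnode/openvz/*.conf /etc/pve/nodes/livenode/openvz/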
 
And why does doing it manually make it less dangerous? Since it's more complicated, I can also issue the wrong commands against the wrong files, and I get no warnings about the consequences (e.g. what happens if I type "rm" instead of "mv"?)!
As I stated, IF certain conditions are met (i.e. the node looks down and the quorum has been explicitly configured to 1), and after a clear warning, doing it from the GUI should be permitted.
If you show someone a cluster and how well it is managed by your GUI, and then he asks "what about if a node is down?" and you show him that you have to SSH in, know the node name, and issue a long "mv" command (OK, you can write a script to make it easier... ;P), he thinks the solution is broken.
I understand your argument, "if you have to explicitly enter some commands, I can be sure you really want to do it", but...
 
And why does doing it manually make it less dangerous?

You should not do it at all. Instead, the admin should bring up the other node and he is done.
We always suggest clusters with at least 3 nodes, so this scenario can be easily avoided.
That is the reason why we have not implemented it so far.

But I accept patches if you really want to put that in the GUI.