Cluster node failure case

obrador

Dec 27, 2011
Hi, I've set up a 3-node cluster with all VMs running under KVM and stored on a RAID1 NAS. Everything works fine: migrating VMs from one host to another, etc.

The question now is: what happens if a physical node fails? How do I move all the VMs running on that node to another node that is still alive? I tried to migrate a VM from a stopped node and it says I can't because the host is not running. So why maintain a cluster if there is no simple way to move the VMs on a failed node to the others? Or, a question that is easier to ask (though maybe difficult to answer): what should you do with the VMs when a physical node fails?

I imagine you can achieve this by configuring HA, but I don't have reliable network hardware, so I don't think that would be a good idea.

Many thanks in advance, and thanks for the good job!
Jaume.
 
See http://pve.proxmox.com/wiki/High_Availability_Cluster

If you do not use this, you have to do it manually.

I mean, you need to make sure that the dead node is really off (to prevent the VM from running on two nodes and corrupting your data).
E.g. go to the dead server and manually unplug the power. The node must never come up again. This is very important.

Then go to a remaining node, move the VM config files to the right node and start the VM.
 
Thanks Tom. Isn't there an "easy" way to do that, through the web interface for example? Is "how to move config files" documented anywhere?

Regards,
Jaume.
 
Again, it's already fully automated; see our HA article.

If we allowed starting VMs on other nodes via the GUI without fencing, all users would just do it without thinking about the consequences. If you know what you are doing, the CLI is easy. If you do not know what you are doing, it's better to stop here and ask someone with more experience for help.

E.g. the presumed-dead server powers up again (say, after a short power fault), its VMs start up and you have them running on two nodes. All your data is destroyed, so I am pretty sure no one wants this.
 
Hi Tom, thanks a lot.

We are a school, quite a small organization; our 3 servers are managed by only one or two administrators, both careful ones. Also, we'll make absolutely sure the failed node is completely broken, so we can remove it from the cluster and maybe add it back later, once we have repaired it. But we run some critical services on some VMs, so we want to be sure we can restore them when a host dies.

So we'll be absolutely sure of what we are doing, and I also gather there's no way to do it via the GUI. Is it documented anywhere how to move a VM from a failed node to a living one? I searched the wiki without success.

Many thanks,
Jaume.
 
/etc/pve/nodes contains a folder for each configured node in the cluster. E.g. a 3-node cluster with 3 VMs, one on each node:

/etc/pve/nodes/node1/qemu-server/100.conf
/etc/pve/nodes/node2/qemu-server/101.conf
/etc/pve/nodes/node3/qemu-server/102.conf

If node1 is down (as in dead), you can start vm-100 on any of the other servers by logging on to one of the remaining servers and moving the conf file to that server. E.g. on node2 you would run: mv /etc/pve/nodes/node1/qemu-server/100.conf /etc/pve/nodes/node2/qemu-server/100.conf. After this you should be able to start vm-100 from the GUI.
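The manual move can be wrapped in a small shell function that refuses to act when the source config is missing or the target already has a config with the same ID. This is a minimal sketch, not a Proxmox tool: the function name move_vm_conf and its optional base-directory argument are hypothetical, and it assumes the failed node has already been permanently powered off.

```shell
# Hypothetical helper (not part of Proxmox): move_vm_conf VMID FROM_NODE TO_NODE [BASE_DIR]
# BASE_DIR defaults to /etc/pve/nodes; pass another directory to try it on a scratch copy.
# Only run this after the failed node is physically powered off for good.
move_vm_conf() {
    vmid=$1
    from=$2
    to=$3
    base=${4:-/etc/pve/nodes}
    src="$base/$from/qemu-server/$vmid.conf"
    dst="$base/$to/qemu-server/$vmid.conf"

    # Stop if the dead node has no config for this VM ID.
    if [ ! -f "$src" ]; then
        echo "no config at $src" >&2
        return 1
    fi
    # Never overwrite a config that already exists on the target node.
    if [ -e "$dst" ]; then
        echo "target $dst already exists" >&2
        return 1
    fi
    # Moving (not copying) keeps exactly one owner for the VM config.
    mv "$src" "$dst" && echo "moved $src -> $dst"
}
```

On node2 this would be called as `move_vm_conf 100 node1 node2`, after which vm-100 can be started from the GUI (or with `qm start 100`).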
 
