High availability - best practice

Jan 15, 2018
Hi,

We have a cluster with 6 nodes, and Gluster (on ext4) with pairwise replication across these nodes.

What is the best way to get high availability without rebooting any machine?

I don't really understand the HA concept. Does HA mean no downtime?

Bests,
Markus
 
On Xen.
Remus provides transparent high availability to ordinary virtual machines running on Xen. It does this by continually live migrating a copy of a running VM to a backup server, which automatically activates if the primary server fails.
The other one I know of is on VMware.
 
In my experience, when a fault occurs, the VM is moved to another node and then boots automatically.
How can I prevent the boot?
This is contrary to the HA feature. You can manually move the vmid.conf file under /etc/pve/nodes/<NODE>/qemu-server/ to the right node when the original node has failed (no running VM). The VM then appears on that node and can be started there. Of course, this only makes sense with shared storage.
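A rough sketch of the manual move described above, in Python. The function name `move_vm_config` and the `pve_root` parameter are made up for illustration and are not part of any Proxmox tool. It assumes the failed node is really powered off and the VM disks sit on shared storage.

```python
import shutil
from pathlib import Path


def move_vm_config(vmid, source_node, target_node, pve_root="/etc/pve"):
    """Move <vmid>.conf from the failed node's directory to the target node's.

    Hypothetical helper. Only sensible when the source node is really down
    (no running VM) and the VM disks live on shared storage.
    """
    src = Path(pve_root) / "nodes" / source_node / "qemu-server" / f"{vmid}.conf"
    dst_dir = Path(pve_root) / "nodes" / target_node / "qemu-server"

    # Refuse to continue if the layout is not what we expect.
    if not src.is_file():
        raise FileNotFoundError(f"no config for VM {vmid} under {src.parent}")
    if not dst_dir.is_dir():
        raise NotADirectoryError(f"target node directory missing: {dst_dir}")

    dst = dst_dir / src.name
    if dst.exists():
        raise FileExistsError(f"{dst} already exists")

    shutil.move(str(src), str(dst))
    return dst
```

For example, `move_vm_config(100, "node1", "node2")` would move the config of VM 100 from node1 to node2 (both node names are placeholders). The VM still has to be started by hand afterwards.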

The desired state is "started".
You can find the states here: https://pve.proxmox.com/pve-docs/chapter-ha-manager.html#ha_manager_resource_config
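If you want to script changing the requested state, something along these lines might do. It is only a sketch: the helper names are invented, the list of states is what I recall from the linked documentation and should be checked against your version, and it simply wraps the `ha-manager set <sid> --state <state>` command.

```python
import subprocess

# Assumption: request states as listed in the linked HA manager docs.
# Check the page for the version you actually run.
KNOWN_STATES = {"started", "stopped", "disabled", "ignored"}


def ha_set_state_cmd(vmid, state):
    """Build the ha-manager command line that sets the requested state of a VM."""
    if state not in KNOWN_STATES:
        raise ValueError(f"unknown HA state: {state!r}")
    return ["ha-manager", "set", f"vm:{vmid}", "--state", state]


def set_ha_state(vmid, state):
    """Run the command on a cluster node; raises if ha-manager reports an error."""
    subprocess.run(ha_set_state_cmd(vmid, state), check=True)
```

`ha_set_state_cmd(100, "started")` only builds the argument list, so you can print and review it before running anything on a node.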

Does Proxmox support a "fault tolerance model"?
No.

And how can I implement this?
As one of the users wrote, it is better to implement it one level higher, inside the VM(s) themselves. This saves resources (no RAM mirroring) and makes moving between hardware easier, as you can move the VM(s) to a node outside of the cluster (or to a new cluster).
 
This is contrary to the HA feature.

Correct, that is not HA. I was describing our situation: if I enable HA on a VM and power off the node, the VM appears on another node. I think that is correct, but it causes a long downtime, and the VM boots on the other node. We use Gluster as shared storage.

Our aim is to have almost no downtime. Could you help us solve this issue?
 
Every node writes its current timestamp to the cluster filesystem. To avoid false-positive states (where a node is seen as offline by the rest of the cluster but actually isn't), the cluster needs some time to establish this consensus. Only then is the VM considered offline and the resource moved to a different machine. On top of that comes the boot time of the machine itself.

To really minimize the downtime (to zero or close to it), fault tolerance at the application level, inside the VM, is necessary. But this is out of scope.
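To illustrate the idea (this is not the actual HA manager code), here is a toy sketch in Python. The function names, the grace period and all numbers are invented for the example. A node only counts as offline once its last written timestamp is older than a grace period, and the downtime you see is roughly that detection delay plus the boot time.

```python
def offline_nodes(last_seen, now, grace_seconds):
    """Return the nodes whose last written timestamp is older than the grace period.

    last_seen maps node name -> timestamp (seconds) of its last write.
    The grace period is what protects against false positives.
    """
    return sorted(
        node for node, ts in last_seen.items() if now - ts > grace_seconds
    )


def estimated_downtime(detection_seconds, boot_seconds):
    """Downtime seen by users: time until the failure is agreed on, plus VM boot."""
    return detection_seconds + boot_seconds
```

With made-up values, `offline_nodes({"node1": 100, "node2": 40}, now=110, grace_seconds=60)` flags only node2, because node1 wrote its timestamp 10 seconds ago. A shorter grace period would cut the downtime but makes false positives more likely.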
 