PVE high availability consultation

skyshe

New Member
Feb 24, 2023
Can PVE high availability migrate a VM without downtime?
Thank you.
 
Hello,

Do you mean that the VM keeps running without interruption when the server hosting it crashes, similar to VMware's 'Fault-Tolerance'? No, that is not possible.

What does work is that the VM is automatically (re-)started on another host. This means there is a short interruption (a few minutes), and the VM will see it as a crash. This works the same way as VMware's 'High-Availability'.

This, of course, assumes that you have a properly configured HA cluster with shared storage (for example Ceph).
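
As a rough illustration of the resource side of such a setup, the sketch below assembles an `ha-manager add` command line for one VM. The VM ID 100 and the helper name `build_ha_add_cmd` are made up for the example. It only builds the argument list and runs nothing, and it assumes the cluster, quorum and shared storage are already in place.

```python
from typing import List, Optional


def build_ha_add_cmd(vmid: int, state: str = "started",
                     max_restart: Optional[int] = None,
                     max_relocate: Optional[int] = None) -> List[str]:
    """Build (but do not run) an `ha-manager add` command for one VM.

    `build_ha_add_cmd` is a hypothetical helper for illustration only.
    """
    if state not in {"started", "stopped", "disabled", "ignored"}:
        raise ValueError(f"unsupported HA state: {state}")
    # HA resources are addressed as "<type>:<id>", e.g. "vm:100".
    cmd = ["ha-manager", "add", f"vm:{vmid}", "--state", state]
    if max_restart is not None:
        cmd += ["--max_restart", str(max_restart)]
    if max_relocate is not None:
        cmd += ["--max_relocate", str(max_relocate)]
    return cmd


if __name__ == "__main__":
    # VM ID 100 is an assumed example value.
    print(" ".join(build_ha_add_cmd(100, max_restart=1, max_relocate=1)))
```

Once a VM is registered like this, a host failure leads to the restart on another node described above, not to an uninterrupted VM.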

Kind regards,
Benedolt
 
Yes.

With shared or distributed storage, can a VM be migrated without downtime?

At the moment, after every migration the VM ends up in the state of a fresh restart.

That is equivalent to restarting the VM, which runs contrary to the true concept of high availability.
 
Live migration is possible between nodes in the same cluster, and the VM won't notice that it has been moved. This does not even require shared storage, but without it the migration takes much longer, because the whole virtual disk has to be copied to the other node.
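
For reference, a manual live migration from the CLI could look roughly like the sketch below. It only assembles a `qm migrate` command line. The VM ID, the node name `pve2` and the helper name `build_qm_migrate_cmd` are assumptions for the example, and nothing is executed.

```python
from typing import List, Optional


def build_qm_migrate_cmd(vmid: int, target_node: str, online: bool = True,
                         with_local_disks: bool = False,
                         target_storage: Optional[str] = None) -> List[str]:
    """Build (but do not run) a `qm migrate` command.

    `build_qm_migrate_cmd` is a hypothetical helper for illustration only.
    """
    cmd = ["qm", "migrate", str(vmid), target_node]
    if online:
        # Live migration: the VM keeps running while its state is transferred.
        cmd.append("--online")
    if with_local_disks:
        # No shared storage: local disks are copied as well, which takes longer.
        cmd.append("--with-local-disks")
    if target_storage is not None:
        cmd += ["--targetstorage", target_storage]
    return cmd


if __name__ == "__main__":
    # Shared storage case (assumed VM ID and node name).
    print(" ".join(build_qm_migrate_cmd(100, "pve2")))
    # Local storage case: the disk is copied during the migration.
    print(" ".join(build_qm_migrate_cmd(100, "pve2", with_local_disks=True)))
```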
 
To be clear, there are two "concepts":

- "Unplanned" High Availability - in case of a power loss, hardware problems, and so on.
Because the host crashed, all of its VMs crashed too. They need to be restarted once the host is back up or, if shared storage is used, powered on on another node. Proxmox offers this.

VMware has a special feature that B.Otto mentioned above - "Fault-Tolerance". A VM runs mirrored on two hosts, so when one host crashes, the mirrored VM on the other host keeps running = no downtime even when a host crashes. I've never seen any normal company use this because of its downsides.

- "Planned" High Availability - for hardware maintenance, Proxmox patching/reboots, etc.
This is where live migration between hosts comes into play (equivalent to VMware vMotion). With shared storage, only the VM state (RAM, running operating system) needs to be moved to another host. This works without downtime for the running VM. Proxmox offers this.
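
The two cases can be summed up in a small decision sketch. It only models the reasoning in this thread, it is not anything PVE itself ships, and the names `Outcome` and `expected_outcome` are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    mechanism: str
    vm_downtime: bool


def expected_outcome(planned: bool, shared_storage: bool) -> Outcome:
    """Map the scenario to the mechanism described in this thread."""
    if planned:
        # Planned maintenance: live migration, no downtime for the guest.
        # Without shared storage the disks are copied too, so it takes longer.
        if shared_storage:
            return Outcome("live migration", vm_downtime=False)
        return Outcome("live migration with disk copy", vm_downtime=False)
    # Unplanned host failure: the VM crashes with the host and must be restarted.
    if shared_storage:
        return Outcome("HA restart on another node", vm_downtime=True)
    return Outcome("restart after the failed host is back up", vm_downtime=True)


if __name__ == "__main__":
    for planned in (True, False):
        for shared in (True, False):
            print(planned, shared, expected_outcome(planned, shared))
```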
 
In QEMU land, this kind of fault tolerance is called "COLO" - https://wiki.qemu.org/Features/COLO - and it is not integrated in PVE in any way. The use case is rather limited, and the resource requirements and possible pitfalls are huge.
 
I tested manual migration of the VM with both ZFS storage and Ceph storage. After the migration, the VM's uptime was reset to zero, and the network showed packet loss and jitter, which indicates that the VM still notices the move. This is unacceptable for some business-critical VMs.

Can this be improved?

1677726823864.png

1677726902146.png
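
To put a number on the interruption when repeating such a test, one option is to probe the VM at a fixed interval during the migration and summarize the results afterwards. The sketch below assumes the probe results have already been collected as `(timestamp, replied)` pairs. The function name `summarize_probes` and the sample data are made up for illustration.

```python
from typing import List, Tuple


def summarize_probes(probes: List[Tuple[float, bool]]) -> Tuple[float, float]:
    """Return (loss_percent, longest_outage_seconds).

    `probes` is a time-ordered list of (timestamp_in_seconds, replied).
    An outage is measured from the last reply before a gap to the first
    reply after it; gaps at the very start or end are not counted.
    """
    if not probes:
        raise ValueError("no probes given")
    lost = sum(1 for _, ok in probes if not ok)
    loss_percent = 100.0 * lost / len(probes)

    longest = 0.0
    last_ok = None      # timestamp of the most recent successful probe
    in_outage = False
    for ts, ok in probes:
        if ok:
            if in_outage and last_ok is not None:
                longest = max(longest, ts - last_ok)
            last_ok = ts
            in_outage = False
        else:
            in_outage = True
    return loss_percent, longest


if __name__ == "__main__":
    # Made-up sample: one probe per second, two lost in the middle.
    sample = [(0.0, True), (1.0, True), (2.0, False),
              (3.0, False), (4.0, True), (5.0, True)]
    print(summarize_probes(sample))
```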
 