iSCSI and Realtime Failover

5. Create a watchdog service (there are a number of ways to do it) which STONITHs the original and thaws the clone.
I think the chance of inconsistency between what is on disk and what the suspended VM/app thinks should be there is too high, and that could lead to data corruption. The safe approach is to restart the app and flush any disk cache on the VM. At that point, a fresh start is simply more reliable.
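To make the point concrete, here is a minimal sketch of the decision such a watchdog would make if it cold-starts the standby instead of thawing a suspended clone. Everything in it is hypothetical: `watchdog_step`, `fence_primary`, and `cold_start_standby` are placeholder names, not part of Proxmox or any fencing agent, and the heartbeat threshold is an arbitrary assumption.

```python
from typing import Callable


def watchdog_step(
    missed_heartbeats: int,
    threshold: int,
    fence_primary: Callable[[], bool],
    cold_start_standby: Callable[[], None],
) -> str:
    """Decide what to do after one heartbeat check (hypothetical helper).

    Returns "healthy", "fence_failed", or "failed_over".
    """
    if missed_heartbeats < threshold:
        return "healthy"

    # Never start the standby unless fencing is confirmed; otherwise
    # both nodes could end up writing to the same disk.
    if not fence_primary():
        return "fence_failed"

    # Cold start rather than thaw: a fresh boot reads its state from
    # disk instead of trusting stale in-memory state.
    cold_start_standby()
    return "failed_over"


if __name__ == "__main__":
    # Stand-in callbacks; a real setup would call a fencing device
    # and a VM start command here.
    result = watchdog_step(
        missed_heartbeats=3,
        threshold=3,
        fence_primary=lambda: True,
        cold_start_standby=lambda: print("starting standby from disk"),
    )
    print(result)
```

The only design choice worth noting is the ordering: the standby is started only after fencing reports success, and it is booted fresh rather than resumed.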


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Maybe I don't really get the point, but as I mentioned in this comment, you will have failover if you configured your VM with HA in PVE: after one node goes down, the VM is started automatically with the SAME data at the last flushed-to-disk state, and you are up and running again in a few minutes (depending on the boot time of your VM). This is what HA means in PVE with dedicated shared storage (FC, iSCSI, NFS) or distributed shared storage (Ceph, GlusterFS). ZFS is NOT ABLE to provide this.

Fault Tolerance in QEMU (COLO) has been more or less a proof of concept for years and is very similar to vSphere Fault Tolerance, as @bbgeek17 already mentioned. Even in VMware, this is not feasible for a lot of VMs due to the immense overhead it creates.
 
