iSCSI and Realtime Failover

bbgeek17 · Apr 27, 2023

alexskysilk said:
5. create a watchdog service (number of ways to do it) which stonith's the original, and thaws the clone

I think the risk of a mismatch between what is actually on disk and what the suspended VM/app believes is there is too high, and that mismatch can lead to data corruption. The safe approach is to restart the app and flush any disk cache on the VM. At that point a fresh start is simply more reliable.
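To make the concern concrete, here is a toy Python model, not anything PVE or QEMU actually runs. The `ToyVM` class, the block names, and the values are all made up for illustration. It assumes the clone shares the disk with the original but carries a frozen copy of the original's in-memory state.

```python
# Toy model only: one "disk" shared by both VMs, plus a per-VM in-memory view.
# ToyVM and all block names/values are hypothetical, for illustration.
class ToyVM:
    def __init__(self, disk):
        self.disk = disk    # shared storage, modeled as dict: block -> value
        self.cache = {}     # what this VM believes is on disk

    def write(self, block, value):
        # Update the VM's own view and write through to the shared disk.
        self.cache[block] = value
        self.disk[block] = value

    def clone_suspended(self):
        # The clone sees the same disk but freezes a copy of the memory state.
        clone = ToyVM(self.disk)
        clone.cache = dict(self.cache)
        return clone

    def stale_blocks(self):
        # Blocks where the VM's belief differs from the actual disk content.
        return sorted(b for b, v in self.cache.items() if self.disk.get(b) != v)


disk = {}
original = ToyVM(disk)
original.write("journal", "txn-1")

clone = original.clone_suspended()   # frozen while journal == txn-1
original.write("journal", "txn-2")   # the original keeps running
original.write("data", "row-42")

# Thawing the clone: it still believes journal == txn-1.
stale_after_thaw = clone.stale_blocks()

# A freshly booted VM has no cached beliefs, so nothing can be stale.
fresh = ToyVM(disk)
stale_after_restart = fresh.stale_blocks()
```

In this sketch the thawed clone ends up with a stale view of `journal`, while the freshly started VM has nothing to disagree with the disk about. A real guest has far more state than this (page cache, filesystem journal, application buffers), which only makes the mismatch harder to reason about.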

Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox

LnxBil · May 1, 2023

Maybe I don't really get the point, but as I mentioned in this comment, you will have failover if you configured your VM with HA in PVE: the VM gets started automatically after a node goes down, with the SAME data at the last flushed-to-disk state, and you are up and running again within a few minutes (depending on the boot time of your VM). This is what HA means in PVE with dedicated shared storage (FC, iSCSI, NFS) or distributed shared storage (Ceph, GlusterFS). ZFS is NOT ABLE to provide this.

Fault Tolerance in QEMU (COLO) has been more or less a proof of concept for years and is very similar to vSphere Fault Tolerance, as @bbgeek17 already mentioned. Even in VMware, it is not feasible for a lot of VMs because of the IMMENSE overhead it creates.
