iSCSI and Realtime Failover

JakeIT

New Member
Mar 27, 2023
Dear All,

We have a number of older-generation hosts which I will reconfigure with Proxmox. I have been using it standalone with great success.

However, during testing I am noticing a problem - or I may be working under a misunderstanding. I am hoping someone can assist, as completing the POC will result in more Proxmox licensing and ultimately the support that entails.

I am able to use the iSCSI volume to create LVM which is shared, and I can live-migrate a running VM, but I cannot shut down a node and see the VM appear on another node as I would with Hyper-V / VMware HA.

What am I doing wrong?

In all my research, it looks as if real-time failover requires ZFS and replication.
Trying to turn on replication results in a 500 error, because replication requires ZFS?

ZFS over iSCSI is a separate entity, as I understand it.

Is it possible to serve up an iSCSI LUN from a third-party device, with or without MPIO, and for that iSCSI LUN to be used for real-time failover, so that if a node fails the VM pops up on the other?

Can anyone tell me if this is possible, and what I may be doing wrong?

Or must I look at ZFS over iSCSI, etc.?

Kind Regards

Jake
 
I am able to use the iSCSI volume to create LVM which is shared, and I can live-migrate a running VM
Sounds like you have the configuration done correctly.
but I cannot shut down a node and see the VM appear on another node as I would with Hyper-V / VMware HA.
Can you provide more details on what exactly you are doing and expecting to happen? Have you configured your VMs with the HA attribute?
real-time failover requires ZFS and replication.
There are two concepts for business continuity: a) High Availability, b) Replication.
Shared storage + PVE HA provides the first part; ZFS replication provides the second. Replication is scheduled, i.e., always behind, whereas HA failover is supposed to be transparent.
You are correct that PVE replication is based on ZFS and as such requires that storage type to be used.

Is it possible to serve up an iSCSI LUN from a third-party device, with or without MPIO, and for that iSCSI LUN to be used for real-time failover, so that if a node fails the VM pops up on the other?
Yes, it's absolutely possible - that's the whole point of shared storage (iSCSI, NVMe/TCP, NFS, or Ceph). It looked, at the beginning of your post, like you had it configured. Perhaps you just didn't set up the HA relationship for the VMs?

https://pve.proxmox.com/wiki/High_Availability_Cluster#Configure_VM_or_Containers_for_HA
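For reference, marking a VM as an HA resource is a one-liner (VM ID 100 here is just an example):
Code:
# mark the VM as an HA resource; the cluster resource manager will
# restart it on another node if its current node fails
ha-manager add vm:100 --state started

# check resource and node states
ha-manager status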


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Hi and thanks for your reply.
I have been configuring, maintaining, and supporting Microsoft and VMware clusters for a good while; I hope I am missing a configuration step and have misread or misunderstood someone else's video explanation.

I don't need to use replication in the sense that I know it, i.e. replication between nodes.

I just need the virtual machine to fail over onto the second node (only two in this test) when the owning node fails (stopped by me!).
I need this to be possible with iSCSI shared storage, as it is on the other two major platforms.

Is automatic failover possible without ZFS?

I am reading your link now :)

Thank you!
 
You only have two nodes? Then the remaining node won't have quorum when you shut down the other. So the problem might be with the general setup of the cluster and not with the storage per se.

The 'minimum recommended' HA setup of PVE needs three hosts. If you only have two, you need a QDevice, or else there won't be a majority in a failure scenario:
https://pve.proxmox.com/wiki/Cluster_Manager#_corosync_external_vote_support
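For example, with a small third machine as the QDevice host (the IP here is made up):
Code:
# on the external third machine
apt install corosync-qnetd

# on both cluster nodes
apt install corosync-qdevice

# then, from one cluster node
pvecm qdevice setup 192.168.10.5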
 
You only have two nodes? Then the remaining node won't have quorum when you shut down the other. So the problem might be with the general setup of the cluster and not with the storage per se.
THIS

Is automatic failover possible without ZFS?
First, I don't know, but I assume no. You normally would not want ZFS replication in a "real" cluster. The setup is more for the use case of having two non-clustered servers.
 
Automatic failover is a pretty important function in virtualization clusters - Hyper-V with iSCSI just does it.

Hope I am missing something here
 
Again (bbgeek17 already explained this): if you want an HA cluster, use an HA storage solution. ZFS does not qualify. Use Ceph, external LVM on top of iSCSI/FC, iSCSI directly, or NFS.
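For illustration, a minimal /etc/pve/storage.cfg for shared LVM on top of iSCSI could look like this (portal, target, and names are made up; the important part is the shared flag):
Code:
iscsi: san0
        portal 192.168.10.50
        target iqn.2003-01.com.example:storage.lun0
        content none

lvm: vm-store
        vgname vg_san0
        shared 1
        content images,rootdir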
 
I think we are talking at cross purposes.
I am using LVM on iSCSI and all the nodes can see it.
I've added a third node.
 
It works guys!!!!!

Thanks

There are a few instructional videos out there, not made by Proxmox, that are misleading - they made me think a replication setup is required.
It's not.

Thanks
 
You normally would not want ZFS replication in a "real" cluster.
I am using it actively in production, so I need to respond ;-)

A real SAN with multipath etc. is both expensive and complex. For me, ZFS is absolutely a valid approach to get live migration and HA/failover.

Of course, this is only valid if one can live with data loss (depending on the replication interval) when a node dies without warning - which is the case in my environment.
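For anyone curious, a replication job is a single command (VM ID, node name, and interval are made up):
Code:
# replicate VM 100 to node pve2 every 15 minutes; on failover you lose
# at most the writes of the last interval
pvesr create-local-job 100-0 pve2 --schedule "*/15"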

Just my 2 €¢, ymmv!
 
Perhaps there is a missing trick here.
A real SAN is at the heart of all the good enterprise systems I've worked on.
They are extremely resilient and they perform insanely well: even loaded with spinning disks, given the right connectivity a 2040 is rapid, giving dozens of VMs SSD-like performance, and in combination with modern configs and connectivity they are unbelievably quick.

I wouldn't sacrifice this for live migration in Proxmox. Someone needs to convince Proxmox that live failover without ZFS is supported by all the other main vendor options and should be supported in Proxmox too. That said, the fact that failover with LVM on iSCSI is not live is not a big issue in this environment.
 
I must admit I am very confused about your concern, @JakeIT, especially since a few hours earlier you said that everything worked for you. Can you please define exactly what you think is missing?

We provide block storage (SAN), and our customers run Proxmox clusters 24x7x365. The reason they use us (besides performance and PVE integration) is shared storage. The connectivity protocol is iSCSI or NVMe/TCP; no ZFS is involved at any level. Live VM migration within a PVE cluster is just basic functionality.
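For example, a live migration on shared storage is a single command (VM ID and node name are made up); only the RAM state moves, the disk stays on the shared LUN:
Code:
qm migrate 100 pve2 --online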


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Live Failover on iSCSI with no ZFS.

On VMware and Hyper-V with an iSCSI LUN served up ... datastore or NTFS, whatever -- the VM will drop a few pings when its owning host fails before it is running on the other machine ... this does not seem possible on Proxmox with LVM on iSCSI - ZFS seems the only way.

My failover works, but the machine is restarted as it comes up on the other node. Not a big issue. I can still work with this.
 
On VMware and Hyper-V with an iSCSI LUN served up ... datastore or NTFS, whatever -- the VM will drop a few pings when its owning host fails before it is running on the other machine ... this does not seem possible on Proxmox with LVM on iSCSI - ZFS seems the only way.
I am scratching my head at this.

ZFS is not a clusterable file system (at least not without major unsupported genuflections) - you CAN'T do what you describe with ZFS.

If you have LVM on iSCSI, it works precisely as you describe - the technique is essentially the same as with vSphere or Hyper-V. No one is reinventing the wheel here.
 
I think what @JakeIT is asking about is "vSphere Fault Tolerance":
Code:
Fault Tolerance provides a higher level of business continuity than vSphere HA.
When a Secondary VM is called upon to replace its Primary VM counterpart, the Secondary VM
immediately takes over the Primary VM’s role with the entire state of the virtual machine
preserved. Applications are already running, and data stored in memory does not need to be
reentered or reloaded. Failover provided by vSphere HA restarts the virtual machines affected
by a failure.

There is currently no equivalent in Proxmox, see:
https://forum.proxmox.com/threads/proxmox-setup-with-faulttolerance-for-zero-downtime.26440

P.S. @JakeIT, if you can identify which specific resource gave you the impression that ZFS somehow provides such functionality - don't use that resource any more :)

P.P.S. To expand a bit further:
PVE live migration is similar to vSphere vMotion: the memory state of the VM is available and is transferred in a way that preserves the VM state in an orderly fashion.
vSphere FT creates and maintains a secondary standby VM at all times, with a real-time copy of the primary's memory state. If/when the primary host fails, the data is already there. Until the COLO feature is baked in, when a PVE node fails the state is gone, so the VM has to be restarted.


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
AHH gotcha. ty for the clarification @bbgeek17

Honestly, if that is the goal, I wouldn't be designing the HA at the hypervisor level, as it is limiting - do it at the application level with proper multi-target load balancing (and then you can actually be fault-tolerant across more than just the cluster level, and not be dependent on vendor-specific solutions).
 
Honestly, if that is the goal, I wouldn't be designing the HA at the hypervisor level, as it is limiting - do it at the application level
Absolutely, application-level HA is always the most comprehensive. But FT has its place, especially for legacy apps. It's certainly a differentiator in the hypervisor market, where there is rarely a one-to-one comparison.


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
P.S. @JakeIT, if you can identify which specific resource gave you the impression that ZFS somehow provides such functionality - don't use that resource any more :)
I will find the resource link that led me astray - it's not a Proxmox link...

Don't worry all ... this is working sufficiently for this DevOps role!
 
But FT has its place, especially for legacy apps. It's certainly a differentiator in the hypervisor market, where there is rarely a one-to-one comparison.
hmm, sorry if I'm hijacking the thread by going down a rabbit hole... I think this CAN be accomplished with Proxmox (at least it would make for an interesting POC if I had the time or an application):

1. Create a VM with a boot disk and a payload disk. The boot disk should be relatively immutable.
2. Copy the vmid.conf to another host, changing the vmid BUT NOT THE PAYLOAD DISK.
3. With the initial VM OFF, start the clone and SUSPEND it.
4. Restart the initial VM.
5. Create a watchdog service (there are a number of ways to do it) which STONITHs the original and thaws the clone - see the sketch below.

You may also need a cluster IP to avoid MAC address handoff issues. It wouldn't be exactly real-time FT, because there is no memory sync, but it may get you a few points closer to another 9 if you have an application that can function this way :)
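A very rough sketch of the watchdog from step 5, assuming plain shell, SSH between the nodes, and made-up VM IDs/hostnames:
Code:
#!/bin/sh
# hypothetical watchdog: fence the primary, then thaw the suspended clone
PRIMARY_HOST=pve1    # node running the primary VM (made up)
PRIMARY_VMID=100     # primary VM ID (made up)
CLONE_VMID=200       # local, pre-started and suspended clone (made up)

while sleep 5; do
    if ! ping -c 3 -W 1 "$PRIMARY_HOST" >/dev/null 2>&1; then
        # "STONITH" the original so it cannot touch the shared payload disk
        ssh "$PRIMARY_HOST" "qm stop $PRIMARY_VMID" || true
        # thaw the clone on this node
        qm resume "$CLONE_VMID"
        break
    fi
done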

ok back to the original discussion. ignore me :)
 
