iSCSI and Realtime Failover

JakeIT

New Member
Mar 27, 2023
Dear All,

We have a number of older-generation hosts which I will reconfigure with Proxmox. I have been using it standalone with great success.

However, during testing I am noticing a problem - or I may be working under a misunderstanding. I am hoping someone can assist, as completing the POC will result in more Proxmox licensing and ultimately the support that entails.

I am able to use the iSCSI volume to create LVM which is shared, and I can live-migrate a running VM, but I cannot shut down a node and see the VM appear on another node as I would with Hyper-V / VMware HA.

What am I doing wrong?

In all my research, it looks as if real-time failover requires ZFS and replication.
Trying to turn on replication results in a 500 error, because replication requires ZFS?

ZFS over iSCSI is a separate entity, as I understand it.

Is it possible to serve up an iSCSI LUN from a third-party device, with or without MPIO, and for that iSCSI LUN to be used for real-time failover, so that if a node fails the VM pops up on the other?

Can anyone tell me if this is possible, and what I may be doing wrong?

Or must I look at ZFS over iSCSI, etc.?

Kind Regards

Jake
 
I am able to use the iSCSI volume to create LVM which is shared, and I can live-migrate a running VM
Sounds like you have the configuration done correctly.
but I cannot shut down a node and see the VM appear on another node as I would with Hyper-V / VMware HA.
Can you provide more details on what exactly you are doing and expecting to happen? Have you configured your VMs with the HA attribute?
real-time failover requires ZFS and replication.
There are two concepts for business continuity: a) High Availability, b) Replication.
Shared storage + PVE HA provides the first part; ZFS replication provides the second. Replication is scheduled, i.e., always behind, whereas HA failover is supposed to be transparent.
You are correct that PVE replication is based on ZFS and as such requires that storage type to be used.

Is it possible to serve up an iSCSI LUN from a third-party device, with or without MPIO, and for that iSCSI LUN to be used for real-time failover, so that if a node fails the VM pops up on the other?
Yes, it's absolutely possible - that's the whole point of shared storage (iSCSI, NVMe/TCP, NFS, or Ceph). It looked, at the beginning of your post, like you had it configured. Perhaps you just didn't set up the HA relationship for the VMs?

https://pve.proxmox.com/wiki/High_Availability_Cluster#Configure_VM_or_Containers_for_HA
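For reference, marking a VM as an HA resource is a one-liner (VM ID 100 here is just an example):
Code:
# mark the VM as an HA resource; the cluster resource manager will
# restart it on another node if its current node fails
ha-manager add vm:100 --state started

# check resource and node states
ha-manager status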


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Hi and thanks for your reply.
I have been configuring, maintaining, and supporting Microsoft and VMware clusters for a good while; I hope I am missing a configuration step and have misread or misunderstood someone else's video explanation.

I don't need to use replication in the sense that I know it, i.e. replication between nodes.

I just need the virtual machine to fail over onto the second node (only two in this test) when the owning node fails (stopped by me!).
I need this to be possible with iSCSI shared storage, as it is on the other two major platforms.

Is automatic failover possible without ZFS?

I am reading your link now :)

Thank you!
 
You only have two nodes? Then the remaining node won't have quorum when you shut down the other. So the problem might be with the general setup of the cluster and not with the storage per se.

The 'minimum recommended' HA setup of PVE needs three hosts. If you only have two, you need a QDevice, or else there won't be a majority in a failure scenario:
https://pve.proxmox.com/wiki/Cluster_Manager#_corosync_external_vote_support
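For example, with a small third machine as the QDevice host (the IP here is made up):
Code:
# on the external third machine
apt install corosync-qnetd

# on both cluster nodes
apt install corosync-qdevice

# then, from one cluster node
pvecm qdevice setup 192.168.10.5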
 
You only have two nodes? Then the remaining node won't have quorum when you shut down the other. So the problem might be with the general setup of the cluster and not with the storage per se.
THIS

Is automatic failover possible without ZFS?
First, I don't know, but I assume no. You normally would not want ZFS replication in a "real" cluster. The setup is more for the use case of having two non-clustered servers.
 
Automatic failover is a pretty important function in virtualization clusters - Hyper-V with iSCSI just does it.

Hope I am missing something here
 
Again (bbgeek17 already explained this): if you want an HA cluster, use an HA storage solution. ZFS does not qualify. Use Ceph, external LVM on top of iSCSI/FC, iSCSI directly, or NFS.
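For illustration, a minimal /etc/pve/storage.cfg for shared LVM on top of iSCSI could look like this (portal, target, and names are made up; the important part is the shared flag):
Code:
iscsi: san0
        portal 192.168.10.50
        target iqn.2003-01.com.example:storage.lun0
        content none

lvm: vm-store
        vgname vg_san0
        shared 1
        content images,rootdir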
 
I think we are talking at cross purposes.
I am using LVM on iSCSI and all the nodes can see it.
I've added a third node.
 
It works guys!!!!!

Thanks

There are a few instructional videos out there, not made by Proxmox, that are misleading - they made me think a replication setup is required.
It's not.

Thanks
 
You normally would not want ZFS replication in a "real" cluster.
I am using it actively in production, so I need to respond ;-)

A real SAN with multipath etc. is both expensive and complex. For me, ZFS is absolutely a valid approach to get live migration and HA/failover.

Of course, this is only valid if one can live with data loss (depending on the replication interval) when a node dies without warning - which is the case in my environment.
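For anyone curious, a replication job is a single command (VM ID, node name, and interval are made up):
Code:
# replicate VM 100 to node pve2 every 15 minutes; on failover you lose
# at most the writes of the last interval
pvesr create-local-job 100-0 pve2 --schedule "*/15"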

Just my 2 €¢, ymmv!
 
Perhaps there is a missing trick here.
A real SAN is at the heart of all the good enterprise systems I've worked on.
They are extremely resilient and they perform insanely well: even loaded with spinning disks, given the right connectivity a 2040 is rapid, giving dozens of VMs SSD-like performance, and in combination with modern configs and connectivity they are unbelievably quick.

I wouldn't sacrifice this for live migration in Proxmox. Someone needs to convince Proxmox that live failover without ZFS is supported by all the other main vendor options and should be supported in Proxmox too. That said, the fact that failover with LVM on iSCSI is not live is not a big issue in this environment.
 
I must admit I am very confused about your concern, @JakeIT, especially since a few hours earlier you said that everything worked for you. Can you please define exactly what you think is missing?

We provide block storage (SAN), and our customers run Proxmox clusters 24x7x365. The reason they use us (besides performance and PVE integration) is shared storage. The connectivity protocol is iSCSI or NVMe/TCP; no ZFS is involved at any level. Live VM migration within a PVE cluster is just basic functionality.
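For example, a live migration on shared storage is a single command (VM ID and node name are made up); only the RAM state moves, the disk stays on the shared LUN:
Code:
qm migrate 100 pve2 --online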


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Live Failover on iSCSI with no ZFS.

On VMware and Hyper-V with an iSCSI LUN served up ... datastore or NTFS, whatever -- the VM will drop a few pings when its owning host fails before it is running on the other machine ... this does not seem possible on Proxmox with LVM on iSCSI - ZFS seems the only way.

My failover works, but the machine is restarted as it comes up on the other node. Not a big issue. I can still work with this.
 
On VMware and Hyper-V with an iSCSI LUN served up ... datastore or NTFS, whatever -- the VM will drop a few pings when its owning host fails before it is running on the other machine ... this does not seem possible on Proxmox with LVM on iSCSI - ZFS seems the only way.
I am scratching my head at this.

ZFS is not a clusterable file system (at least not without major unsupported genuflections) - you CAN'T do what you describe with ZFS.

If you have LVM on iSCSI, it works precisely as you describe - the technique is essentially the same as with vSphere or Hyper-V. No one is reinventing the wheel here.
 
I think what @JakeIT is asking about is "vSphere Fault Tolerance":
Code:
Fault Tolerance provides a higher level of business continuity than vSphere HA.
When a Secondary VM is called upon to replace its Primary VM counterpart, the Secondary VM
immediately takes over the Primary VM’s role with the entire state of the virtual machine
preserved. Applications are already running, and data stored in memory does not need to be
reentered or reloaded. Failover provided by vSphere HA restarts the virtual machines affected
by a failure.

There is currently no equivalent in Proxmox, see:
https://forum.proxmox.com/threads/proxmox-setup-with-faulttolerance-for-zero-downtime.26440

P.S. @JakeIT, if you can identify which specific resource gave you the impression that ZFS somehow provides such functionality - don't use that resource any more :)

P.P.S. To expand a bit further:
PVE live migration is similar to vSphere vMotion: the memory state of the VM is available and is transferred in a way that preserves the VM state in an orderly fashion.
vSphere FT creates and maintains a secondary standby VM at all times, with a real-time copy of the primary's memory state. If/when the primary host fails, the data is already there. Until the COLO feature is baked in, when a PVE node fails the state is gone, so the VM has to be restarted.


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
AHH gotcha. ty for the clarification @bbgeek17

Honestly, if that is the goal, I wouldn't be designing the HA at the hypervisor level, as it is limiting - do it at the application level with proper multi-target load balancing (and then you can actually be fault-tolerant across more than just the cluster level, and not be dependent on vendor-specific solutions).
 
Honestly, if that is the goal, I wouldn't be designing the HA at the hypervisor level, as it is limiting - do it at the application level
Absolutely, application-level HA is always the most comprehensive. But FT has its place, especially for legacy apps. It's certainly a differentiator in the hypervisor market, where there is rarely a one-to-one comparison.


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
P.S. @JakeIT, if you can identify which specific resource gave you the impression that ZFS somehow provides such functionality - don't use that resource any more :)
I will find the resource link that led me astray - it's not a Proxmox link...

Don't worry all ... this is working sufficiently for this DevOps role!
 
But FT has its place, especially for legacy apps. It's certainly a differentiator in the hypervisor market, where there is rarely a one-to-one comparison.
hmm, sorry if I'm hijacking the thread by going down a rabbit hole... I think this CAN be accomplished with Proxmox (at least it would make for an interesting POC if I had the time or an application):

1. Create a VM with a boot disk and a payload disk. The boot disk should be relatively immutable.
2. Copy the vmid.conf to another host, changing the vmid BUT NOT THE PAYLOAD DISK.
3. With the initial VM OFF, start the clone and SUSPEND it.
4. Restart the initial VM.
5. Create a watchdog service (there are a number of ways to do it) which STONITHs the original and thaws the clone - see the sketch below.

You may also need a cluster IP to avoid MAC address handoff issues. It wouldn't be exactly real-time FT, because there is no memory sync, but it may get you a few points closer to another 9 if you have an application that can function this way :)
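A very rough sketch of the watchdog from step 5, assuming plain shell, SSH between the nodes, and made-up VM IDs/hostnames:
Code:
#!/bin/sh
# hypothetical watchdog: fence the primary, then thaw the suspended clone
PRIMARY_HOST=pve1    # node running the primary VM (made up)
PRIMARY_VMID=100     # primary VM ID (made up)
CLONE_VMID=200       # local, pre-started and suspended clone (made up)

while sleep 5; do
    if ! ping -c 3 -W 1 "$PRIMARY_HOST" >/dev/null 2>&1; then
        # "STONITH" the original so it cannot touch the shared payload disk
        ssh "$PRIMARY_HOST" "qm stop $PRIMARY_VMID" || true
        # thaw the clone on this node
        qm resume "$CLONE_VMID"
        break
    fi
done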

ok back to the original discussion. ignore me :)
 
