[TUTORIAL] PoC 2 Node HA Cluster with Shared iSCSI GFS2

So if you had such a setup, you would use inline storage.
I think OP is likely in a situation where their hypervisor hosts do not have significant internal storage. They, or their customers, already own an entry- to mid-level SAN solution and want to maximize ROI from it. Typically, people in this scenario are not looking to invest additional funds in expanding internal host capacity, or in replacing hypervisor hosts just to add more storage.

Of course, if the existing hosts or SAN have reached the end of their expected useful life, it would be prudent to consider investing in new hardware.


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 

Ok, that scenario did not directly come to mind, to be honest, but it is very understandable at the moment.
Our customers migrating away from Hyper-V or VMware environments usually do so when they need to replace their hardware after 5-8 years.

In the case of VMware I do understand; since Broadcom took over, more and more people want to get away from their licensing policy.
When using DAC storage on, for example, PVE01 & PVE02, the solution I suggested would be a valid one, and even with iSCSI it would still be doable.

And in the case of reworking the environment, even if the customer does not want to replace hardware, you can't just attach a SAN to Proxmox and dump a GFS2 layer on top of it. That does not seem very logical either, because you would still need to move data back and forth to make it work.

I was also recommending this because you would not need GFS2. While reading the topic it seemed there were already some issues around it, and Proxmox itself also does not appear to officially support it (at least from what I could quickly find).

As far as I could quickly look up, GFS2 depends heavily on quorum, and let's just say a 2-node cluster is the worst architecture for that unless you add a small quorum device. There might also be an issue with the locking mechanism (GFS2 has one, but does Proxmox actually play along with it?). In that case it could theoretically be possible for VM1 to be active on both PVE01 and PVE02, writing to the same .qcow2 file when GFS2 is mounted as a directory. I would strongly advise testing that scenario, because if that happens it would result in instant file corruption.
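For reference, GFS2 will not even mount without its cluster-wide lock manager: the filesystem is created with the lock_dlm protocol and a cluster:filesystem name that has to match the running cluster. A minimal sketch, where the cluster name "pvecluster", the volume name, the mount point, and the device path are all placeholders for your actual setup:

```shell
# Placeholders: "pvecluster" must match the corosync cluster name;
# /dev/mapper/iscsi-lun0 stands in for the shared iSCSI LUN.
# -j 2 creates one journal per node that will mount the filesystem.
mkfs.gfs2 -p lock_dlm -t pvecluster:gfs2vol -j 2 /dev/mapper/iscsi-lun0

# Mounting requires dlm_controld to be running and the cluster to be
# quorate; lock_dlm is what arbitrates concurrent access from both nodes.
mount -t gfs2 /dev/mapper/iscsi-lun0 /mnt/gfs2
```

This is exactly why quorum matters here: if DLM cannot establish a quorate lockspace, the mount blocks or fails rather than risking uncoordinated writes.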

Furthermore, enable_fencing=0 means fencing is disabled. Why disable it? In clustered storage you absolutely want to isolate (fence) a node that has gone rogue or lost cluster communication.
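For context, that flag lives in dlm_controld's configuration. A sketch of the relevant file with fencing left on (1 is the default, so the safest option is simply not to override it):

```ini
# /etc/dlm/dlm.conf -- dlm_controld settings
# enable_fencing=1 is the default. With shared storage, a node that
# loses cluster communication must be fenced before its DLM locks
# can be safely recovered by the surviving node.
enable_fencing=1
```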

I honestly do think you will need a quorum device to prevent unwanted split-brain situations, especially with storage and even more so in a mission-critical environment.
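Adding that third vote to an existing 2-node PVE cluster is straightforward. A sketch, assuming a third machine at 192.0.2.10 (for example a PBS host) acts as the external vote; the address is illustrative:

```shell
# On the QDevice host (the third machine):
apt install corosync-qnetd

# On every cluster node:
apt install corosync-qdevice

# On one cluster node, register the QDevice:
pvecm qdevice setup 192.0.2.10

# Verify that the cluster now has a third vote:
pvecm status
```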

I don't want to dismiss the idea outright, because I am always open to new ideas, but I do have some concerns in this case. If everything is up and running everything will work fine, but what if it does go wrong? Will the environment still behave in a predictable and stable way?
 
I was also recommending this because you would not need GFS2. While reading the topic it seemed there were already some issues around it, and Proxmox itself also does not appear to officially support it (at least from what I could quickly find).
As our customers are businesses and enterprises, I'd never recommend that they use an unsupported technology combination. That said, necessity is the mother of invention/adaptation.
As far as I could look up quickly GFS2 depends heavily on quorum, and let's just say a 2-node cluster is the worst architecture for that unless you add a small quorum device.
The minimum requirements of GFS2 do not negate the minimum requirements of PVE, i.e. an odd-numbered cluster with at least a QDevice as the third cluster member.
In that case it could theoretically be possible for VM1 to be active on both PVE01 and PVE02 writing to the same .qcow2 file when GFS2 is mounted as a directory. I would strongly advise testing that scenario, because if that happens it would result in instant file corruption.
There is no need to test this, as it would never happen in a properly operational PVE cluster. A VM can only run on one PVE node at a time, and its disks are used exclusively by that VM and its parent node.
I don't want to dismiss the idea outright, because I am always open to new ideas, but I do have some concerns in this case. If everything is up and running everything will work fine, but what if it does go wrong? Will the environment still behave in a predictable and stable way?
A user's tutorial posted in a volunteer-oriented forum is certainly not a badge of approval or endorsement by Proxmox GmbH. It describes one of many technologies available for Linux that someone might use. As you pointed out, there were already reports of technical errors in this forum. Given the state of development of GFS2 (Red Hat no longer actively develops it), one should certainly think five times before putting it in production.


Thanks for your points. While GFS2 technically works, I would not recommend it for production use. This setup was only a PoC to demonstrate what could be achieved.

We also added PBS as a qdevice to provide proper quorum handling for HA.

For very small environments, I would recommend a 2-node setup with an NFS-capable device that can also serve as the qdevice. This works well for deployments with very limited budgets.

Why NFS instead of DRBD or Ceph? Mainly because of storage efficiency. The thin-provisioning capability is also the reason why I did not use LVM over iSCSI.
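As a sketch, such an NFS share would be defined in /etc/pve/storage.cfg roughly like this (the storage ID, server address, and export path are placeholders for your environment):

```ini
# /etc/pve/storage.cfg -- illustrative NFS entry
nfs: nas01
        server 192.0.2.20
        export /export/pve
        content images,rootdir
        options vers=4.2
```

With qcow2 disk images on such a share, space is only consumed as the guest actually writes data, which is the thin-provisioning advantage over plain LVM on iSCSI.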

I also experimented with ZFS over iSCSI, which works quite well, but it required more setup effort than a simple NFS-based solution.
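For comparison, a ZFS-over-iSCSI storage in /etc/pve/storage.cfg looks roughly like the sketch below; the portal address, target IQN, pool name, and provider are placeholders for your SAN:

```ini
# /etc/pve/storage.cfg -- illustrative ZFS-over-iSCSI entry
zfs: zfs-san
        iscsiprovider LIO
        portal 192.0.2.30
        target iqn.2003-01.org.linux-iscsi.san:pve
        pool tank
        content images
        sparse 1
```

Part of the extra setup effort mentioned is that PVE manages the zvols on the target host over SSH, so passwordless root SSH from every node to the storage box has to be configured as well.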
 