Shared LVM on iSCSI: how safe is it?

MaxGashkov

Member
Mar 12, 2017
I have a simple home lab setup with 2 Proxmox 6.3 nodes in a cluster configuration and a NAS with iSCSI support. I've allocated a LUN and attached it as an LVM VG on both nodes, using PVE's LVM plugin with the 'shared' option enabled. I can see the storage on both nodes, create VMs with disks located on it and so on; so far, no problem.
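
For reference, here is roughly what the relevant entries in my /etc/pve/storage.cfg look like (the storage IDs, portal address, target IQN and VG name below are placeholders, and I created the VG on the LUN manually beforehand):

Code:
iscsi: nas-iscsi
        portal 192.168.1.50
        target iqn.2005-10.org.example:pve-lun0
        content none

lvm: shared-lvm
        vgname vg_shared
        shared 1
        content images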

I've had a discussion with a friend who is a more experienced sysadmin than I am, and he's sure that this setup will lead to data loss due to corruption and race conditions when writing to a single, non-cluster-aware storage.

As far as I can see, corruption is possible if multiple nodes change the LVM layout nearly simultaneously, e.g. when creating new VMs or resizing disks.

Does Proxmox employ any safeguards against that, like serializing LVM layout changes in a queue or some other form of locking? Is it documented somewhere in detail?
I've checked the usual places:
https://pve.proxmox.com/wiki/Storage
https://pve.proxmox.com/wiki/Storage:_iSCSI
https://pve.proxmox.com/wiki/Storage:_LVM

There are only a few one-line mentions of this mode, and the risks/failure modes are not discussed at all.

Can anyone point me in the right direction to educate myself further on this matter?
 
Shared LVM over iSCSI has its drawbacks (mainly, no thin provisioning and no snapshots), but I wouldn't worry about its production readiness. It's probably one of the most solid storage options.
 
Great, thanks. But why no thin provisioning? Is the LVM accounting done locally on each node?
 
Thin LVM (which allows thin provisioning and snapshots) is very different from thick LVM and by its nature can't be shared. So only thick LVM is available if you want to share it between nodes.
 
With thick LVM, if you create a 1GB slice, a record is made in the LVM metadata section that, for example, sectors 1 to 1000 are used by that slice.
The other nodes will learn about it, but it's not a completely transparent process. PVE actually forces a cache flush on all nodes constantly to learn about metadata changes, and uses a global cluster configuration lock to prevent other nodes from claiming the same sectors at the same time.
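
Conceptually, the effect is as if each allocation ran like the following sketch (this is not the actual implementation, which lives in Perl inside pve-storage; the commands and names below are just standard LVM tools used for illustration):

Code:
# 1) take the cluster-wide storage lock (coordinated via pmxcfs)
# 2) refresh the local view of the on-disk LVM metadata
pvscan --cache
# 3) write the new allocation record, serialized under the lock
lvcreate -an -n vm-101-disk-0 -L 32G vg_shared
# 4) release the lock; other nodes see the change on their next rescan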

If you attempt to use LVM as a shared storage manager without PVE, there is a significant chance of data corruption, especially if LVM management is executed from more than one location.

With thin LVM there is no space reservation; it is, after all, thin. The sectors are allocated and written inline with I/O. As you can imagine, it would be impossible to coordinate this across multiple nodes with a system (thin LVM) that was not designed for cluster usage.
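
To make the contrast concrete, here is a sketch (the VG and LV names are made up). The thick allocation is a single metadata update that can be serialized under a lock; with thin, chunk allocations happen in the kernel on whichever node is writing, so there is nothing for PVE to serialize:

Code:
# thick: all sectors are reserved up front, in one metadata transaction
lvcreate -n vm-101-disk-0 -L 32G vg_shared
# thin: the pool hands out chunks lazily, inline with guest writes
lvcreate -T -L 100G vg_shared/thinpool
lvcreate -T -V 32G -n vm-101-disk-1 vg_shared/thinpool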


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
If you attempt to use LVM as a shared storage manager without PVE, there is a significant chance of data corruption, especially if LVM management is executed from more than one location.
True if you don't use clustered LVM, which was required in previous PVE versions. Now it is handled as you described, with the limitation that ANY metadata-changing operation has to go through the PVE API in order to be visible on all nodes.
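
For context, the old clustered-LVM mechanism and its modern replacement look roughly like this (a sketch only; PVE uses neither today, since it serializes through its own cluster lock as described above):

Code:
# historic: clvmd, enabled in /etc/lvm/lvm.conf with
#   locking_type = 3
# modern lvm2 replaces clvmd with lvmlockd:
vgcreate --shared vg_shared /dev/sdb
vgchange --lockstart vg_shared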
 
This mentions it's safe in a cluster due to cluster-level locking.

However, is it safe between two clusters both sharing the same LVM on iSCSI? (A common use case in our VMware environment; VMware uses iSCSI locks.)

If not always safe, is it safe if you make sure no adding or deleting of VMs is executed concurrently on the two different clusters?
 
No, it is not safe between independent clusters.


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
VMware does this via locking in its VMFS file system.
LVM locking is at the cluster level.
You can also use OCFS2 on your LUN, but you have to set that up yourself and test for problems with every update.
Or use simple NFS ;)
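
If you do try OCFS2, the rough shape of the setup is something like this (a from-memory sketch, with placeholder device names and labels; check the OCFS2 documentation before relying on it):

Code:
# describe all nodes in /etc/ocfs2/cluster.conf, then on every node:
systemctl enable --now o2cb ocfs2
# format the LUN once, with one slot per node that will mount it
mkfs.ocfs2 -L pve-shared -N 2 /dev/mapper/my-iscsi-lun
# mount on every node and add it to PVE as a directory storage
mount -t ocfs2 /dev/mapper/my-iscsi-lun /mnt/pve-shared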
 
Yeah, it would be nice if Proxmox/LVM did it with iSCSI locks, at least for the LVM layout (ideally on the VMs too, to prevent corruption from a VM booting on two different nodes, as VMware does).

Tempted to do OCFS2, but since it's not supported, the retesting on every update worries me... If it were officially supported, that's probably the route I would take.

NFS has its own HA issues, especially if your current storage doesn't directly support NFS...
 
it would be nice if Proxmox/LVM did it with iSCSI locks, at least for the LVM layout.
To use proper terminology: there is no such thing as iSCSI locks. There are SCSI Persistent Reservations, which are part of the SCSI protocol.

However, VMware has not used PR in a long time. They moved to using the VAAI ATS primitive (Atomic Test and Set), which has since been added to the industry standard.

Windows Failover Clustering uses PR though.

LVM was not designed to be used on shared storage, so it's not possible for PVE to use PR with LVM. Not to mention that PVE does not have any support for PR today.
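
If you want to see what PR looks like in practice, sg_persist from sg3_utils lets you exercise it against a SCSI/iSCSI device (illustrative only; the key value is arbitrary and the device name is a placeholder):

Code:
# register a key for this initiator, then take a reservation
sg_persist --out --register --param-sark=0xabc123 /dev/sdb
# type 5 = write exclusive, registrants only
sg_persist --out --reserve --param-rk=0xabc123 --prout-type=5 /dev/sdb
# inspect keys and the current reservation from any node
sg_persist --in --read-keys /dev/sdb
sg_persist --in --read-reservation /dev/sdb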



Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
This will not be implemented.
VMware used to use SCSI locking, but the entire disk is always locked with every write, which very quickly leads to locking delays.
Locking at the file system level cannot be recreated if no file system is used.

VMware and Hyper-V are file system agnostic and Proxmox is block agnostic.
LVM locking works differently from locking at the file system level and is limited to the cluster. As a rule, this is not a problem at all. I see no reason to present a LUN to several clusters. Even with VMware, this leads to significant performance degradation.

If you want to continue using your old storage until it is no longer supported, then you can do this well with LVM.
I only use Ceph for new clusters. You can also share Ceph across many clusters.
 
VMware and Hyper-V are file system agnostic and Proxmox is block agnostic.
I think you meant "centric" rather than "agnostic". Although I am not sure it's correct to say that PVE is block centric. It can work equally well with block and file storage (qcow). But each comes with its own limitations. Some storage options, though, work better than others.


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Yes, I was thinking persistent reservations.

Was not familiar with VAAI ATS, I'll have to look into that.

Obviously it's not possible today, but I don't see why it would not be possible for LVM to work with PR, although it might not be practical to implement compared to other features...
 
Have you tried GFS2 as the filesystem?

I have it in use for shared storage on my setup - not iSCSI, but SAS shared-attached storage via an MSA2040.
Somewhere in the howtos I documented my findings about this.

But as iSCSI is just a different (shared) way of offering storage to a node, the general filesystem findings should still apply.

[edit] found my posts -> https://forum.proxmox.com/threads/p...-lvm-lv-with-msa2040-sas-partial-howto.57536/
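
The core of it is just formatting the shared LV for the cluster's lock manager, roughly like this (a from-memory sketch with placeholder names; the cluster name must match your corosync cluster and dlm must be running):

Code:
# one journal per node that will mount the filesystem (-j)
mkfs.gfs2 -p lock_dlm -t mycluster:pve-shared -j 2 /dev/vg_shared/gfs2lv
mount -t gfs2 /dev/vg_shared/gfs2lv /mnt/pve-shared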
 
