[TUTORIAL] PVE 7.x Cluster Setup of shared LVM/LV with MSA2040 SAS [partial howto]

Another question about write performance:

I have done some tests with fio, and I get abysmal results when the VM disk file is not preallocated.

Preallocated: I get around 20,000 IOPS for 4k randwrite and 3 GB/s for 4M writes. (This is almost the same as my physical disk without GFS2.)

But when the disk is not preallocated, or when I take a snapshot of a preallocated drive (so new writes are no longer preallocated), I get:

60 IOPS for 4k randwrite, 40 MB/s for 4M writes.
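For reference, a sketch of the kind of fio runs described above (not the exact commands used; the target device is a placeholder and should point at a disk backed by the GFS2-hosted image):

Code:
# 4k random write, direct I/O, run inside the guest (device is a placeholder)
fio --name=randwrite4k --filename=/dev/sdb --ioengine=libaio --direct=1 \
    --rw=randwrite --bs=4k --iodepth=32 --runtime=60 --time_based

# 4M sequential write
fio --name=write4m --filename=/dev/sdb --ioengine=libaio --direct=1 \
    --rw=write --bs=4M --iodepth=8 --runtime=60 --time_based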
I have not examined, nor taken measurements in regard to performance, so I cannot provide you with data.
OK, thanks!

It works fine without LVM in my tests, so there is no need for lvmlockd, vgscan, and all the other LVM tooling.

Regarding performance, I have compared it with OCFS2, and it's really night and day for 4k direct writes when the file is not preallocated (I'm at around 20,000 IOPS on OCFS2 and 200 IOPS on GFS2).

I have also noticed that a qcow2 snapshot lowers 4k direct writes to around 100~200 IOPS. This also happens with local storage, so I'll look into implementing external qcow2 snapshots (the snapshot in an external file). I don't see a performance regression with external snapshots.
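For context, an external qcow2 snapshot is simply a new qcow2 overlay that uses the original image as a read-only backing file; something along these lines (a sketch, file names are placeholders):

Code:
# create an external snapshot: new writes go to the overlay,
# the original image becomes the read-only backing file
qemu-img create -f qcow2 -b vm-100-disk-0.qcow2 -F qcow2 vm-100-disk-0-snap1.qcow2

# inspect the resulting backing chain
qemu-img info --backing-chain vm-100-disk-0-snap1.qcow2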
 
Hey @spirit & @Glowsome thank you for such an informative thread.

I have 6 hosts in my cluster and 2 MSAs that I am trying to use as clustered, shared storage. I initially tried using LVM on top of iSCSI, but soon found out that the files were not being replicated across nodes and realised I needed GFS2. So I've installed and configured it to the best of my knowledge (I don't want to use LVM if I can avoid it, so I have configured only GFS2 and DLM), but I don't get a prompt back when I try to mount. Here is my dlm.conf:


Code:
log_debug=1
protocol=tcp
post_join_delay=10
enable_fencing=0
lockspace Xypro-Cluster nodir=1

Code:
# dlm_tool status
cluster nodeid 1 quorate 1 ring seq 9277 9277
daemon now 2743 fence_pid 0
node 1 M add 16 rem 0 fail 0 fence 0 at 0 0
node 2 M add 710 rem 0 fail 0 fence 0 at 0 0
node 3 M add 785 rem 0 fail 0 fence 0 at 0 0
node 4 M add 751 rem 0 fail 0 fence 0 at 0 0
node 5 M add 816 rem 0 fail 0 fence 0 at 0 0
node 6 M add 1145 rem 0 fail 0 fence 0 at 0 0
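While the mount hangs, these are the kinds of commands that can be run from another shell to gather more information (a sketch; nothing beyond dlm_tool and the system logs, and the lockspace name depends on what was passed to mkfs.gfs2 -t):

Code:
# list DLM lockspaces - the GFS2 lockspace should appear once the mount starts
dlm_tool ls
# dump dlm_controld's internal debug buffer
dlm_tool dump
# check the dlm and corosync logs, plus kernel messages
journalctl -u dlm.service -u corosync.service --since "10 min ago"
dmesg | grep -iE 'dlm|gfs2'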

I'd appreciate any help.
 
Hey @spirit & @Glowsome thank you for such an informative thread.
[...]
I'd appreciate any help.
Hi,
here is my dlm.conf:

Code:
# Enable debugging
log_debug=1
# Use sctp as the protocol (needed with multiple corosync links)
protocol=sctp
# Delay at join
#post_join_delay=10
# Disable fencing (for now)
enable_fencing=0

I'm using protocol=sctp because I have multiple corosync links, and in that case it is mandatory.
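For reference, "multiple corosync links" means each node has more than one ringX_addr in /etc/pve/corosync.conf, roughly like the excerpt below (node names and addresses are placeholders):

Code:
nodelist {
  node {
    name: pve1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.10.1
    ring1_addr: 192.168.20.1
  }
  # ... one entry per node, each with ring0_addr and ring1_addr
}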

Then I format my block device with GFS2. The -t option takes <corosync_clustername>:<fsname>, -j 4 creates one journal per node (4 nodes here), and -J 128 sets the journal size to 128 MiB:

Code:
mkfs.gfs2 -t <corosync_clustername>:testgfs2 -j 4 -J 128 /dev/mapper/36742b0f0000010480000000000e02bf3

(Here I'm using a multipath iSCSI LUN.)

And finally I mount it:

Code:
mount -t gfs2 -o noatime /dev/mapper/36742b0f0000010480000000000e02bf3 /mnt/pve/gfs2
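Not covered above, but to actually use the mountpoint for VM disks it can be registered as a shared directory storage; roughly like this (storage name and content types are examples to adapt):

Code:
# register the GFS2 mountpoint as a shared directory storage on the cluster
pvesm add dir gfs2 --path /mnt/pve/gfs2 --shared 1 --is_mountpoint yes --content images,rootdir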
 
Hi,

I’m writing this post after testing the Glowsome configuration for about two months, followed by four months of production use on three nodes with mixed servers connected via FC to a Lenovo DE2000H SAN.
I want to thank @Glowsome for the excellent work they’ve done.

I sincerely hope that this solution can become officially supported in Proxmox in the future.

Thank you again!
 
There is this tutorial, https://forum.proxmox.com/threads/poc-2-node-ha-cluster-with-shared-iscsi-gfs2.160177/, which I have used to set up a 2-node cluster in our lab: FC SAN (all-flash storage), with GFS2 directly on the multipath device (a simple setup).
From a feature perspective everything seems to be working (the only gap is that TPM 2.0 state blocks snapshots); all the basic features we need are there (snapshots + SAN).
In the lab it seems stable, performance is also OK, and even discard is supported on GFS2.
Some performance numbers from a Windows VM:

[screenshot: disk benchmark results from the Windows VM]

Sequential speeds show that the 8 Gbit HBAs are the bottleneck in this case.
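Regarding discard, it can be enabled via the discard mount option or run periodically; a quick way to check it on the mounted filesystem (the path is an example):

Code:
# trim unused blocks on the mounted GFS2 filesystem and report how much was freed
fstrim -v /mnt/pve/gfs2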
 
I have issues with DLM/mount on boot with this setup - although I'm not using LVM but the raw LUNs themselves. I've added some dependencies to the fstab entries, but the automatic mount still somehow runs into an indefinite "kern_stop" for the "mount" commands, which can only be fixed by rebooting.

My current workaround is to define the mount as "noauto" and mount it manually after the Proxmox box has completely booted. That has worked fine so far.

Here are my fstab entries:
Code:
/dev/disk/by-uuid/8ee5d7a9-7b19-4b45-b388-bb5758c20d77 /mnt/pve/storage-gfs2-01 gfs2 _netdev,noauto,noacl,lazytime,noatime,rgrplvb,discard,x-systemd.requires=dlm.service,x-systemd.requires=nvmf-connect-script.service,x-systemd.requires=pve-ha-crm.service,nofail 0 0
/dev/disk/by-uuid/1a89385a-965c-4014-9b83-f90a1f3782f6 /mnt/pve/storage-gfs2-02 gfs2 _netdev,noauto,noacl,lazytime,noatime,rgrplvb,discard,x-systemd.requires=dlm.service,x-systemd.requires=nvmf-connect-script.service,x-systemd.requires=pve-ha-crm.service,nofail 0 0

With the x-systemd.requires options and the _netdev flag, systemd adds the following dependencies:
Code:
After=dlm.service nvmf-connect-script.service pve-ha-crm.service
Requires=dlm.service nvmf-connect-script.service pve-ha-crm.service
After=blockdev@dev-disk-by\x2duuid-1a89385a\x2d965c\x2d4014\x2d9b83\x2df90a1f...target

DLM should obviously be started, and the NVMe-over-TCP connection should be established. The last entry (pve-ha-crm.service) was a first stab at a workaround, trying to wait for corosync to be ready, but it didn't work reliably. Systemd automatically added the "After=blockdev@...target", which seems fine.

I don't know whether it's a race condition caused by mounting two shares at once, or a fencing-related issue. This is my default DLM config; I'm using sctp because I've got two rings defined in corosync. I was already experimenting with disabling additional fencing-related options, though I wasn't sure whether disabling something like "enable_quorum_lockspace" would be a good idea...
Code:
# cat /etc/default/dlm
DLM_CONTROLD_OPTS="--enable_fencing 0 --protocol sctp --log_debug"

# options I might add next
# --enable_startup_fencing 0 --enable_quorum_fencing 0

Can anyone see an error I've overlooked?
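For reference, one direction to experiment with would be ordering the mounts explicitly after corosync and giving them a mount timeout so a hang cannot block boot forever; a sketch only (untested, and the extra options are assumptions, not taken from the setup above):

Code:
/dev/disk/by-uuid/8ee5d7a9-7b19-4b45-b388-bb5758c20d77 /mnt/pve/storage-gfs2-01 gfs2 _netdev,noatime,x-systemd.requires=corosync.service,x-systemd.requires=dlm.service,x-systemd.requires=nvmf-connect-script.service,x-systemd.mount-timeout=90s,nofail 0 0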
 
I have issues with DLM/mount on boot with this setup - although I'm not using LVM but the raw LUNs themselves.
[...]

Can anyone see an error I've overlooked?
Hi einhirn: you don't have to use DLM; it's only required by GFS2, not by shared LVM. I'd recommend having a look at https://kb.blockbridge.com/technote/proxmox-lvm-shared-storage/
 
it's only required by GFS2
Exactly - that's what I'm using. OK, I didn't mention that other than in the fstab lines, but since this thread is about using GFS2 I didn't think it necessary.

Btw: I'm also using shared thick-LVM storage via iSCSI with multipathing and NVMe-over-TCP, but I'd really like to use thin provisioning for VMs and possibly snapshots - even though I was surprised that qcow2 snapshots in PVE are internal (i.e. stored in the same file), but that's a different topic.
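(For reference, internal qcow2 snapshots can be listed directly on the image file; the path below is just an example:)

Code:
# list the internal snapshots stored inside a qcow2 image
qemu-img snapshot -l /mnt/pve/storage-gfs2-01/images/100/vm-100-disk-0.qcow2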
 
I have issues with DLM/Mount on boot with this setup - although I'm not using LVM but the raw LUNs themselves. I've added some dependencies to the FStab entries, but the automatic mount still somehow runs into indefinite "kern_stop" for the "mount" commands. Can only be fixed by rebooting.
[...]

Can anyone see an error I've overlooked?
It seems that there are some dependencies to take care of:



I'll try those and check whether it helps...
 
Hi there,

I must say we did configure a production cluster with GFS2, following the instructions in this thread, and it worked like a charm for around a year, but over time the storage became completely unstable and left the cluster unusable.

For the moment, we've switched to RAW storage. Losing the ability to have snapshots is preferable to having such an unstable filesystem.

Just wanted to leave this comment as a warning to potential users: GFS2 does work, but in the long term it can also become corrupted (maybe it requires some additional maintenance?).
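On the maintenance point: GFS2 ships a filesystem checker, but it has to be run with the filesystem unmounted on every node of the cluster; roughly like this (the device path is a placeholder):

Code:
# unmount the GFS2 filesystem on ALL nodes first, then on one node:
fsck.gfs2 -y /dev/mapper/<multipath-device>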
 
The main issue with GFS2 (and OCFS2) is that they are not really supported, so if something bad happens you are on your own. I might be wrong, but I also remember that there isn't much development activity around them. Luckily there is a good chance that Proxmox VE 9 will feature snapshot support with qcow2 on LVM-thick (there is development work going on at the moment; I don't know whether it will be ready in time) in a VMFS-like fashion. This should cover most of the use cases for which people use OCFS2 or GFS2, and it will be supported officially.
For the moment, we've switched to RAW storage. Losing the ability to have snapshots is preferable to having such an unstable filesystem.

Until the snapshot/qcow2 support on LVM-thick is available, this might be a workaround:

Alternatives to Snapshots
If an existing iSCSI/FC/SAS storage needs to be repurposed for a Proxmox VE cluster and using a network share like NFS/CIFS is not an option, it may be possible to rethink the overall strategy; if you plan to use a Proxmox Backup Server, then you could use backups and live restore of VMs instead of snapshots.

Backups of running VMs will be quick thanks to dirty bitmap (aka changed block tracking) and the downtime of a VM on restore can also be minimized if the live-restore option is used, where the VM is powered on while the backup is restored.
https://pve.proxmox.com/wiki/Migrate_to_Proxmox_VE#Alternatives_to_Snapshots
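As a rough sketch of that workflow (the VM ID and storage name are examples, and the live-restore flag should be verified against your PVE version):

Code:
# "pseudo-snapshot": back up the running VM 100 to a PBS storage named "pbs"
# (incremental and fast thanks to dirty-bitmap / changed-block tracking)
vzdump 100 --storage pbs --mode snapshot

# list the backups on that storage to find the volume ID
pvesm list pbs

# "roll back" by restoring; with live restore the VM starts while data streams in
# (check `qmrestore --help` for the exact option on your version)
qmrestore <backup-volume-id> 100 --force 1 --live-restore 1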


Now obviously this isn't a solution for every use case, but maybe it's enough for you. Even if you use other backup software and have a limited budget, you could still use PBS just for these "pseudo-snapshots" without obtaining a subscription, as long as you can live with the nag screen. I wouldn't do this as a permanent solution without a support subscription, but it can serve as a workaround until qcow2 on LVM-thick is supported.