Cluster aware FS for shared datastores?

dwma

New Member
Apr 3, 2025
Hi,
Just wondering if it's somewhere on the Proxmox roadmap to add a cluster-aware filesystem (similar to VMFS) that can be configured via the GUI.
I have a bunch of Dell VRTx servers (2/4-blade systems with a shared datastore), and the shared PERC cannot work in passthrough mode, so Ceph is not an option here.

Also, using the shared datastore as LVM means losing the snapshot ability.
 
You need to create the OCFS2 filesystem with "-T vmstore", which uses 1 MB clusters for the files.
Each time a file needs to be enlarged, all nodes have to communicate so that they know about the newly allocated blocks.
With larger cluster sizes this happens less often.
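For reference, a minimal sketch of creating such a filesystem; the device path, label, and mount point are placeholder assumptions, and the O2CB cluster stack must already be configured on every node:

```shell
# Assumption: /dev/sdb1 is the shared LUN, and the o2cb cluster stack
# (/etc/ocfs2/cluster.conf) is already set up on all nodes.

# -T vmstore selects a filesystem profile tuned for VM images
# (large cluster size, sparse file support):
mkfs.ocfs2 -T vmstore -L pve-shared /dev/sdb1

# Mount on each node (typically via /etc/fstab with the _netdev option):
mount -t ocfs2 /dev/sdb1 /mnt/ocfs2-shared
```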
 
What gave me the best results in cluster-shared storage with Proxmox was:
1st place: Fibre Channel SAN with LUNs of <100 TB each and thick LVM. Rock solid. It also works with iSCSI or FCoE, but consider pure FC to leave Ethernet alone: better performance. You can configure this almost entirely via the PVE GUI, with the exception of bootstrapping the PV and VG (one command, only the first time, and on one node). You will miss snapshots, though, and will need a separate LUN or storage for backups (which can be Ceph).
2nd place: Ceph. Nothing to add except RAM usage and network considerations. As with iSCSI or FCoE, it is always better to have separate interfaces for storage.
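For context, the one-time bootstrap described above could look like the sketch below; the device path and the storage/VG names are placeholder assumptions, and it runs on a single node only:

```shell
# Assumption: /dev/mapper/mpatha is the multipath device for the shared FC LUN.

# One-time bootstrap, on ONE node only: create the volume group
# (vgcreate initializes the physical volume implicitly):
vgcreate vg_san /dev/mapper/mpatha

# Then register it cluster-wide as shared thick-LVM storage,
# either via the GUI (Datacenter -> Storage -> Add -> LVM) or:
pvesm add lvm san-lvm --vgname vg_san --shared 1 --content images
```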
 
2nd place: Ceph. Nothing to add except RAM usage and network considerations. This case is like iSCSI or FCoE, where it is always better to have separate interfaces for storage.
Ceph looks nice. But since I have RAID on the shared datastore and cannot pass the disks through from the shared PERC, it's a no-go for me.

It's strange that such an enterprise solution as Proxmox still doesn't have a working native way (via GUI) to set up a cluster-aware FS that supports snapshots. @t.lamprecht, is Proxmox planning to cover this scenario in the future?
 
Well, in this forum some people reported that they use manual backups instead of snapshots as a kind of workaround, since VM backups can use snapshots even if the storage can't. Of course (depending on your use case) this is not always applicable, since snapshots are not backups and vice versa.
An advantage of this approach is that you don't need to revert the whole system to a snapshot if you just messed up some configuration files: instead, you would simply restore them.
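As an illustration of restoring single files instead of rolling back, here is a hedged sketch using Proxmox Backup Server; the repository string, snapshot name, and archive name are placeholder assumptions:

```shell
# Assumption: backups go to a Proxmox Backup Server; all names are examples.
export PBS_REPOSITORY='backup@pbs@pbs.example.com:datastore1'

# List the available snapshots of the guest:
proxmox-backup-client snapshot list

# Pull out the root-disk archive of one snapshot to a local image file,
# then mount it loopback and copy back only the broken config files
# (the PVE GUI also offers file-level restore for this):
proxmox-backup-client restore "vm/100/2025-04-03T10:00:00Z" \
    drive-scsi0.img.fidx /tmp/restore.img
```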
 
Well, in this forum some people reported that they use manual backups instead of snapshots as a kind of workaround, since VM backups can use snapshots even if the storage can't. Of course (depending on your use case) this is not always applicable, since snapshots are not backups and vice versa.
An advantage of this approach is that you don't need to revert the whole system to a snapshot if you just messed up some configuration files: instead, you would simply restore them.
Yup, that's a workaround. But I don't know how it'll behave with third-party backup software like Veeam B&R when the datastore has no snapshot ability, because normally Veeam takes a VM snapshot before backing it up.
 
Yup, that's a workaround. But I don't know how it'll behave with third-party backup software like Veeam B&R when the datastore has no snapshot ability, because normally Veeam takes a VM snapshot before backing it up.
Veeam and the teething troubles of its Proxmox VE support are a story of their own (just look it up here in the forum). Proxmox's native backup functions (be it vzdump or PBS) use QEMU's ability to create snapshots of virtual machines, which are storage-agnostic. I don't know whether Veeam utilizes them or not. However, since the newest Proxmox VE version overhauled the backup API, hopefully Veeam and other third-party vendors will have better support in the future.
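The storage-agnostic snapshot mode mentioned above is visible in vzdump itself; a minimal sketch, where the VM ID and the backup storage name are placeholder assumptions:

```shell
# Assumption: VM 100 exists and 'backup-store' is a configured backup storage.
# --mode snapshot uses QEMU's live backup mechanism, so it works even on
# storages (like shared thick LVM) that have no snapshot support themselves:
vzdump 100 --mode snapshot --storage backup-store --compress zstd
```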
 
I'm going to experiment a bit with Longhorn by SUSE. It has been a while since I first saw it, and I've been meaning to try it. From what I've read, it has no trouble with RAID.
 
Hi!

I'm looking for something similar for a new small setup:
- "3-node storage" with a separate "3-node hypervisor" (6 servers in total).

My search found the following options for storage-node replication/HA:
Code:
- MooseFS - https://moosefs.com
- SaunaFS - https://github.com/leil-io/saunafs
- NVIDIA AIStore - https://aiatscale.org
- DAOS - https://daos.io/

From the list above, "NVIDIA AIStore" looks promising, but testing is needed.
 
Hi!

I'm looking for something similar for a new small setup:
- "3-node storage" with a separate "3-node hypervisor" (6 servers in total).

My search found the following options for storage-node replication/HA:
Code:
- MooseFS - https://moosefs.com
- SaunaFS - https://github.com/leil-io/saunafs
- NVIDIA AIStore - https://aiatscale.org
- DAOS - https://daos.io/

From the list above, "NVIDIA AIStore" looks promising, but testing is needed.

A Proxmox plugin for SaunaFS is in the works; I hope it won't take long to appear.
 
What gave me the best results in cluster-shared storage with Proxmox was:
1st place: Fibre Channel SAN with LUNs of <100 TB each and thick LVM. Rock solid. It also works with iSCSI or FCoE, but consider pure FC to leave Ethernet alone: better performance. You can configure this almost entirely via the PVE GUI, with the exception of bootstrapping the PV and VG (one command, only the first time, and on one node). You will miss snapshots, though, and will need a separate LUN or storage for backups (which can be Ceph).
2nd place: Ceph. Nothing to add except RAM usage and network considerations. As with iSCSI or FCoE, it is always better to have separate interfaces for storage.
Hi,

If I understand correctly, in your 1st place setup a Fibre Channel SAN presents a shared LUN to all Proxmox nodes, and you use thick LVM on top of it. There is no filesystem at the host level, only LVM with VM disks as logical volumes.

Since standard LVM is not cluster-aware, how is data integrity guaranteed when multiple nodes see the same LUN?
What prevents two nodes from modifying LVM metadata at the same time and corrupting the volume group?

Also, how is this handled during VM migration?

I'm trying to understand how this setup avoids consistency issues compared to other shared block-storage approaches.

Thanks!
 
Since standard LVM is not cluster-aware, how is data integrity guaranteed when multiple nodes see the same LUN?
What prevents two nodes from modifying LVM metadata at the same time and corrupting the volume group?
The Proxmox VE cluster puts a storage lock in place so the other nodes know not to modify the metadata.

Also, how is this handled during VM migration?
As with any other shared storage, there is only ever one active VM process accessing the data. Either the source VM, or after the handover, when the state has been fully migrated, the target VM instance.
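As a concrete illustration of that handover: a live migration on shared storage only transfers the memory and device state, never the disks. A minimal sketch, where the VM ID and target node name are placeholder assumptions:

```shell
# Assumption: VM 100 runs on this node, 'pve2' is another cluster node,
# and the VM's disks live on shared storage visible to both nodes.
# Only RAM and device state are streamed; the disks stay in place:
qm migrate 100 pve2 --online
```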
 