Cluster aware FS for shared datastores?

dwma

New Member
Apr 3, 2025
Hi,
Just wondering if it's somewhere on the Proxmox roadmap to add a cluster-aware filesystem (similar to VMFS) that can be configured via the GUI.
I have a bunch of Dell VRTx servers (2/4-blade systems with a shared datastore), and the shared PERC cannot work in passthrough mode, so Ceph is not an option here.

Also, using the shared datastore as LVM means losing the snapshot ability.
 
You need to create the OCFS2 filesystem with "-T vmstore", which uses 1 MB clusters for the files.
Each time a file needs to be enlarged, all nodes have to communicate so that they know about the newly allocated blocks.
With larger cluster sizes this happens less often.
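For reference, a minimal sketch of creating such a filesystem; the device path, label, and mount point are placeholder assumptions, and the O2CB cluster stack must already be configured on every node:

```shell
# Assumption: /dev/sdb1 is the shared LUN, and the o2cb cluster stack
# (/etc/ocfs2/cluster.conf) is already set up on all nodes.

# -T vmstore selects a filesystem profile tuned for VM images
# (large cluster size, sparse file support):
mkfs.ocfs2 -T vmstore -L pve-shared /dev/sdb1

# Mount on each node (typically via /etc/fstab with the _netdev option):
mount -t ocfs2 /dev/sdb1 /mnt/ocfs2-shared
```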
 
What gave me the best results in cluster-shared storage with Proxmox was:
1st place: Fibre Channel SAN with LUNs of <100 TB each and thick LVM. Rock solid. It also works with iSCSI or FCoE, but consider pure FC to leave Ethernet alone: better performance. You can configure this almost entirely via the PVE GUI, with the exception of bootstrapping the PV and VG (one command, only the first time, and on one node). You will miss snapshots, though, and will need a separate LUN or storage for backups (which can be Ceph).
2nd place: Ceph. Nothing to add except RAM usage and network considerations. As with iSCSI or FCoE, it is always better to have separate interfaces for storage.
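For context, the one-time bootstrap described above could look like the sketch below; the device path and the storage/VG names are placeholder assumptions, and it runs on a single node only:

```shell
# Assumption: /dev/mapper/mpatha is the multipath device for the shared FC LUN.

# One-time bootstrap, on ONE node only: create the volume group
# (vgcreate initializes the physical volume implicitly):
vgcreate vg_san /dev/mapper/mpatha

# Then register it cluster-wide as shared thick-LVM storage,
# either via the GUI (Datacenter -> Storage -> Add -> LVM) or:
pvesm add lvm san-lvm --vgname vg_san --shared 1 --content images
```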
 
2nd place: Ceph. Nothing to add except RAM usage and network considerations. This case is like iSCSI or FCoE, where it is always better to have separate interfaces for storage.
Ceph looks nice. But since I have RAID on the shared datastore and cannot pass the disks through from the shared PERC, it's a no-go for me.

It's strange that such an enterprise solution as Proxmox still doesn't have a working native way (via GUI) to set up a cluster-aware FS that supports snapshots. @t.lamprecht, is Proxmox planning to cover this scenario in the future?
 
Well, in this forum some people reported that they use manual backups instead of snapshots as a kind of workaround, since VM backups can use snapshots even if the storage can't. Of course (depending on your use case) this is not always applicable, since snapshots are not backups and vice versa.
An advantage of this approach is that you don't need to revert the whole system to a snapshot if you just messed up some configuration files: instead, you would simply restore them.
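As an illustration of restoring single files instead of rolling back, here is a hedged sketch using Proxmox Backup Server; the repository string, snapshot name, and archive name are placeholder assumptions:

```shell
# Assumption: backups go to a Proxmox Backup Server; all names are examples.
export PBS_REPOSITORY='backup@pbs@pbs.example.com:datastore1'

# List the available snapshots of the guest:
proxmox-backup-client snapshot list

# Pull out the root-disk archive of one snapshot to a local image file,
# then mount it loopback and copy back only the broken config files
# (the PVE GUI also offers file-level restore for this):
proxmox-backup-client restore "vm/100/2025-04-03T10:00:00Z" \
    drive-scsi0.img.fidx /tmp/restore.img
```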
 
Well, in this forum some people reported that they use manual backups instead of snapshots as a kind of workaround, since VM backups can use snapshots even if the storage can't. Of course (depending on your use case) this is not always applicable, since snapshots are not backups and vice versa.
An advantage of this approach is that you don't need to revert the whole system to a snapshot if you just messed up some configuration files: instead, you would simply restore them.
Yup, that's a workaround. But I don't know how it'll behave with third-party backup software like Veeam B&R when the datastore has no snapshot ability, because normally Veeam takes a VM snapshot before backing it up.
 
Yup, that's a workaround. But I don't know how it'll behave with third-party backup software like Veeam B&R when the datastore has no snapshot ability, because normally Veeam takes a VM snapshot before backing it up.
Veeam and the teething troubles of its Proxmox VE support are a story of their own (just look it up here in the forum). Proxmox's native backup functions (be it vzdump or PBS) use QEMU's ability to create snapshots of virtual machines, which are storage-agnostic. I don't know whether Veeam utilizes them or not. However, since the newest Proxmox VE version overhauled the backup API, hopefully Veeam and other third-party vendors will have better support in the future.
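The storage-agnostic snapshot mode mentioned above is visible in vzdump itself; a minimal sketch, where the VM ID and the backup storage name are placeholder assumptions:

```shell
# Assumption: VM 100 exists and 'backup-store' is a configured backup storage.
# --mode snapshot uses QEMU's live backup mechanism, so it works even on
# storages (like shared thick LVM) that have no snapshot support themselves:
vzdump 100 --mode snapshot --storage backup-store --compress zstd
```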
 
I'm going to experiment a bit with Longhorn by SUSE. It has been a while since I first saw it, and I've been meaning to try it. From what I've read, it has no trouble with RAID.
 
Hi!

I'm looking for something similar for a new small setup:
- "3-node storage" with a separate "3-node hypervisor" (6 servers in total).

My search found the following options for storage-node replication/HA:
Code:
- MooseFS - https://moosefs.com
- SaunaFS - https://github.com/leil-io/saunafs
- NVIDIA AIStore - https://aiatscale.org
- DAOS - https://daos.io/

From the list above, "NVIDIA AIStore" looks promising, but testing is needed.
 
Hi!

I'm looking for something similar for a new small setup:
- "3-node storage" with a separate "3-node hypervisor" (6 servers in total).

My search found the following options for storage-node replication/HA:
Code:
- MooseFS - https://moosefs.com
- SaunaFS - https://github.com/leil-io/saunafs
- NVIDIA AIStore - https://aiatscale.org
- DAOS - https://daos.io/

From the list above, "NVIDIA AIStore" looks promising, but testing is needed.

A Proxmox plugin for SaunaFS is in the works; I hope it won't take long to appear.
 
What gave me the best results in cluster-shared storage with Proxmox was:
1st place: Fibre Channel SAN with LUNs of <100 TB each and thick LVM. Rock solid. It also works with iSCSI or FCoE, but consider pure FC to leave Ethernet alone: better performance. You can configure this almost entirely via the PVE GUI, with the exception of bootstrapping the PV and VG (one command, only the first time, and on one node). You will miss snapshots, though, and will need a separate LUN or storage for backups (which can be Ceph).
2nd place: Ceph. Nothing to add except RAM usage and network considerations. As with iSCSI or FCoE, it is always better to have separate interfaces for storage.
Hi,

If I understand correctly, in your 1st place setup a Fibre Channel SAN presents a shared LUN to all Proxmox nodes, and you use thick LVM on top of it. There is no filesystem at the host level, only LVM with VM disks as logical volumes.

Since standard LVM is not cluster-aware, how is data integrity guaranteed when multiple nodes see the same LUN?
What prevents two nodes from modifying LVM metadata at the same time and corrupting the volume group?

Also, how is this handled during VM migration?

I'm trying to understand how this setup avoids consistency issues compared to other shared block-storage approaches.

Thanks!
 
Since standard LVM is not cluster-aware, how is data integrity guaranteed when multiple nodes see the same LUN?
What prevents two nodes from modifying LVM metadata at the same time and corrupting the volume group?
The Proxmox VE cluster puts a storage lock in place so the other nodes know not to modify the metadata.

Also, how is this handled during VM migration?
As with any other shared storage, there is only ever one active VM process accessing the data. Either the source VM, or after the handover, when the state has been fully migrated, the target VM instance.
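As a concrete illustration of that handover: a live migration on shared storage only transfers the memory and device state, never the disks. A minimal sketch, where the VM ID and target node name are placeholder assumptions:

```shell
# Assumption: VM 100 runs on this node, 'pve2' is another cluster node,
# and the VM's disks live on shared storage visible to both nodes.
# Only RAM and device state are streamed; the disks stay in place:
qm migrate 100 pve2 --online
```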
 