Storage - how to do it right?

Seabob

Dear community,
I have about 20 years of experience with smaller enterprise-class VMware deployments using SAN storage. The first thing I had to learn about Proxmox is that it uses entirely different concepts for the design and use of storage, so I need some advice for my planning.

Observed differences compared to VMware:
#1 VMware uses its cluster-aware filesystem VMFS, so you can use any volume for any purpose.
#2 usually you share volumes across all servers in the cluster and avoid using local storage.
#3 backup and replication of VMs are usually done by 3rd-party tools (like Veeam or Nakivo).
#4 in most cases you use raw LUNs, provided by dedicated storage systems via iSCSI or Fibre Channel.
#5 you can either migrate the processing of the VM to another host while leaving the storage in the same place (as it is usually shared), or migrate the VM to another store for space or performance reasons. In VMware this always works with the VM shut down and in many cases with the VM running. In Proxmox it appears to be the other way round.

What I would like to find out is which model, filesystem and technique is best to use when storage systems are already in place. My NAS devices can provide storage via NFS/SMB or iSCSI.
How to plan where to store ISOs, backups and VMs? (we don't use containers right now)
How to asynchronously replicate VMs offsite for disaster recovery purposes? (offsite => different town, datacenter)
When to "publish" storage assigned to one node to the cluster?

In the end I plan to attach 2 or 3 nodes to run VMs, with one storage device for production use and another one for keeping backups and maybe replicas.

Is there any guide or manual available recommending when to use Ceph, Gluster, ZFS or LVM, and what hardware components are needed to build such an environment?
For my taste, the "storage section" in the Proxmox manual raises more questions than it answers for the first steps.

Thanks in advance for helping me a bit out of this confusion.
 
How to plan where to store ISOs, backups.
The type of data you describe above requires file-based storage; for centralized shared storage that means NFS or CIFS. Given that PVE is a Linux system, I would stick with NFS.
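As a minimal sketch (server IP, export path and storage name below are placeholders, not your actual setup), such a storage could be added like this and then used for ISOs and backups:
Code:
# add an NFS share as shared storage for ISOs, backups and (optionally) disk images
pvesm add nfs nas-nfs --server 192.168.10.20 --export /volume1/pve --content iso,backup,images

# resulting entry in /etc/pve/storage.cfg
nfs: nas-nfs
        export /volume1/pve
        path /mnt/pve/nas-nfs
        server 192.168.10.20
        content iso,backup,images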
How to asynchronously replicate VMs offsite for disaster recovery purposes? (offsite => different town, datacenter)
Currently there is only one native PVE replication mechanism (integrated into the PVE GUI/CLI/API): ZFS. However, ZFS is a local filesystem (not compatible with shared-storage requirements). Additionally, the native replication requires that source and target nodes are in the same cluster, and splitting a cluster across data centers is prone to timeouts and quorum issues; there would have to be a reliably low-latency network between the two.
What we have seen some of our customers do is use PBS backup/replication to move the data off-site.
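Roughly, assuming a PBS instance at each site (all names, datastores and the fingerprint placeholder below are made up for illustration), the off-site PBS would pull from the primary one via a remote plus a sync job:
Code:
# run on the off-site PBS: register the primary PBS as a remote
proxmox-backup-manager remote create main-site \
    --host 203.0.113.10 --auth-id sync@pbs \
    --password <secret> --fingerprint <primary-pbs-cert-fingerprint>

# pull backups from the primary datastore into the local one, once a day
proxmox-backup-manager sync-job create offsite-pull \
    --remote main-site --remote-store datastore1 \
    --store offsite-datastore --schedule daily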
When to "publish" storage assigned to one node to the cluster?
You'd have to expand on this. There is no "publish" operation in PVE. In general, all PVE nodes in the cluster must see the Shared Storage at the same time. PVE then arbitrates which node is the "owner" of a particular slice/file, depending on your Shared Storage implementation. You wouldn't be managing this manually.
Is there any guide or manual available recommending when to use Ceph, Gluster, ZFS or LVM, and what hardware components are needed to build such an environment?
There are a few documents posted as stickies in the forum. There is also the PVE documentation. But there is no single "one place covers all" document that has what you want.
In general:
Ceph is used when there is no external shared storage available. It uses the local disks of each node in the cluster to create a Distributed Storage model. It is well integrated into PVE. As with everything in life, it comes with its pluses and minuses.
Gluster is similar to Ceph at a high level. It is currently being retired by its primary maintainer, Red Hat.
ZFS is not a cluster-aware filesystem and is not suitable for Shared Storage.
LVM is suitable for shared storage in the PVE model: PVE acts as an arbiter of access to specific LVM slices by each node. However, only THICK LVM is supported in this use case, meaning there is no thin provisioning or snapshot (thin clone) support. See the sketch below this list.
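For illustration only (portal IP, target IQN, device and storage names are placeholders, and the LUN must already be visible on every node), a shared thick-LVM setup on an iSCSI LUN looks roughly like this:
Code:
# make the iSCSI LUN known to the cluster (nothing is served directly from it)
pvesm add iscsi san1 --portal 192.168.20.5 --target iqn.2001-05.com.example:target1 --content none

# once, on any node: create a volume group on the LUN's device
vgcreate vg_san1 /dev/sdX

# add it as shared (thick) LVM storage for VM disks; no thin provisioning, no snapshots
pvesm add lvm vm-lvm --vgname vg_san1 --shared 1 --content images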



Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Just a note, since my 3 years of experience with Proxmox don't feel like much. But you might consider the following from my own observations:
ad #1 - the different storages (local and/or network) in a Proxmox cluster are created with their availability specified for selected nodes (or all), so if you are careful with names you can use them across the cluster quite freely (see the sketch after this list)
ad #2 - you can use network storage of different kinds (PVE GUI shot):
[Screenshot: storage type selection in the PVE GUI]
ad #3 - there is a very good Proxmox Backup Server (PBS). It's a separate system which can be installed either on a physical machine (with sufficient storage) or as a VM, and it can also be connected to some external storage. It provides incremental backups, deduplication, granular access to files and more: https://www.proxmox.com/en/proxmox-backup-server/features
ad #4 - you can provide storage at the Datacenter (cluster) level, including iSCSI, and enable access for other selected nodes if necessary. In my experience, NFS via 10 Gb SFP+ to an array with SATA SSD disks is enough for daily VM operation - at least for our purposes
ad #5 - VM migration between nodes works pretty well if the storages are properly configured for all nodes in the cluster. HA is also an option if you have enough resources, and it just works
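A minimal sketch of what ad #1 means in practice (storage and node names are placeholders): a storage definition is cluster-wide, and a nodes line limits where it is considered available.
Code:
# restrict an existing storage to two nodes
pvesm set local_ssd --nodes host1,host2

# corresponding entry in /etc/pve/storage.cfg
lvmthin: local_ssd
        thinpool data
        vgname pve
        content images,rootdir
        nodes host1,host2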

Other topics: for example, you can store ISOs on local storage, VM disks on an array via iSCSI or NFS, and backups on some external array connected directly to the PBS instance via NFS. Replication of the copies can be achieved by installing an external PBS system with appropriate storage and configuring synchronization of backups between both instances (via the PBS GUI). Regarding Ceph, Gluster, ZFS or LVM - please note the comments by @bbgeek17
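For completeness, a rough sketch of attaching such a PBS instance to the PVE cluster as a backup target (hostname, datastore, user and fingerprint are placeholders):
Code:
# make the PBS datastore usable as backup storage from PVE
pvesm add pbs pbs-backup --server pbs.example.lan --datastore store1 \
    --username backup@pbs --password <secret> \
    --fingerprint <pbs-certificate-fingerprint> --content backup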

As far as the migration itself is concerned, there are simple CLI tools and GUI options to convert exported VMware disk images to qcow2 format using OVF files. I have done that several times, usually without complications.
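One way to do it from the CLI, as a rough example (VM ID, path and target storage are placeholders, and the OVF plus its disk images are assumed to sit in the same directory):
Code:
# create a new VM from an exported OVF manifest and import its disks as qcow2
qm importovf 120 /mnt/export/myvm/myvm.ovf nas-nfs --format qcow2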
 
Thanks @bbgeek17 and @Kungolf for your replies, this helps me move forward.

I still have a few questions regarding ZFS:
#1 are the data integrity features also available with single disks formatted with ZFS? I mean, 1 ZFS-volume on one /dev/sd*, with no ZFS-RAID?
Sorry for asking, but the documents regarding ZFS are not always fully clear on this.
#2 I know spanning a cluster across sites is a tricky task, but I also read about "ZSync" as a potential alternative for such a use case. Do you have experience with it? (I will drill into PBS replication anyway, it sounds reasonable to me.)
#3 Is multipathed "ZFS over iSCSI" recommended for local, single-host-only storage? I mean, iSCSI in my case implies accessing a LUN on a block-level storage system, which usually works with hardware RAID controllers. This approach leaves the responsibility for a consistent filesystem on the initiator's side (the host).

First conclusion for myself: Proxmox remains very different from VMware, but "different" here just means "many things to learn".
As there is obviously no "one size fits all", as @bbgeek17 put it, I have to decide between shared network storage (file-level, NFS) and the enhanced data-integrity mechanisms of ZFS, and potentially give up the idea of iSCSI/block-level storage.

Maybe one of you can clarify another observation for me:
Say I created a store called "host1_sdb" on host1 and a store "host2_sdb" on host2, both residing on host-local disks (not something shared); then I can't move/migrate a VM from host1 to host2. But when I name both stores "local_sdb", moving/migration works. I mean, the volumes are the same, only the naming changed.
Of course, naming all local stores the same way will lead to some confusion in the GUI when figuring out exactly which host's "local_sdb" is being addressed. Is this what you, @Kungolf, meant by "careful naming"?
 
#1 are the data integrity features also available with single disks formatted with ZFS? I mean, 1 ZFS-volume on one /dev/sd*, with no ZFS-RAID?
Sorry for asking, but the documents regarding ZFS are not always fully clear on this.

If you want to have self-healing with a single disk, then you have to set copies to more than 1


copies=1|2|3
Controls the number of copies of data stored for this dataset. These copies are in addition to any redundancy provided by the pool, for example, mirroring or RAID-Z. The copies are stored on different disks, if possible.
The space used by multiple copies is charged to the associated file and dataset, changing the used property and counting against quotas and reservations.

Changing this property only affects newly-written data. Therefore, set this property at file system creation time by using the -o copies=N option.

Remember that ZFS will not import a pool with a missing top-level vdev. Do NOT create, for example, a two-disk striped pool and set copies=2 on some datasets thinking you have set up redundancy for them. When a disk fails you will not be able to import the pool and will have lost all of your data.

Encrypted datasets may not have copies=3 since the implementation stores some encryption metadata where the third copy would normally be.
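As a small illustration (pool, dataset and device names are placeholders): a single-disk pool with copies=2 on one dataset, plus a periodic scrub so checksum errors can actually be detected and repaired from the second copy.
Code:
# single-disk pool; no vdev-level redundancy
zpool create tank /dev/sdb

# store two copies of every block written to this dataset
zfs create -o copies=2 tank/vmdata

# scrub regularly so silent corruption is found and self-healed from the extra copy
zpool scrub tank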
 
Correct. But I don't think it's that useful. Setting copies to two halves the capacity, so for the same usable capacity you could directly create a two-disk mirror (with copies=1) and get the additional benefit of no data loss or downtime in case a disk fails.
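For comparison, a hypothetical two-disk mirror (device names are placeholders) that gives the same usable capacity as one disk with copies=2, but survives a whole-disk failure:
Code:
zpool create tank mirror /dev/sdb /dev/sdc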
 
Maybe one of you can clarify another observation for me:
Say I created a store called "host1_sdb" on host1 and a store "host2_sdb" on host2, both residing on host-local disks (not something shared); then I can't move/migrate a VM from host1 to host2. But when I name both stores "local_sdb", moving/migration works. I mean, the volumes are the same, only the naming changed.
Of course, naming all local stores the same way will lead to some confusion in the GUI when figuring out exactly which host's "local_sdb" is being addressed. Is this what you, @Kungolf, meant by "careful naming"?
Generally yes, naming storages is important, but you still need to select "Target storage" by name from the list at the bottom right:
[Screenshot: migration dialog with the "Target storage" selection]
Note that the storage you want to migrate the VM to has to be available on both nodes. The "Current layout" default option implies, as far as I know, the same storage name on both nodes, which indeed can be confusing. On the other hand, if you plan to use HA, then it's probably better to place the selected VMs beforehand on commonly shared storage (with the necessary speed and capacity).
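A rough sketch of what that boils down to (pool, storage and node names are placeholders): one storage ID defined for both nodes, each node backing it with its own local ZFS pool, and a migration where the target storage is picked explicitly.
Code:
# /etc/pve/storage.cfg: a single storage ID, available on both nodes,
# each node backing it with a local pool of the same name
zfspool: local_sdb
        pool sdbpool
        content images,rootdir
        sparse 1
        nodes host1,host2

# move VM 100 and its local disks from the current node to host2
# (drop --online if the VM is shut down)
qm migrate 100 host2 --online --with-local-disks --targetstorage local_sdb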
 
