Storage for small clusters, any good solutions?

I don't agree that the Ceph learning curve is steep. Read the docs, get the right network (I would say this matters more than everything else), and start working with it. There are rarely problems, if we discount physical-layer issues.
This implies that the operator has the skill, the experience, and the wherewithal (not having a dozen other responsibilities) to understand the docs and apply them. Ceph is not attractive at the low end precisely because it requires engineer-level administration, which is neither common nor cost-effective. It's easy to forget that this forum is full of people who already have the commitment and the time; in a small business, those things are costly.

In my experience, Ceph begins to make sense at a node count of 6+, which puts it outside the realm of small businesses that are not data center operators; servers require "care and feeding." Yes, I can already hear the "but I run a 2-node cluster and it's fine," but keep in mind what you are actually providing: a homelab isn't a "for-realz" enterprise where poor performance and downtime mean lost revenue and reputational damage.

IMO, if a customer requires a 3-node cluster + SAN with controller-based failover, Blockbridge is hard to beat for Proxmox customers.
To be honest, Blockbridge isn't a great fit for very small environments.
You can live comfortably with LVM thick over block. It performs well, and while it does not offer all the desired features, it's "good enough."
 
The point is that it's often possible to do HA at the application level instead, which may be even more reliable. For example, for web hosting: multiple webserver VMs (on different nodes) with a load balancer/reverse proxy like nginx or HAProxy in front of them. Or an active/passive database cluster built from multiple MariaDB or PostgreSQL instances on different Linux VMs.
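As a rough sketch of the load-balancer half of that setup, an HAProxy configuration fragment might look like the following. The backend addresses and health-check path here are hypothetical; point them at your own webserver VMs.

```
# /etc/haproxy/haproxy.cfg (fragment) -- hypothetical addresses
frontend web_in
    bind *:80
    default_backend web_vms

backend web_vms
    balance roundrobin
    option httpchk GET /health
    # one webserver VM per Proxmox node, so a node failure only loses one backend
    server web1 10.0.0.11:80 check
    server web2 10.0.0.12:80 check
```

With this in place, HAProxy itself becomes the remaining single point of failure; a common answer is running two HAProxy instances sharing a floating IP via keepalived.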
We're definitely at cross purposes here. Highly available storage <> highly available applications. And striving to have an infrastructure that delivers both is not a bad thing at all.
 
Hi there,
the title may be a bit misleading, as I know there are good solutions that work for many, but my workplace and I face a bit of a dilemma.

I know I'm opening this can of worms again and this is also partly me venting a bit of my frustration and I'm sorry about that.

We want to use Proxmox VE more, or rather, we want to be able to get more of our customers to switch to it. Many are already interested, but for most of them the lack of a true cluster-aware filesystem is a problem.


I know that you can have shared LVM on SAN storage and, since version 9.0, even take snapshots with it (plus TPM snapshots since version 9.1), even if they are not live and snapshots as volume chains are still in tech preview. But then the problem is that everything is thick-provisioned, so two snapshots of a 500 GB VM consume 1.5 TB of space for just a single VM.


I also know that there is Ceph but I have some gripes with that too:
1. I find it a bit complicated, and I'm afraid one thing configured wrongly can have huge consequences in the long run (but that's my problem and I do need to do more research).

2. It needs a minimum of three servers, which is a problem, as many of our customers who want to migrate run 2-node HA clusters on Hyper-V or VMware ESXi plus SAN storage (often direct-attached). It would be a big investment to buy another server (or three, to renew the entire environment), plus 100 Gbit network hardware and enough internal storage on every host to cover the raw capacity you can't actually use.

3. I have also read a couple of times that a 3-node Ceph cluster, while technically possible, is less than optimal for production use and can be fragile if a node or a couple of disks fail.

So, all in all, Ceph would not be a viable solution for these customers either; it would only really be a good option for bigger customers with five or more servers.
Please do correct me if I'm missing something or if I'm completely in the wrong.


Are there any NAS systems or renowned storage manufacturers (Dell, IBM, Lenovo) with solutions that can be used for ZFS over iSCSI?
I researched that as well, because in the storage table it is listed as having full functionality for snapshots and thin provisioning while being shared and fully supported by the Proxmox team. It seems to be a rather obscure option, though; I couldn't find a whole lot about ZFS over iSCSI.
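For reference, a ZFS over iSCSI storage entry in /etc/pve/storage.cfg looks roughly like the fragment below. The storage ID, pool name, portal address, and target IQN are hypothetical; the storage box must run ZFS plus one of the supported iSCSI target implementations, and Proxmox manages it over SSH as root.

```
# /etc/pve/storage.cfg (fragment) -- hypothetical names and addresses
zfs: san-zfs
    portal 10.0.0.50
    target iqn.2001-03.com.example:tank
    pool tank/vmdata
    iscsiprovider LIO
    sparse 1
    content images
```

Here `sparse 1` is what gives you thin provisioning, and VM snapshots map to native ZFS snapshots on the target, which is why this backend ticks both boxes in the storage table.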


I also know that cluster-aware filesystems like GlusterFS and OCFS2 exist and can work (I tried OCFS2 myself), but it's important for us that the technology used is officially supported by the Proxmox team, so that if any problems come up that we cannot fix, Proxmox support won't just handle them on a best-effort basis.


I want to ask if anyone has experience with Blockbridge? It looks interesting.
If so, how is the support, and is there support in the EU/Germany? Which hardware is needed?
Would a Blockbridge storage work as a sort-of direct replacement for a SAN shared storage that was directly attached to two Hyper-V servers, for example?
It'd be great if anyone could share their experience with them.


Lastly, are there any other solutions that I'm missing that are stable and tick the boxes of thin provisioning and snapshot support while being a shared storage?

-note: It's not that the customer must keep all their old hardware or necessarily stay on SAN storage (that would be great, though). If, e.g., a Blockbridge storage can be used as a viable alternative to an IBM FlashSystem 5015 or something like it, then buying new storage would be fine. We just need a working solution that ticks all the boxes and doesn't mean replacing everything or being too expensive.

Thank you in advance, any guidance would be greatly appreciated!

I’m exploring a similar setup using DRBD (simple configuration, not LINSTOR).
I’ve developed a set of scripts to automate storage management in small Proxmox clusters:
  • Provision per-VM storage with DRBD + LVM
  • Resize storage layers safely (LV → DRBD → VG → Proxmox storage)
  • one-VM-per-storage policy for simplicity and reliability
The scripts are primarily intended for lab or small-scale HA environments, where a lightweight, script-driven workflow is preferable to a full storage management solution.
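For anyone unfamiliar with the plain-DRBD approach, a per-VM resource in this one-VM-per-storage style would look roughly like the fragment below. The resource name, hostnames, addresses, and backing LV are hypothetical and not taken from the linked scripts.

```
# /etc/drbd.d/vm100.res (fragment) -- hypothetical names, addresses, devices
resource vm100 {
    device    /dev/drbd100;
    disk      /dev/vg_drbd/vm100;   # one backing LV per VM
    meta-disk internal;

    net {
        protocol C;                 # synchronous replication between the two nodes
    }

    on node1 {
        address 10.0.0.1:7100;
    }
    on node2 {
        address 10.0.0.2:7100;
    }
}
```

Protocol C only acknowledges a write once both nodes have it, which is what makes failover safe; the split-brain recovery policies in the net section are also worth configuring explicitly for unattended operation.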

Check them out here:

https://codeberg.org/kmentzelos/ProxMox-Mini-Cluster
 
You can live comfortably with LVM thick over block. It performs well, and while it does not offer all the desired features, it's "good enough."
@alexskysilk

You're right to call that out. "Very small environments" was a poor choice of words on my part. The distinction I was trying to make isn't about node count, it's about business value. We have customers running three node clusters with tens of millions of dollars of business value on them. Also, with hundreds of cores and VMs on a single node these days, a 3 node cluster can represent a relatively large production footprint. What I should have said is: we're honestly not the best fit for environments where someone is looking for minimum infrastructure spend, regardless of how many nodes they have. "Small" was the wrong word. "Non critical" or "low impact" would have been closer to what I meant.


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
This implies that the operator has the skill, the experience, and the wherewithal (not having a dozen other responsibilities) to understand the docs and apply them. Ceph is not attractive at the low end precisely because it requires engineer-level administration, which is neither common nor cost-effective. It's easy to forget that this forum is full of people who already have the commitment and the time; in a small business, those things are costly.

In my experience, Ceph begins to make sense at a node count of 6+, which puts it outside the realm of small businesses that are not data center operators; servers require "care and feeding." Yes, I can already hear the "but I run a 2-node cluster and it's fine," but keep in mind what you are actually providing: a homelab isn't a "for-realz" enterprise where poor performance and downtime mean lost revenue and reputational damage.



You can live comfortably with LVM thick over block. It performs well, and while it does not offer all the desired features, it's "good enough."
Usually I don't agree with that, because I've had customers on 3-node Ceph/Proxmox for more than 3-4 years without a hiccup (not counting power failures, etc.). And nowadays, with Proxmox, everything comes batteries included; the only things an admin needs are a little bit of Linux, a bit of networking, and a bit of administration. I usually recommend my customers go through the CCNA route to learn a bit about the systems and networks around them.

I could talk a lot about this, but I hate typing :D
 