Storage for small clusters, any good solutions?

I don't agree that the Ceph learning curve is steep. Read the docs, get the right network (I'd say that matters more than anything else), and start working with it. Problems are rare, if we discount physical-layer issues.
This implies that the operator has the skill, experience, and wherewithal (i.e. not having a dozen other responsibilities) to understand the docs and apply them. Ceph is not attractive at the low end precisely because it requires engineer-level administration, which is neither common nor cost-effective. It's easy to forget that this forum contains people who already have the commitment and time; in a small business those things are costly.

In my experience, Ceph begins to make sense at a node count of 6+, which puts it outside the realm of small businesses that are not data-center operators; servers require "care and feeding." Yes, I can already hear the "but I run a 2-node cluster and it's fine" responses, but keep in mind what you are actually providing: a homelab isn't a real enterprise, where poor performance and downtime mean lost revenue and reputational damage.

IMO, if a customer requires a 3-node cluster plus a SAN with controller-based failover, Blockbridge is hard to beat for Proxmox customers.
To be honest, Blockbridge isn't a great fit for very small environments
You can live comfortably with thick LVM over block. It performs well, and while it does not offer all the desired features, it's "good enough."
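For context, thick LVM over shared block in Proxmox is roughly this (a minimal sketch; the device path, VG name, and storage ID are placeholders, and it assumes a LUN already visible on all nodes):

```
# On one node: initialize the shared LUN and create a volume group
pvcreate /dev/sdX
vgcreate vmdata /dev/sdX

# Register it cluster-wide as a shared LVM storage for VM disks
pvesm add lvm san-lvm --vgname vmdata --shared 1 --content images
```

The trade-off, as discussed in this thread, is that volumes are thick-provisioned.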
 
The point is that it's also often possible to do HA at the application level, which might be even more reliable. For example, for web hosting: multiple webserver VMs (on different nodes) with a load balancer/reverse proxy like nginx or HAProxy in front of them. Or an active/passive database cluster built from multiple MariaDB or PostgreSQL instances on different Linux VMs.
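As a sketch of that load-balancer pattern (the IPs, names, and health-check path are made up for illustration), an haproxy.cfg fragment might look like:

```
frontend web_in
    bind *:80
    default_backend web_pool

backend web_pool
    balance roundrobin
    option httpchk GET /healthz
    # web VMs placed on different Proxmox nodes
    server web1 10.0.10.11:80 check
    server web2 10.0.10.12:80 check
```

If one VM (or its node) goes down, the health checks drop it from the pool and traffic continues through the remaining backend.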
We're definitely at cross purposes here. Highly available storage is not the same thing as highly available applications, and striving for an infrastructure that delivers both is not a bad thing at all.
 
Hi there,
the title may be a bit misleading, as I know there are good solutions that work for many, but at my workplace we face a bit of a dilemma.

I know I'm opening this can of worms again and this is also partly me venting a bit of my frustration and I'm sorry about that.

We want to use Proxmox VE more, or rather, we want to make it possible for more of our customers to switch to it. Many are already interested, but for most of them the lack of a true cluster-aware filesystem is a problem.


I know that you can have shared LVM on a SAN storage, and since version 9.0 you can even take snapshots with it (TPM snapshots since version 9.1), even if they are not live, and snapshots as volume chains are still in tech preview. But then the problem is that everything is thick-provisioned, so two snapshots of a 500 GB VM means 1.5 TB of space used for just a single VM.
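The space math behind that example, as a quick sanity check (purely illustrative shell arithmetic, assuming each thick snapshot reserves a full-size copy, as in the example above):

```shell
vm_size_gb=500
snapshots=2
# base volume + one full-size thick copy per snapshot
total_gb=$(( vm_size_gb * (1 + snapshots) ))
echo "${total_gb} GB"   # 1500 GB, i.e. 1.5 TB
```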


I also know that there is Ceph but I have some gripes with that too:
1. I find it a bit complicated, and I'm afraid that one thing configured wrongly can have huge consequences in the long run (but that's my problem, and I do need to do more research).

2. It needs a minimum of three servers, which is a problem: many of our customers who want to migrate are running 2-node HA clusters on Hyper-V or VMware ESXi plus a SAN storage (often direct-attached). It would be a big investment to get another server (or three, to renew the entire environment), plus 100 Gbit network hardware and enough internal storage in every host to cover the capacity lost to replication.

3. I have also read a couple of times that a 3-node Ceph cluster, while technically possible, is less than optimal for production use and can become fragile if a node or a couple of disks fail.

So, all in all, Ceph would not be a viable solution for these customers either; it would only really be a good option for bigger customers with five or more servers.
Please do correct me if I'm missing something or am completely wrong.


Are there any NAS systems or renowned storage manufacturers (Dell, IBM, Lenovo) with solutions that can be used for ZFS over iSCSI?
I researched that as well, because in the storage table it is listed as having full snapshot and thin-provisioning functionality while being shared and fully supported by the Proxmox team. It seems to be a rather obscure setup, though; I couldn't find a whole lot about ZFS over iSCSI.
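For reference, a ZFS over iSCSI entry in /etc/pve/storage.cfg looks roughly like this; the storage ID, pool, portal address, target IQN, and provider are placeholders and depend entirely on the target appliance:

```
zfs: san-zfs
        pool tank/vmdata
        portal 192.168.50.10
        target iqn.2003-01.org.example:target0
        iscsiprovider LIO
        lio_tpg tpg1
        sparse 1
        content images
```

Proxmox manages the ZVOLs over SSH on the target, which is why it needs one of the supported providers (LIO, iet, istgt, or comstar) rather than working with any generic SAN; that is probably why so few appliances advertise support for it.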


I also know that cluster-aware filesystems like GlusterFS and OCFS2 exist and can work (I tried OCFS2 myself), but it's important for us that the technology we use is officially supported by the Proxmox team, so that if there are problems we cannot fix ourselves, Proxmox support will handle them rather than only on a best-effort basis.


I also want to ask if anyone has experience with Blockbridge? It looks interesting.
If so, how is the support, and is it available in the EU/Germany? Which hardware is needed?
Would a Blockbridge storage work as a sort of direct replacement for a SAN shared storage that was directly attached to two Hyper-V servers, for example?
It would be great if anyone could share their experience with them.


Lastly, are there any other solutions I'm missing that are stable and tick the boxes of thin provisioning and snapshot support while being a shared storage?

Note: it's not that the customer must keep all their old hardware or necessarily stay on a SAN storage (that would be great, though). If, for example, a Blockbridge storage can be used as a viable alternative to an IBM FlashSystem 5015 or something like it, then buying new storage would be fine. We just need a working solution that ticks all the boxes without replacing everything or being too expensive.

Thank you in advance, any guidance would be greatly appreciated!

I’m exploring a similar setup using DRBD (simple configuration, not LINSTOR).
I’ve developed a set of scripts to automate storage management in small Proxmox clusters:
  • Provision per-VM storage with DRBD + LVM
  • Resize storage layers safely (LV → DRBD → VG → Proxmox storage)
  • One-VM-per-storage policy for simplicity and reliability
The scripts are primarily intended for lab or small-scale HA environments where a lightweight, script-driven workflow is preferable to full storage management solutions.
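For anyone unfamiliar with the plain (non-LINSTOR) DRBD setup, a per-VM resource file in that per-VM-storage style looks roughly like this (node names, IPs, and device paths are placeholders, not taken from the scripts):

```
# /etc/drbd.d/vm-100.res
resource vm-100 {
    device    /dev/drbd100;
    disk      /dev/vg0/vm-100;
    meta-disk internal;

    on pve1 {
        address 10.0.0.1:7100;
    }
    on pve2 {
        address 10.0.0.2:7100;
    }
}
```

One resource (and one TCP port) per VM keeps the failure domains separate, which is what makes the one-VM-per-storage policy simple to reason about.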

Check them out here:

https://codeberg.org/kmentzelos/ProxMox-Mini-Cluster
 
You can live comfortably with thick LVM over block. It performs well, and while it does not offer all the desired features, it's "good enough."
@alexskysilk

You're right to call that out. "Very small environments" was a poor choice of words on my part. The distinction I was trying to make isn't about node count; it's about business value. We have customers running three-node clusters with tens of millions of dollars of business value on them. Also, with hundreds of cores and VMs on a single node these days, a 3-node cluster can represent a relatively large production footprint. What I should have said is: we're honestly not the best fit for environments where someone is looking for minimum infrastructure spend, regardless of how many nodes they have. "Small" was the wrong word; "non-critical" or "low-impact" would have been closer to what I meant.


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
This implies that the operator has the skill, experience, and wherewithal (i.e. not having a dozen other responsibilities) to understand the docs and apply them. Ceph is not attractive at the low end precisely because it requires engineer-level administration, which is neither common nor cost-effective. It's easy to forget that this forum contains people who already have the commitment and time; in a small business those things are costly.

In my experience, Ceph begins to make sense at a node count of 6+, which puts it outside the realm of small businesses that are not data-center operators; servers require "care and feeding." Yes, I can already hear the "but I run a 2-node cluster and it's fine" responses, but keep in mind what you are actually providing: a homelab isn't a real enterprise, where poor performance and downtime mean lost revenue and reputational damage.



You can live comfortably with thick LVM over block. It performs well, and while it does not offer all the desired features, it's "good enough."
Usually I don't agree with that, because I've had customers on 3-node Ceph/Proxmox clusters for more than 3-4 years without a hiccup (not counting power failures, etc.). And nowadays, with Proxmox, everything is batteries-included; the only things an admin needs to know are a little Linux, a bit of networking, and a bit of administration. I usually recommend my customers go the CCNA route to learn a bit about the systems and networks around them.

I could talk a lot about this, but I hate typing :D
 
Thanks again, everyone, for your detailed answers.
So far, Blockbridge still sounds very interesting, and when the time comes, I or someone from my team might reach out to the Blockbridge team for more information.

For some of our customers who don't rely on the speed of a SAN and who already have high-tier Synology NAS systems, I might need to look into NFS as a compromise solution.

I will also look into a replication cluster for smaller infrastructures.

I have also done a bit more research and found StarWind. I had already heard of them and have used their V2V tool.
They have a VSAN solution that they support, but they also have a plugin which lets you use an existing SAN with thin provisioning and snapshot support. I think this plugin is also used in combination with their VSAN solution to get thin-provisioning and snapshot support.
So far, if the support is in a good price range, this sounds like the best way to keep using your existing SAN, or even to use new ones (but again, I haven't tested it yet).

The SAN integration plugin for Proxmox is free, and I will definitely try it out as soon as I have the time, but I don't know if they offer paid support for it or if it is self-supported only.
Does anyone have experience with this solution? (I might make another post asking about this, as this one is more about people's experiences with Blockbridge and other solutions.)

Also, thank you for suggesting the ProxMox-Mini-Cluster scripts. I'm sure they work great, but sadly they aren't a fit for our situation, as we need either solutions directly and fully supported by the Proxmox team, or a third party with good support. If there are problems with the storage, it could potentially halt a customer's entire operation, and in case we can't fix the problem ourselves, we need fast developer support (ideally paid 24/7 support) to fall back on.
The community is great and does great things, especially for homelabs where people can be more experimental, but for us, the lack of official and fast support is still a no-go. (Maybe we're just old-fashioned in this regard, I don't know.)
 