I get it. I get it. 3 Nodes for Ceph. But what about ZFS shared?

iliadz

New Member
Aug 26, 2024
I have a pretty basic need for a small business I own: one VM that needs HA, in the sense that if its node fails, another Proxmox server takes over. I'd prefer to use shared ZFS storage, and I have this running in a lab right now. I've stayed away from Ceph mainly because of the infrastructure requirements. My actual stored data is around 100 GB, and it won't grow.

I'd prefer not to do replication. With that requirement, what are opinions on a 2-node cluster using shared ZFS and a QDevice? Or a third "server" using a NUC, mostly limiting HA to the two "beefy" servers. Not really beefy, but not a NUC either.

Bonus points: I don't have an issue with Ceph in terms of configuration and so on; it was quite easy to get up and running. What seems to be a true mystery, or at least a debate here, is what you "truly" need. If I'm running a single VM that honestly isn't doing much (restaurant POS system, maybe 20 transactions per hour), would I be OK with 1 Gb network cards? I can have a dedicated NIC per server for Ceph and another for the network, I can carve out a VLAN for both, or even have a dedicated switch. But upgrading the systems to support, say, 10 Gb NICs and 10 Gb switches is money I really can't spend right now.
 
A two-node cluster with a QDevice (all on the same local network) is fine.

What do you mean by "ZFS shared"? You say you don't want ZFS replication, so the storage cannot be provided by the Proxmox nodes themselves. If you run ZFS on some other machine that is reachable by all nodes, that's fine (if the network is fast enough), but then you would use iSCSI or NFS (or SMB) to access it. Or maybe I'm missing something here?
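For anyone wondering why the QDevice matters in a two-node cluster, here is a minimal sketch of the majority arithmetic. It assumes one vote per node and one vote for the QDevice; the function names are made up for illustration and are not part of any Proxmox or corosync API.

```python
def quorum_threshold(total_votes: int) -> int:
    """Smallest number of votes that is a strict majority of total_votes."""
    return total_votes // 2 + 1


def has_quorum(votes_present: int, total_votes: int) -> bool:
    """True if the votes still reachable form a majority."""
    return votes_present >= quorum_threshold(total_votes)


# Two nodes, no QDevice (assumed 1 vote each): one node down leaves 1 of 2 votes.
print(has_quorum(votes_present=1, total_votes=2))

# Two nodes plus a QDevice (assumed 1 vote): one node down leaves 2 of 3 votes.
print(has_quorum(votes_present=2, total_votes=3))
```

Under those assumptions, the surviving node only keeps a majority when the third vote exists, which is the reason a plain two-node cluster cannot do HA on its own.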
 
I would say go with Ceph; it works very well. With 2 nodes and 1 shared ZFS storage server, the storage server is still the single point of failure.

With 3 nodes, go with Ceph and use 3-way replication. A 1 Gb network is fine for your use case, since not much data is being written.
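To put rough numbers on the 3-way replication point, here is a back-of-the-envelope sketch. The 100 GB figure comes from the original post; the 1 MB/s sustained write rate is purely an assumption for illustration, and the link figure ignores protocol overhead. The function names are hypothetical.

```python
LINK_MB_PER_S = 1000 / 8  # a 1 Gb/s link is about 125 MB/s before overhead


def raw_capacity_needed_gb(data_gb: float, replicas: int = 3) -> float:
    """Raw disk space consumed across the cluster when every object is stored `replicas` times."""
    return data_gb * replicas


def replica_traffic_mb_per_s(client_write_mb_per_s: float, replicas: int = 3) -> float:
    """Extra traffic on the storage network: each write is forwarded to the other replicas."""
    return client_write_mb_per_s * (replicas - 1)


data_gb = 100            # from the original post
assumed_write_rate = 1   # MB/s, an assumption for illustration only

print("raw capacity needed (GB):", raw_capacity_needed_gb(data_gb))
traffic = replica_traffic_mb_per_s(assumed_write_rate)
print("replica traffic (MB/s):", traffic)
print("share of a 1 Gb link:", traffic / LINK_MB_PER_S)
```

This only covers steady-state writes. Recovery and rebalancing after a disk or node failure move far more data and are where a 1 Gb link is more likely to hurt.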
 
I'm probably using the wrong terminology. I have a TrueNAS server set up, using ZFS over iSCSI.
 
Then your TrueNAS is your single point of failure.
I recommend that you use ZFS with replication. You can reduce the replication interval to 1 minute, so you would lose at most 1 minute of data in the event of a hardware defect.

In this setup you can also live-migrate your VM, and HA will of course work fully as well.

But please don't forget the QDevice, which can also be a RasPi or similar device.
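The one-minute bound only holds if each replication run finishes well inside the interval. A rough sketch of that check follows; the write rate and link speed are assumptions for illustration, and the function names are hypothetical rather than anything Proxmox provides.

```python
def delta_mb(write_mb_per_s: float, interval_s: float) -> float:
    """Approximate amount of changed data accumulated between two replication runs."""
    return write_mb_per_s * interval_s


def transfer_seconds(delta: float, link_mb_per_s: float) -> float:
    """Time to ship one delta over the replication link, ignoring overhead."""
    return delta / link_mb_per_s


def keeps_up(write_mb_per_s: float, interval_s: float, link_mb_per_s: float) -> bool:
    """True if a delta can be sent in less time than the interval that produced it."""
    delta = delta_mb(write_mb_per_s, interval_s)
    return transfer_seconds(delta, link_mb_per_s) < interval_s


# Assumed values: 1 MB/s of sustained writes, 60 s interval, 1 Gb/s link (about 125 MB/s).
d = delta_mb(1.0, 60.0)
print("delta per run (MB):", d)
print("transfer time (s):", transfer_seconds(d, 125.0))
print("schedule keeps up:", keeps_up(1.0, 60.0, 125.0))
```

For a low-write workload like a small POS system, the delta per run is tiny compared with what the link can carry, so a one-minute schedule looks plausible under these assumptions.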
 
The storage server being the single point of failure was/is my biggest concern, but it is a risk I was willing to accept considering all the hardware requirements around Ceph. There is a thread here where someone asked basically the same question about Ceph and a small 3-node system.

https://forum.proxmox.com/threads/should-i-not.138481/

I really should do a Raspberry Pi, simply because, well, I've never played with one. Total random mention, but here in Guatemala they have Raspberry Shakes. And yes, you heard that right. Aptly named, because they use Raspberry Pis to monitor earthquake activity.

I totally get it on the single point of failure; I just have to do the math on which one is more acceptable, as any business has to. The system is tied to an RFID card system, and if that gets out of sync with our POS system, it will cause a mess. So if the shared storage does die, the choice is between rebuilding it (fairly easy with a server waiting and, again, a small amount of data to store) and trying to get the cards synced back with the POS system, which can work but will likely be a bit more difficult. Right now my single point of failure is everything, so anything will be an improvement.

I'm right in the middle, to be honest, on which route to go as I get more advice. If Ceph isn't that heavy considering my setup, I'd likely just opt for that: three nodes, a 1 Gb card for Ceph, a 1 Gb card for "other", and running the one VM, as well as a good backup strategy. If you don't mind, Falk, I've read quite a few of your posts, and you're far more knowledgeable in this realm than I will be for the next, oh, 10 years or so... any opinion on this route? The majority of threads here jump into the more enterprise-class scenarios (and rightfully so), whereas this really is a small-business scenario.
 
Hi, a 3-node Ceph system with small computers and the low requirements you have will also run, but if something unexpected happens, you will quickly be overwhelmed.
I have set up some customers with up to 60 VMs on ZFS replication. The setup is easy to understand and maintain, and it runs very robustly. The extreme case of a machine dying completely is very rare, and very few will ever experience it. In that case, you will be happy to have lost at most 1 minute of data. Restoring from a backup always involves more loss.

If you have two robust computers with redundant disks, probably nothing will ever happen, and with ZFS replication you can live-migrate your VM at any time in very little time, which also lets you replace a component in one of the computers.

I am a fan of Ceph, but with only one disk per system and a slow network you can run into other problems: you get low IOPS, and in the worst case Ceph sets your data read-only.
That's why I always recommend Ceph only with enterprise hardware and a fast network.
 
Ok, seems like words of wisdom :) I will look into it, and thanks.
 
