Moving from lvm-thin to ceph/zfs, things to consider

adrian_vg

Hi all,

I'm currently running a three-node Proxmox 6 cluster using LVM storage, but would like to move to Ceph and ZFS.
Two of the cluster nodes are rack servers and the third is a witness node with less-than-stellar storage capabilities.
FWIW, this is a home lab cluster running my personal homepage, a Nextcloud instance, Docker stuff and other more or less important services for myself.

My plan is to reinstall the rack servers, using two of the available disks for the system in a RAID1 setup on the server's own RAID controller, and leaving the other four disks as separate disks to create OSDs on.

Now, if I read the documentation correctly, I need three nodes for the ceph cluster.
But do I also need similarly sized OSDs on each node, or will it work with only two?

Looking at https://pve.proxmox.com/pve-docs/images/screenshot/gui-ceph-osd-status.png from https://pve.proxmox.com/wiki/Deploy_Hyper-Converged_Ceph_Cluster#pve_ceph_osds, it seems like it's okay to use an odd number of OSDs on each node.

Since my tertiary node (the witness node) is very small, I'm thinking a smallish OSD on this one, say 50 GB (and 4x 2 TB on each of the other rack servers) would probably work?

What are your thoughts on this please?
Is there anything I should reconsider or rethink?

Thanks in advance!
 
Ceph needs at least 3 nodes, since it does its redundancy at the node level. You might be able to get it working somehow, but I do not recommend it: you will run into other issues at some point that will be much harder to debug and figure out if you run a non-standard setup.

What I recommend in your situation is to use local ZFS storage on both nodes and use VM replication. This way you can still use HA, and if you run a short replication interval, data loss in an HA failover situation will be minimal.

The ZFS storage needs to have the same name on both nodes for this to work.
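
Roughly like this, assuming the pool is called rpool and the nodes are pve1 and pve2 (storage name, node names and VM ID are just placeholders):

# add the same ZFS dataset as a storage restricted to both nodes (example names)
pvesm add zfspool local-zfs --pool rpool/data --content images,rootdir --nodes pve1,pve2
# replicate VM 100 to pve2 every 15 minutes
pvesr create-local-job 100-0 pve2 --schedule "*/15"

The first command defines the storage once for the cluster and limits it to the two nodes; the second creates the replication job that keeps the VM's disks in sync.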
 
That's an interesting angle. Thanks!
 
One more thing, if the third node is a full PVE install instead of a qdevice, use the HA group feature to limit the HA VMs to just the two main nodes!
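
Something along these lines should do it, assuming the two main nodes are called pve1 and pve2 (the group name and VM ID are just examples):

# restricted group: HA resources in it may only run on the listed nodes
ha-manager groupadd main-nodes --nodes pve1,pve2 --restricted 1
# make VM 100 an HA resource bound to that group
ha-manager add vm:100 --group main-nodes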
 
Aha, thanks!
Discovered that feature recently on our Proxmox lab Ceph cluster at work, but didn't quite know how and when to use it.
 
