Moving from lvm-thin to ceph/zfs, things to consider

adrian_vg

Hi all,

I'm currently running a three-node Proxmox 6 cluster using LVM storage, but would like to move to Ceph and ZFS.
Two of the cluster nodes are rack servers and the third is a witness node with less-than-stellar storage capabilities.
FWIW, this is a home lab cluster running my personal homepage, a Nextcloud instance, Docker stuff and other more or less important services for myself.

My plan is to reinstall the rack servers, using two of the available disks for the system in a RAID1 setup on the server's own RAID controller, and leaving the other four disks as separate disks to create OSDs on.

Now, if I read the documentation correctly, I need three nodes for the ceph cluster.
But do I also need similarly sized OSDs on each node, or will it work with only two?

Looking at https://pve.proxmox.com/pve-docs/images/screenshot/gui-ceph-osd-status.png from https://pve.proxmox.com/wiki/Deploy_Hyper-Converged_Ceph_Cluster#pve_ceph_osds, it seems like it's okay to use an odd number of OSDs on each node.

Since my tertiary node (the witness node) is very small, I'm thinking a smallish OSD on this one, say 50 GB (and 4x 2 TB on each of the other rack servers) would probably work?

What are your thoughts on this please?
Is there anything I should reconsider or rethink?

Thanks in advance!
 
Ceph needs at least 3 nodes, since it does its redundancy at the node level. You might be able to get it working somehow, but I do not recommend it: you will run into other issues at some point that will be much harder to debug and figure out if you run a non-standard setup.

What I recommend in your situation is to use local ZFS storage on both nodes and use VM replication. This way you can still use HA, and if you run a short replication interval, data loss in an HA failover situation will be minimal.

The ZFS storage needs to have the same name on both nodes for this to work.
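
Roughly like this, assuming the pool is called rpool and the nodes are pve1 and pve2 (storage name, node names and VM ID are just placeholders):

# add the same ZFS dataset as a storage restricted to both nodes (example names)
pvesm add zfspool local-zfs --pool rpool/data --content images,rootdir --nodes pve1,pve2
# replicate VM 100 to pve2 every 15 minutes
pvesr create-local-job 100-0 pve2 --schedule "*/15"

The first command defines the storage once for the cluster and limits it to the two nodes; the second creates the replication job that keeps the VM's disks in sync.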
 
That's an interesting angle. Thanks!
 
One more thing, if the third node is a full PVE install instead of a qdevice, use the HA group feature to limit the HA VMs to just the two main nodes!
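
Something along these lines should do it, assuming the two main nodes are called pve1 and pve2 (the group name and VM ID are just examples):

# restricted group: HA resources in it may only run on the listed nodes
ha-manager groupadd main-nodes --nodes pve1,pve2 --restricted 1
# make VM 100 an HA resource bound to that group
ha-manager add vm:100 --group main-nodes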
 
Aha, thanks!
Discovered that feature recently on our Proxmox lab Ceph cluster at work, but didn't quite know how and when to use it.
 
