Proxmox VE 2-Node Cluster and Ceph Lab Environment

Daniel-San

Mar 16, 2023
Hello guys out there,

I am new to Proxmox. Until now my home lab has been a VMware 2-node cluster with vSAN: Core i7 12th generation, 64 GB RAM, a 2.5 Gbit/s NIC for VM traffic, and 1x NVMe plus 2x SSD per host, connected over a 2x 10 Gbit/s direct-attached HCI network.
I know the Proxmox documentation and the recommendations/best practices for Ceph - they are very similar to those for VMware vSAN in production environments.
Nevertheless, a VMware 2-node cluster with vSAN and a witness runs very smoothly as a lab environment.
Has anyone gained experience with Proxmox and Ceph and had fun running a similar setup?
I am trying to figure out a new hypervisor setup for my home lab that is much easier to maintain than the VMware lab environment.
The Proxmox and Ceph combination sounds good - if it can be run on 2 nodes with 6 OSDs and a 2x 10 Gbit/s direct-attached dedicated network, despite the recommendations and best practices.

Best regards,
Daniel
 
Ceph is not suitable for a two-node cluster.

Have a look at DRBD for that setup. Linbit provides a plugin for Proxmox.
Hello, thank you for the hint.
I had a look: with Linbit and DRBD 9 you need at least a diskless DRBD client for a two-node cluster that you want to run in primary/primary HA mode with live migration capabilities (in contrast to VMware, where a two-node cluster is always active/active).
Unlike DRBD 8, where you could tweak the Pacemaker and fencing configuration/policies yourself, in DRBD 9 these split-brain avoidance policies are only activated with that diskless DRBD client.
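Just to illustrate, a minimal sketch of such a DRBD 9 resource with a diskless third node could look roughly like this (host names, volume group and addresses are placeholders, not a tested config):

Code:
# /etc/drbd.d/r0.res - two data nodes plus a diskless tiebreaker (DRBD 9)
resource r0 {
    device      /dev/drbd0;
    meta-disk   internal;

    on pve1 {
        node-id 0;
        disk    /dev/vg_drbd/r0;
        address 10.10.10.1:7789;
    }
    on pve2 {
        node-id 1;
        disk    /dev/vg_drbd/r0;
        address 10.10.10.2:7789;
    }
    on witness {
        node-id 2;
        disk    none;            # the diskless client, only contributes to quorum
        address 10.10.10.3:7789;
    }

    options {
        quorum majority;         # needs the third node to avoid split brain
        on-no-quorum io-error;
    }

    connection-mesh {
        hosts pve1 pve2 witness;
    }
}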

If I want to give Proxmox a try in my lab without a third node, I think I have to test cluster HA with ZFS and replication. Live migration would of course not be available either.
Setting up a small QDevice and keeping the two-node cluster direct-attached at 10 Gbit/s is a lot easier for me than investing in a 10 Gbit/s switch architecture for a third node (whether a diskless DRBD client or a real third node for Ceph) in my home lab.

Best regards,
Daniel
 
Hello Daniel,

you can build a pseudo 2-node cluster for testing, but you need at least a small third node for quorum. A purely direct-attached Ceph network does not work, because the third node also needs a connection to the Ceph network.
I have such a setup at home for testing: Ceph with 40 Gbit, but the third node reaches the storage network only via a VLAN over 1 Gbit.
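Roughly sketched, the third node's storage access is just a tagged VLAN in /etc/network/interfaces (interface name, VLAN tag and address are placeholders, not my real values):

Code:
# third node: storage network as VLAN 40 on the 1 Gbit uplink (ifupdown2 syntax)
auto eno1.40
iface eno1.40 inet static
    address 10.40.40.3/24

# the two main nodes keep the direct-attached 40 Gbit link in the same subnet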

You can easily use ZFS replication with the direct connection; I like to use a 1-minute schedule.
Live migration also works without problems: during live migration the remaining replication delta is simply transferred, and the replication direction is reversed automatically.
This is something vSphere cannot do, so it is often misunderstood.
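As a rough sketch, such a replication job and a live migration boil down to a few CLI calls (VM ID and node name are placeholders):

Code:
# replicate VM 100 to node pve2 every minute (job "100-0")
pvesr create-local-job 100-0 pve2 --schedule "*/1"

# check the replication state
pvesr status

# live migration; the remaining delta is sent and the replication direction flips
qm migrate 100 pve2 --online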
 
If we are talking about a lab environment where you don't care about downtime, then you could run a Ceph cluster by setting the "size" parameter of your pools to 2. But you won't have a good time testing all the self-healing/recovery features of Ceph, as those need larger clusters. Such clusters can, however, be set up as nested VMs to get a better understanding of how it works.
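As a sketch, such a 2/2 pool could be created like this (the pool name is arbitrary, and min_size is a matter of taste):

Code:
# replicated pool with only 2 copies - fine for a lab, not for data you care about
pveceph pool create lab-pool --size 2 --min_size 2

# or adjust an existing pool
ceph osd pool set lab-pool size 2
ceph osd pool set lab-pool min_size 2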

In a small 2-node setup, you could also use ZFS pools with the same name on each node and then make use of the replication feature. This can be combined with HA. The only downside is that you might have some data loss if a node dies, back to the last successful replication of the disk images. The shortest possible interval is currently one minute.
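A minimal sketch of that, assuming a pool called "tank" and a VM with ID 100 (both placeholders):

Code:
# on each node: a local pool with the identical name
zpool create tank mirror /dev/sdb /dev/sdc

# register the storage once at cluster level
pvesm add zfspool tank --pool tank --content images,rootdir

# put the guest under HA control (the replication jobs are configured separately)
ha-manager add vm:100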

As @gurubert already mentioned, you will need at least a small device to set up the QDevice mechanism in order to still have more than 50% of the votes should one of the 2 nodes be down. This could also be installed on an already existing machine you might have. Packages for the external part should be available in pretty much every Linux distribution.
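A minimal sketch of the QDevice setup, assuming the external box is reachable at 192.168.1.53 (address is a placeholder):

Code:
# on the external device (e.g. a Raspberry Pi or any Debian box)
apt install corosync-qnetd

# on both cluster nodes
apt install corosync-qdevice

# on one of the cluster nodes: register the external vote
pvecm qdevice setup 192.168.1.53

# verify the vote count
pvecm status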


Ah man, @Falk R. was a tad faster, but since I have written it out already, I'll still post this :)
 
Hello guys, several weeks have passed, and after poring over lots of blogs and documentation I have configured the following solution, which I now have a few weeks of experience with:
Inspired by this post: https://sudonull.com/post/13691-Secure-storage-with-DRBD9-and-Proxmox-Part-1-NFS, I built an active/passive DRBD 9 volume after all, with a few differences from the blog's recommendations for my 2-node cluster:
  • my DRBD volume runs on top of an LVM volume group on each node - a few SSDs per node have been bundled for that (a rough sketch follows below this list)
  • the LVM chunk size had to be adapted because the SSDs use 4K alignment
  • I activated the LVM snapshot script for the DRBD volume replication across the nodes, as recommended in the Linbit documentation
  • my LXC container runs CentOS 9 and provides the NFS service on the shared DRBD volume for the 2 nodes
  • I set up a QDevice on my RPi for quorum purposes in my 2-node cluster
  • after a few cold boots I disabled the HA manager setup from the blog for my NFS service LXC container
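Roughly sketched, the storage part boils down to something like this (volume group, resource name and sizes are placeholders, not my exact values):

Code:
# bundle the SSDs into a volume group and carve out the DRBD backing LV
pvcreate /dev/sdb /dev/sdc
vgcreate vg_ssd /dev/sdb /dev/sdc
lvcreate -L 500G -n drbd_nfs vg_ssd
# (the chunk size / data alignment tuning for the 4K SSDs is omitted here)

# initialise and bring up the resource defined in /etc/drbd.d/nfs.res
drbdadm create-md nfs
drbdadm up nfs

# inside the CentOS 9 container, /etc/exports exposes the mounted DRBD volume:
# /srv/nfs  10.10.10.0/24(rw,sync,no_root_squash)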
Basically the HA manager configuration works very well - on a running cluster. But most of the time my cluster is shut down and only gets booted when I have a testing use case.
In that scenario the HA manager configuration is a real pain in the ass. The HA manager service takes priority over the DRBD sync, so after a complete fresh boot of the cluster the NFS LXC container can end up roaming constantly between the nodes without ever being powered on and pinned to one node.
Manually checking the DRBD primary and secondary roles on the two nodes after powering on the cluster, switching the roles manually if necessary, and then powering on the NFS LXC container on the DRBD primary node costs at most an additional 2 minutes before starting the rest of the VMs placed on the NFS share.
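For the record, the manual dance after a cold boot is only a handful of commands (resource name and container ID are placeholders):

Code:
# check which node currently holds the primary role
drbdadm status nfs

# if needed, promote this node (only while the peer is not primary)
drbdadm primary nfs

# then start the NFS container here
pct start 101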

Best,
Daniel
 
Why not use the DRBD resources directly as storage within Proxmox?

https://linbit.com/drbd-user-guide/linstor-guide-1_0-en/#ch-proxmox-linstor
Hi, I had a look at that, too.
The reason I am taking the indirect route via the NFS service LXC container with primary/secondary DRBD is that I need snapshot capabilities for my VMs, and the complexity is a bit lower.
With NFS shared storage via the LXC container I can use the qcow2 disk format for VMs on both nodes and snapshot them.
At the moment the Linbit Proxmox plugin can only pass DRBD resources through to the Proxmox cluster nodes with the raw disk format for VMs.
In my home lab I will test things - and I will misconfigure things. Reverting to the previous snapshot is a life saver for gaining experience instead of starting over from scratch.
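For completeness: the NFS export from the container is then just added as a regular NFS storage, which is what allows qcow2 and therefore snapshots (storage ID, server address and export path are placeholders):

Code:
pvesm add nfs drbd-nfs --server 10.10.10.100 --export /srv/nfs --content images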

Best,
Daniel
 
