Ceph 3-node cluster disaster mode?

sarsenal

Mar 5, 2016
I am trying to find out whether a 3-node cluster with Ceph storage can be run down to a single server. Yes, I know it is not ideal. I do have a UPS (2 hours) and a generator, but in an emergency I may need to trim the cluster down to a single server by migrating all VMs/CTs to one machine, which HA would do anyway by moving them to the one remaining server. One server can run everything, and Ceph usage doesn't even exceed 6% of each node's storage in the Ceph pool, so one server can handle it all. This would only be a last resort until I can switch back to full power. Hopefully I never have to do this, but it is good to know how in case it is required.

Would setting pvecm expected 1 and osd_pool_default_min_size = 1 work?

Would this be enough to allow Ceph to run?

That being said, how do you re-sync once you bring the cluster back? I can just use a QDevice on a Raspberry Pi to keep the cluster up, but Ceph is the issue.
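
For reference, the two changes asked about above could be sketched as the dry-run script below. It is only a sketch: "vm-pool" is a hypothetical pool name, and the script prints the commands unless DRY_RUN=0 is set. Note that osd_pool_default_min_size only applies to pools created afterwards; an existing pool is changed with ceph osd pool set. The sketch also does not address Ceph monitor quorum: if each of the three nodes runs a monitor, a single surviving monitor cannot form a majority, so these two settings alone may not be enough.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands by default; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Let the one remaining Proxmox node stay quorate and allow
# Ceph I/O on the given pool with a single replica available.
enter_single_node_mode() {
    pool="$1"
    run pvecm expected 1
    run ceph osd pool set "$pool" min_size 1
}

# "vm-pool" is a hypothetical name; list the real pools with "ceph osd pool ls".
enter_single_node_mode "${POOL:-vm-pool}"
```

Running it without DRY_RUN=0 shows exactly what would be executed, which makes it easy to review before an actual outage.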
 
How many OSDs do you have in the nodes?

You could try changing the CRUSH rule so that it uses "osd" instead of "host" as the redundancy level. If you have a few OSDs with enough free space, you would at least have some redundancy in single-host mode in case a disk fails. This way Ceph will store the 3 replicas on one node, but on separate OSDs.

But ideally you will always have at least 3 nodes, or during a failure, at least 2 :)
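
The CRUSH rule change suggested above could look roughly like the following dry-run sketch. The rule name "replicated_osd" and the pool name "vm-pool" are hypothetical, and the CRUSH root is assumed to be "default". The script only prints the commands unless DRY_RUN=0 is set. Be aware that assigning a different rule to a pool makes Ceph move data to match the new placement.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands by default; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Create a replicated rule whose failure domain is "osd" instead of "host",
# then assign that rule to the pool.
use_osd_failure_domain() {
    pool="$1"
    rule="$2"
    # Arguments: <rule-name> <crush-root> <failure-domain-type>
    run ceph osd crush rule create-replicated "$rule" default osd
    run ceph osd pool set "$pool" crush_rule "$rule"
}

# Both names are hypothetical placeholders; root "default" is an assumption.
use_osd_failure_domain "${POOL:-vm-pool}" "${RULE:-replicated_osd}"
```

Once all nodes are back, the pool can be pointed at its original host-based rule again with the same ceph osd pool set ... crush_rule command.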
 
12 OSDs per node. If I have to recover from backup, I have a PBS server that can do it. Yes, I always plan to go down to 2 servers, but if we have a bad winter again and I have to run the whole house too, I will need to cut the servers, 40G fiber switches, and anything else back to as low as I can go. I might even start moving VMs/CTs off-site, but that will take time because they are very large. I don't mind replication per node. It would not be for long. I just want to be able to bring the other nodes back online ASAP and maybe start copying the data off-site. The mail servers run in cluster mode, so if I lose the local VM it doesn't hurt anything. I just reload and rsync from the current off-site image. I'm just trying to have a plan for the worst case.

So would the changes I listed above work if I just wanted to allow Ceph to run on one node? And what is the correct process to bring the 2nd or even 3rd node back online?
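
One possible sequence for powering nodes down and bringing them back is sketched below as a dry-run script. It is not a confirmed procedure from this thread. It assumes a hypothetical pool name "vm-pool", assumes min_size was lowered to 1 beforehand, and relies on Ceph recovering on its own once the OSDs rejoin. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands by default; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Before powering nodes off: stop Ceph from marking their OSDs "out".
before_shutdown() {
    run ceph osd set noout
}

# After the nodes are powered on again and their OSDs have rejoined.
after_nodes_return() {
    pool="$1"
    # Check recovery progress; repeat manually until all PGs are active+clean.
    run ceph -s
    # Restore the usual min_size for a 3-replica pool, then clear the flag.
    run ceph osd pool set "$pool" min_size 2
    run ceph osd unset noout
}

# Usage: script.sh before | script.sh after   ("vm-pool" is hypothetical)
case "${1:-}" in
    before) before_shutdown ;;
    after)  after_nodes_return "${POOL:-vm-pool}" ;;
esac
```

In practice the min_size change should wait until ceph -s reports the cluster healthy, since the script itself does not wait for recovery to finish.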
 
