Small Homelab Cluster Ceph help

cyrus104

Good Day,

I have a homelab running right now and am changing the configuration to a 3-node cluster on which I want to run Ceph. I am space-limited inside each of my identical nodes and am looking for the best Ceph configuration.

The 3 nodes are identical:

Xeon D-2146
64GB RAM
1x 512GB mSATA - Proxmox install
1x 1TB Samsung 970 Pro M.2 NVMe - DB/WAL? Not sure how to ensure that both end up on this drive.
1x 6.4TB Intel DC P4600 U.2 NVMe - Ceph storage

Each of the units has a very lightly loaded 10GbE interface, with 3 to spare; I'll look at adding a storage switch.
I'm not sure if I should use the Samsung for DB/WAL, as it's slightly faster than the Intel NVMe drive, but not by much.

Thanks for the help with the configuration.
 
1x 6.4TB Intel DC P4600 U.2 NVMe - Ceph storage
Is this model correct? Since these outperform the Samsung 970 Pro, putting the WAL/DB on a separate device will not benefit performance.

1x 512GB mSATA - Proxmox install
The MON DB will be placed on the OS disk as well. If it isn't fast enough, it will slow down the Ceph cluster.

Each of the units has a very lightly loaded 10GbE interface, with 3 to spare; I'll look at adding a storage switch.
Use a full mesh; it's faster and has less SPoF.
https://pve.proxmox.com/wiki/Full_Mesh_Network_for_Ceph_Server
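For three nodes, the routed (simple) variant described on that wiki page comes down to a handful of static host routes in /etc/network/interfaces. A minimal sketch for one node, assuming placeholder interface names ens19/ens20 and the 10.15.15.0/24 subnet:

Code:
# node1 (10.15.15.50): ens19 cabled to node2, ens20 cabled to node3
auto ens19
iface ens19 inet static
        address 10.15.15.50/24
        up   ip route add 10.15.15.51/32 dev ens19
        down ip route del 10.15.15.51/32

auto ens20
iface ens20 inet static
        address 10.15.15.50/24
        up   ip route add 10.15.15.52/32 dev ens20
        down ip route del 10.15.15.52/32

The other two nodes get the same layout with their own address and the routes swapped accordingly.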

You can also find our Ceph benchmark paper, its forum thread, and our Ceph docs below.
https://pve.proxmox.com/pve-docs/chapter-pveceph.html
https://www.proxmox.com/en/downloads/item/proxmox-ve-ceph-benchmark
https://forum.proxmox.com/threads/proxmox-ve-ceph-benchmark-2018-02.41761/
 
Is this model correct? Since these outperform the Samsung 970 Pro, putting the WAL/DB on a separate device will not benefit performance.
This is the correct model, and yes, this drive has higher IOPS and much better endurance. The sequential speed is a little lower, but for this purpose that's less important.

The MON DB will be placed on the OS disk as well. If it isn't fast enough, it will slow down the Ceph cluster.
Is there a way to force this onto a faster drive? I normally like to keep my OS on a separate drive from my data.

I will attempt the full mesh configuration; it would be nice to have an option to automate some of this. In theory, a wizard shouldn't be too terribly hard to make: specify the interfaces on each node and a free subnet to use.

Overall, based on the system specs that I've laid out, what are your thoughts on how to best configure these nodes? I have a NAS for backups and ISO storage; Ceph will most likely only be for VMs, containers, and hopefully a Docker install, either in a VM or running on the host OS. I would like to move to LXC, but my "day" job has forced Docker on me, so I have to make sure I keep it in my homelab for dev.
 
Is there a way to force this onto a faster drive? I normally like to keep my OS on a separate drive from my data.
You can mount /var/lib/ceph/ on a different device or configure a different DB location.
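A rough sketch of the mount approach, assuming the Samsung shows up as /dev/nvme1n1 (stop the Ceph services on the node first; the mkfs step erases the drive):

Code:
mkfs.ext4 /dev/nvme1n1                        # format the new device (destroys its contents)
mkdir /mnt/tmp && mount /dev/nvme1n1 /mnt/tmp
cp -a /var/lib/ceph/. /mnt/tmp/               # preserve the existing MON data
umount /mnt/tmp
echo '/dev/nvme1n1 /var/lib/ceph ext4 defaults 0 2' >> /etc/fstab
mount /var/lib/ceph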

Overall, based on the system specs that I've laid out, what are your thoughts on how to best configure these nodes? I have a NAS for backups and ISO storage; Ceph will most likely only be for VMs, containers, and hopefully a Docker install, either in a VM or running on the host OS. I would like to move to LXC, but my "day" job has forced Docker on me, so I have to make sure I keep it in my homelab for dev.
There are not many choices. Most importantly, keep the corosync traffic physically separated, and ideally use two links.
https://pve.proxmox.com/pve-docs/chapter-pvecm.html
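With two dedicated NICs per node, that could look something like the following; the cluster name and addresses are placeholders:

Code:
# on the first node: create the cluster with two independent corosync links
pvecm create homelab --link0 10.10.10.1 --link1 10.20.20.1
# on each additional node: join, giving that node's own link addresses
pvecm add 10.10.10.1 --link0 10.10.10.2 --link1 10.20.20.2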
 
You can mount /var/lib/ceph/ on a different device or configure a different DB location.
So it looks like my Samsung 970 Pro is not supported due to a hardware RAID controller... I'm not aware that one is in use here. nvme0n1 is my Intel DC P4600 drive; would the following configuration do what you are recommending?
[Attachment: ceph.JPG]
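For reference, the CLI equivalent of that dialog should be something along these lines, assuming nvme0n1 is the Intel data disk and the Samsung shows up as nvme1n1:

Code:
# create the OSD on the Intel drive, placing the RocksDB (and with it the WAL) on the Samsung
pveceph osd create /dev/nvme0n1 --db_dev /dev/nvme1n1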

There are not many choices. Most importantly, keep the corosync traffic physically separated, and ideally use two links.
https://pve.proxmox.com/pve-docs/chapter-pvecm.html
I have an unused 6-port 10GbE switch; would this be OK to use as the link between the units? It would only have the 3 clustered nodes connected to it. I could create a full mesh, but I may be adding another node or two... maybe, but not yet.
 
So it looks like my Samsung 970 Pro is not supported due to a hardware RAID controller... I'm not aware that one is in use here. nvme0n1 is my Intel DC P4600 drive; would the following configuration do what you are recommending?
Disks need an empty GPT partition table. And the note is a general one; see the reference documentation.
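To get a disk back to that state, something like the following should do; /dev/nvme1n1 is just an example, and both commands destroy whatever is on the disk:

Code:
sgdisk --zap-all /dev/nvme1n1   # wipe any existing GPT/MBR structures
sgdisk --clear   /dev/nvme1n1   # write a fresh, empty GPT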

I have an unused 6-port 10GbE switch; would this be OK to use as the link between the units? It would only have the 3 clustered nodes connected to it. I could create a full mesh, but I may be adding another node or two... maybe, but not yet.
Besides being a SPoF, it should be OK.
 
Disks need an empty GPT partition table. And the note is a general one; see the reference documentation.
Thanks, I'll look it up. The configuration in the picture should work, as the disk is completely blank. For right now, it looks like the Intel drive is being recognized and should be good to use. Will putting the data/DB/WAL on the large/fast Intel drive work "OK"? I understand Ceph is geared more toward a LOT of drives, not so much a small setup like this.
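For completeness: with BlueStore, the DB and WAL simply default to the data device when no separate device is given, so the colocated layout would just be:

Code:
# everything on the Intel P4600; omitting --db_dev keeps RocksDB/WAL on the data device
pveceph osd create /dev/nvme0n1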
 
