Advice on 5 Node Setup

Seed

Oct 18, 2019
Hello,

Long-time tinkerer here; I used Proxmox for work some 10 years ago. Now I'm looking at Proxmox to manage the resources I've accumulated over the years in my home lab. I've primarily used OMV for my container and storage management, plus some other nodes running Docker and such, but now I want to change everything so my nodes act more like a resource pool, and I need some help/guidance.

Overview.

1. machine1 12 core 32GB RAM (250GB SSD) 9200-16e with 4 SFF-8088 ports
2. machine2 12 core 64GB RAM (250GB SSD) with 4 SFF-8088 ports
3. machine3 8 core 64GB RAM (250GB SSD)
4. machine4 4 core 16GB RAM (250GB SSD & 512GB NVMe) 9200-8e with 2 SFF-8088 ports
5. machine5 4 core 16GB RAM (250GB SSD) 9200-8e with 2 SFF-8088 ports
6. Disk Array1 8 bay with 8 10TB drives and 2 SFF-8088 ports. Currently an mdadm RAID 6 array that hosts container configs and a large amount of media data. **
7. Disk Array2 8 bay - Empty with 2 SFF-8088 ports.

** I have 4 10TB drives in a synology that I may bring in if necessary, so 12 disks available.

I am thinking of putting Proxmox on all 5 nodes and creating a cluster. Then I would attach 4 of the nodes, one to each port of the disk arrays, so that each array is attached to 2 nodes, with each node claiming, say, 3 disks. Each array would be independent and on a separate circuit/UPS in the house.
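Roughly, I'm picturing the cluster bootstrap like this (the cluster name and IP are placeholders, not my real ones):

    # On the first node: create the cluster
    pvecm create homelab

    # On each of the other 4 nodes: join using the first node's IP
    pvecm add 192.168.1.10

    # Check quorum and membership
    pvecm status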

Then I would use CephFS to pool all 12 disks together. I could then attach that pool to a VM running Docker and Plex, which would mount the volume for the media.
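For the media share itself, I assume the rough shape is something like this (the filesystem name, monitor IP, client name, and secret file are placeholders I made up):

    # On a Ceph node: create a metadata server and a CephFS
    pveceph mds create
    pveceph fs create --name media-fs

    # Inside the Plex VM (needs ceph-common installed): mount the CephFS over the network
    mount -t ceph 192.168.1.10:6789:/ /mnt/media \
        -o name=plex,secretfile=/etc/ceph/plex.secret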

  • Is this reasonable, and can it work?
  • Can I grow the pool this way as well?
  • Can I use mismatched disk sizes to grow with as well?


Thank you for the help!


I think that's a reasonable approach...

My initial thought, based on what I have read about Ceph and how I would conceptualize using this "mixed bag" of nodes as a cluster, is that I would probably move some RAM from machine 3 to machine 1, and move the 9200-16e from machine 1 to machine 3 (assuming these machines are generationally similar enough to support that sort of hardware swap).

The end configuration would look as follows:

1. machine1 12 core 64GB RAM (250GB SSD)
2. machine2 12 core 64GB RAM (250GB SSD) with 4 SFF-8088 ports
3. machine3 8 core 32GB RAM (250GB SSD) 9200-16e with 4 SFF-8088 ports
4. machine4 4 core 16GB RAM (250GB SSD & 512GB NVMe) 9200-8e with 2 SFF-8088 ports
5. machine5 4 core 16GB RAM (250GB SSD) 9200-8e with 2 SFF-8088 ports

The idea here is that you build your Ceph cluster across machines 2-5 and install Ceph on machine 1 purely so it can access the Ceph-hosted storage.

Use machine 1 for the most demanding VM workloads.
Use machine 2 for light VM workloads plus Ceph manager/monitor duties.
Use machine 3 for very light VM workloads plus Ceph manager/monitor duties.
Use machines 4 and 5 for Ceph only (don't host VMs there); run their Ceph managers/monitors in standby purely to give the cluster HA.
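A rough sketch of that monitor/manager layout with the Proxmox tooling (the Ceph network CIDR is a placeholder):

    # On machine 2 (first Ceph node): install Ceph, initialize it, create the first monitor/manager
    pveceph install
    pveceph init --network 10.10.10.0/24
    pveceph mon create
    pveceph mgr create

    # On machines 3, 4, and 5: add further monitors/managers for quorum and standby HA
    pveceph install
    pveceph mon create
    pveceph mgr create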

Observe hardware utilization and adjust / add workloads as is reasonable.
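A few stock commands are usually enough to keep an eye on that:

    ceph -s              # overall cluster health
    ceph osd df tree     # per-OSD usage and data distribution
    ceph osd perf        # per-OSD commit/apply latency
    pveperf              # quick CPU/disk benchmark on a Proxmox node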

Consider adding an additional SSD to each machine to use as DB/WAL for the HDD OSDs.
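When creating the OSDs that would look something like this (the device names and DB size are placeholders for whatever your hardware actually presents):

    # HDD-backed OSD with its RocksDB/WAL on a faster SSD
    pveceph osd create /dev/sdb --db_dev /dev/nvme0n1 --db_size 60
    # Repeat for each HDD on the node
    pveceph osd create /dev/sdc --db_dev /dev/nvme0n1 --db_size 60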

Then I would use CephFS to pool all 12 disks together. I could then attach that pool to a VM running Docker and Plex, which would mount the volume for the media.

CephFS is a file system designed for handling files directly from Proxmox (or whatever is "hosting" Ceph; it could be RHEL or another Linux distro). It's not very efficient as a file system to run disk images from. For that you'll want to spin up an RBD pool. In many deployments, no CephFS pool even needs to be configured.
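For VM and container disks, the RBD side is roughly this (pool/storage names are made up):

    # Create a replicated RBD pool and register it as Proxmox storage
    pveceph pool create vm-rbd
    pvesm add rbd vm-rbd --pool vm-rbd --content images,rootdir --krbd 0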

Is this reasonable and can it work?

For homelab purposes it should work fine.

Can I grow the pool this way as well?

You can add more disks and disk arrays and configure them as OSDs, but keep in mind that each OSD comes with additional compute and memory requirements. Those "lighter duty" machines probably shouldn't host a large number of OSDs (4 disks/OSDs per node is probably a healthy place to be for lower-end hardware). If your 8- and 12-core machines are hosting VMs and handling Ceph monitor/manager functions, they also probably shouldn't host more than ~4 OSDs each, though that depends on the age/generation of the servers in question and how fast the CPUs are clocked.
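If RAM gets tight on the 16GB boxes, the per-OSD memory target can also be dialed down from its roughly 4GB default, e.g.:

    # Cap BlueStore memory for all OSDs (value in bytes; ~2 GiB here)
    ceph config set osd osd_memory_target 2147483648

    # Or only for a specific OSD on a small node
    ceph config set osd.7 osd_memory_target 2147483648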

Consider picking up another dual-port DAS disk array and another node, and installing a SAS card in machine 1 and the new node, so that you have a total of 3 x 8-bay arrays split into 4 direct-attached drives per node x 6 nodes = 24 drives/OSDs total. This gives a nice "direction" for expansion that keeps the load distributed evenly across the cluster.

Can I use mismatched disk sizes to grow with as well?

Mismatched disk sizes will result in less efficient scaling, but can be done.

Using CRUSH rules and custom device classes would allow you to separate disks of different sizes into separate pools; however, I would do some more research on the ramifications of having only 2 disks per pool per node. My assumption is that a 3-node cluster with 4 OSDs per node all in a single pool, and a 6-node cluster with 4 OSDs per node split into 2 pools, should each work fine, since both give a distribution of at least 12 disks per pool throughout the cluster, but I may be wrong on that.
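The device-class mechanics look roughly like this (the class name, OSD IDs, pool name, and PG count are illustrative only):

    # Tag the differently-sized disks with a custom device class
    ceph osd crush rm-device-class osd.12 osd.13
    ceph osd crush set-device-class big-hdd osd.12 osd.13

    # CRUSH rule that only selects that class, with host as the failure domain
    ceph osd crush rule create-replicated big-hdd-rule default host big-hdd

    # Pool pinned to that rule, 3x replicated
    ceph osd pool create media2 128 128 replicated big-hdd-rule
    ceph osd pool set media2 size 3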

Since you already have a large investment in 10TB drives, the best option for simple scaling is to keep adding more 10TB drives.

-------------

Have you considered a separate SSD pool to run VMs from? I would use the 10TB spinners for data storage and try to run all software from SSDs. Do your servers have internal room (2.5" bays?) for SSDs? The beauty of software-defined storage and a quorum-based decision-making process in a cluster is that enterprise-class hardware is largely optional. Commodity-grade stuff can work and is a reasonable compromise for non-commercial deployments. You could probably set up 12 x 240GB SSDs (4 per node across 3 nodes, or 2 per node across 6 nodes) for under $400 using commodity drives. That would give you roughly a "1TB" SSD pool (3x replicated) to run software from.
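If you go that route, the SSD pool is just another class-based rule plus an RBD storage entry, something along these lines (names are placeholders):

    # Rule that only places data on OSDs auto-classified as "ssd"
    ceph osd crush rule create-replicated ssd-rule default host ssd

    # 3x replicated pool on that rule, registered as Proxmox VM storage
    pveceph pool create ssd-vms --crush_rule ssd-rule --size 3
    pvesm add rbd ssd-vms --pool ssd-vms --content images,rootdir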
 

Thank you for the input.

The primary reason I'm using the arrays is media storage, not something for VMs to run on; the VMs will run on local SSDs. I'm experimenting with that as well. I don't have the option to add many SSDs to some of these machines, so I'm going to experiment with partitioning the local disks under Debian, installing Proxmox on top, and seeing whether I can use the spare partitions as some sort of VM pool. That's secondary to the real need, though, which is a large array of disks to host all my years of media, currently about 12TB.
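What I have in mind for the spare partition is roughly this (the partition, volume group, and storage names are just placeholders):

    # Turn the leftover partition into an LVM thin pool for VM disks
    pvcreate /dev/sda4
    vgcreate vmdata /dev/sda4
    lvcreate -l 100%FREE --thinpool data vmdata

    # Register it with Proxmox as VM/container storage
    pvesm add lvmthin local-vmdata --vgname vmdata --thinpool data --content images,rootdir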

Today I'll finish getting all the hosts in place. I had to retire the 8-core machine because it was unstable, and the AMD box was too expensive to finish, so I'll have a 4-node setup with 3 sets of 10TB disks attached to them via the 2 arrays.
 
