5 Node Setup/Ceph Advice

Seed

Oct 18, 2019
Hello,

Piggybacking on another thread I have on here: I've made some major changes and upgrades in my home lab setup and am looking for some guidance.

Here is what I need to do, and the hardware I have to do it with:

1. Create a storage setup that hosts about 12TB of media, pictures, movies, and all the other stuff I've amassed over the years. This will continue to grow. I'll want to mount this to a VM and then have containers access the data that way. Or is there a better way?
2. Have a place to provision VMs that can be shared or moved to other nodes. Doesn't necessarily have to be HA but would be nice.
3. Will have a few containers running most likely within a VM that will need to mount a large volume.

I have the following nodes:

Host (1,2,3)
  • 8C/16T (2.1GHz)
  • 64GB DDR4 ECC
  • 120GB SSD (System Disk)
  • 500GB SSD (unused)
  • 4x10TB WD RED
  • NVMe
    • 1 x 512GB NVMe host 1
    • 2 x 256GB NVMe host 2,3
  • 2 x 10GbE per node for storage network

HP (1,2)
  • 4C/8T (3.5GHz)
  • 16GB RAM
  • 120GB SSD (System Disk)
  • 250GB SSD
  • 3 x 160GB enterprise SAS mechanical drives
  • 2 x 10GbE per node for storage network

Questions:

1. Should I have all of these in one proxmox cluster? My guess is yes
2. For the 3 nodes with large disks, I'm thinking I'll use CEPH across 3 nodes and 12 disks (4 OSDs per node)
  • I could also plug all these drives into a single external array and manage it that way, without Ceph. Would that be better/faster?
3. I would like to use the 500GB SSDs as a shared resource for VMs. Should I do this with ceph as well across the 3 hosts with their 500GB SSDs?
4. How can I leverage the NVMe drives on these hosts? Can I put a file system on there to help manage ceph performance?

Thanks!
 
Should I have all of these in one proxmox cluster? My guess is yes

Yes. It makes it easier to manage things and to live-migrate, and it is also slightly more convenient for the "non-Ceph" nodes to access the Ceph cluster, if you decide to go that way (tip: they should still have Ceph installed, so the client versions match).
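If it helps, a minimal sketch of what that could look like on the CLI (the cluster name and IP are just placeholders):

# on the first node, create the cluster
pvecm create homelab

# on each of the other four nodes, join it (IP of the first node)
pvecm add 192.168.1.11

# install the Ceph packages on every node, including the two HP ones,
# so the client versions match
pveceph install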

2. For the 3 nodes with large disks, I'm thinking I'll use CEPH across 3 nodes and 12 disks (4 OSDs per node)
  • I could also plug all these drives into a single external array and manage it that way, without Ceph. Would that be better/faster?

Faster? Maybe; it depends on what you use to host the disks. ZFS over iSCSI could be done.
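For reference, ZFS over iSCSI would end up as a storage entry roughly like this in /etc/pve/storage.cfg (storage name, pool, portal IP, target IQN and provider are all placeholders for whatever box would host the array; check the option names against the pvesm docs for your version):

zfs: external-array
        pool tank
        portal 10.10.10.50
        target iqn.2019-10.lab.storage:tank
        iscsiprovider LIO
        lio_tpg tpg1
        blocksize 8k
        sparse 1
        content images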
The big advantages for ceph in your case would be:
* easier to add additional storage (ceph is made to be expanded to almost unlimited sizes, on the fly)
* fail-over, no single point of failure
* tightly integrated management of the Ceph Storage server in PVE
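For reference, the Ceph side of that on the three big nodes is mostly a handful of commands (the subnet and device names are only examples, check yours with lsblk; monitor/manager layout per the pveceph docs):

# once, from one node: point Ceph at the 10G storage network
pveceph init --network 10.10.10.0/24

# on each of Host 1-3: a monitor, then one OSD per 10TB disk
pveceph mon create
pveceph osd create /dev/sdb
pveceph osd create /dev/sdc
pveceph osd create /dev/sdd
pveceph osd create /dev/sde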

3. I would like to use the 500GB SSDs as a shared resource for VMs. Should I do this with ceph as well across the 3 hosts with their 500GB SSDs?

You could add those as OSDs in the normal Ceph setup and separate them from the HDDs with separate CRUSH rules, see:
https://pve.proxmox.com/pve-docs/chapter-pveceph.html#pve_ceph_device_classes
https://forum.proxmox.com/threads/ceph-ssd-and-hdd-pools.42032/#post-202132

Then you can create a pool using the HDD crush rule and one using the SSD rule.
But as you only have one SSD per host, the redundancy is naturally a bit limited. It could be useful to use a 2/2 size/min_size configuration for that pool: this gives you a bit more usable space, while you have roughly the same failure tolerance as with 3/2 on only three SSDs. 500 GB SSDs are quite cheap nowadays, so you could also add a second one to each node and use 3/2; that would give you some room if one SSD fails (with only three, any failure will be quite stressful if the data is important).
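A rough sketch of the device-class approach from those links (rule and pool names are made up, and double-check the pveceph pool options against your PVE version):

# one CRUSH rule per device class, failure domain = host
ceph osd crush rule create-replicated replicated_hdd default host hdd
ceph osd crush rule create-replicated replicated_ssd default host ssd

# bulk pool on the HDDs with the usual 3/2
pveceph pool create media --crush_rule replicated_hdd --size 3 --min_size 2 --add_storages

# VM pool on the SSDs; with one SSD per host you could run it 2/2 instead
pveceph pool create vm-ssd --crush_rule replicated_ssd --size 2 --min_size 2 --add_storages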

How can I leverage the NVMe drives on these hosts? Can I put a file system on there to help manage ceph performance?

You could put a journal or block.db on them, but be aware that if that NVMe then fails, every OSD on that node which uses it fails with it (no data is lost cluster-wide, but those OSDs are out until they are rebuilt). That said, as long as they do not get really hot and you do not write to them maniacally, the failure probability is quite low once they are past the "infant mortality" part of the bathtub curve.
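If you do go that way, the DB device is simply passed when the OSD gets created, something like the following (device names are examples; the NVMe gets carved up per OSD, with --db_size available if you want to control the sizing):

# OSDs whose RocksDB/WAL should live on the NVMe
pveceph osd create /dev/sdb --db_dev /dev/nvme0n1
pveceph osd create /dev/sdc --db_dev /dev/nvme0n1
# ...and so on for the remaining disks on that node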

Otherwise, you could also add them as OSDs, maybe to the fast SSD pool; that gets you some better redundancy there, and you effectively end up with a fast pool for "hot/warm" data and the HDD pool for "warm/cold" data.

Hope this helps a bit.
 
