Insight on a development cluster; Ceph, clustering, maybe NAS storage for VMs

ns33

New Member
Apr 4, 2024
I'm in the process of spinning up a cluster to prove to a program office that moving from a nearly bare-metal deployment to a nearly virtual one is the way forward. Internally our PM is on board; I just need to prove it before full funding becomes available.

Currently I have 3 nodes pulled basically from the trash; two have the same CPU/RAM/SSD/GPU config and the third is very different but more powerful. I will be using NVIDIA vGPU, as many of the physical systems require CUDA. Each node does have 6x 10Gb NICs, but they currently run at 1Gb standalone; some are in an LACP config for the connection to the switch stack.
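For reference, the existing bonds look roughly like this in /etc/network/interfaces (the NIC names, bridge and addresses here are placeholders, not my exact config):

Code:
auto bond0
iface bond0 inet manual
    bond-slaves eno1 eno2
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer2+3
    # LACP bond of two of the 10Gb ports (currently only linking at 1Gb)

auto vmbr0
iface vmbr0 inet static
    address 192.168.1.11/24
    gateway 192.168.1.1
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
    # management/VM bridge on top of the bond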

My understanding is that if I wanted to use Ceph to prove failover is possible, the nodes need to be basically the same: same CPUs, RAM layout, GPUs, storage; right? Is there any way to ignore the need for identical systems with Ceph installed, just to prove the point that failover is possible?

Would Ceph even be needed if there's interest in using redundant NASes for the VM storage? We're moving everything to fiber, the PM is on board with dual-port 100G fiber cards for each system, and I'm recommending each NAS be flash-based. Is Ceph pointless if the fiber/flash NAS config is used, and will VM failover still be possible? Or is the NAS method a bad idea? If the NAS method is fine, I'd guess I'd only need mirrored boot drives on each node?

As for clustering, are there any pitfalls to using nodes with different hardware?

And for the networking of the cluster: at the moment, since the network backbone in my lab is only 1GbE, on each node I have two of the 10Gb NICs directly connecting the nodes together in a full mesh: n1eth0 -> n2eth1, n2eth0 -> n3eth1, n3eth0 -> n1eth1. Nothing is configured yet, just PVE installed on each node. Is this fine? Ideally I'd have a switch between them, but since we're jumping to fiber, buying a bunch of 10Gb copper SFP modules is a waste, so direct connections are the only way for now.
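My rough plan for addressing the direct links is the routed full-mesh approach from the Proxmox wiki, something like this on node 1 (the NIC names and the 10.15.15.0/24 addresses are placeholders; nodes 2 and 3 would mirror it with their own IPs):

Code:
auto eth0
iface eth0 inet static
    address 10.15.15.1/24
    # direct link to node 2
    up ip route add 10.15.15.2/32 dev eth0
    down ip route del 10.15.15.2/32

auto eth1
iface eth1 inet static
    address 10.15.15.1/24
    # direct link to node 3
    up ip route add 10.15.15.3/32 dev eth1
    down ip route del 10.15.15.3/32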

Sorry if this is all over the place; I spewed this out in the order my brain let it out.
 
Ceph nodes can be different, but each node should have nearly the same amount of data space available (OSD count x OSD size), because the data needs to rebuild on the remaining nodes if one node fails!
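For a 3-node PoC that usually means a replicated pool with size 3 / min_size 2, and then keeping an eye on usable space, roughly like this (the pool name is just an example):

Code:
pveceph pool create vm-pool --size 3 --min_size 2
ceph df            # raw usage and per-pool MAX AVAIL
ceph osd df tree   # how full each OSD / node is
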
PVE nodes (even if they are not the same as the Ceph nodes) can have different CPUs, but you may get VM migration errors if you use vCPU type "host", which is sometimes needed; the PVE default type usually does the job and is migratable. RAM is mostly more useful than CPU, and fewer but more powerful cores are more useful than CPUs with many cores at lower frequency.
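If you run into that, you can switch a VM back to a more generic CPU model, e.g. (the VM ID is just an example):

Code:
qm set 100 --cpu x86-64-v2-AES   # generic default on current PVE versions
qm set 100 --cpu kvm64           # older but very compatible generic model
qm set 100 --cpu host            # fastest, but migration only between identical CPUs
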
A redundant NAS still works, but you may have fewer PVE features available in the GUI than with Ceph. If you already have an old NAS, don't decommission it; use both. If you plan to commission a new flash NAS, do a PoC with the vendor, and if you aren't satisfied with the usable features you can give it back. The same goes for Ceph: you could easily be unsatisfied with the performance, so a PoC is useful there too.
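Adding an NFS export from the NAS as shared VM storage is then a one-liner, e.g. (server, export and storage name are placeholders):

Code:
pvesm add nfs nas-vmstore --server 192.0.2.10 --export /export/vmstore --content images,rootdir
pvesm status   # check that the storage shows up as active and shared
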
Mirrored boot drives should always be used on a production system.
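If you install PVE with ZFS RAID1 on the boot drives, you can check the mirror health later with:

Code:
zpool status rpool   # rpool is the default pool name of a PVE ZFS installation
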
You may encounter corosync setup problems without a switch for now; it would be easier to use even a 1Gb switch, just so it works.
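If you keep the direct links anyway, you can give corosync a second link over them when creating/joining the cluster, roughly (all IPs are placeholders for your management and mesh networks):

Code:
pvecm create pocluster --link0 192.168.1.11 --link1 10.15.15.1   # on the first node
pvecm add 192.168.1.11 --link0 192.168.1.12 --link1 10.15.15.2   # on each joining node
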
Anyway good luck and have fun with pve :)
PS: Don't forget your backup concept ...
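Even a simple scheduled vzdump to a NAS (or later a Proxmox Backup Server) covers the basics; manually that is just something like (VM ID and storage name are examples):

Code:
vzdump 100 --storage nas-backup --mode snapshot --compress zstd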
 