I'm in the process of spinning up a cluster to prove to a program office that moving from a nearly bare-metal deployment to a mostly virtualized one is the way forward. Internally, our PM is on board; I just need to prove it out before full funding becomes available.
Currently I have 3 nodes pulled from basically the trash; two have the same CPU/RAM/SSD/GPU config, and the third is very different but more powerful. I will be using NVIDIA vGPU, since many of the physical systems require CUDA. Each node does have 6x 10 Gb NICs, but they currently run at 1 Gb standalone; some are in an LACP config for the connection to the switch stack.
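For context, this is roughly the shape of the LACP setup I'm describing on the Proxmox side; the interface names, bridge name, and addresses are just placeholders, not my actual config:

    # /etc/network/interfaces sketch: two 10Gb ports bonded with LACP (802.3ad) up to the switch stack
    auto bond0
    iface bond0 inet manual
        bond-slaves eno1 eno2          # placeholder NIC names
        bond-mode 802.3ad
        bond-xmit-hash-policy layer3+4
        bond-miimon 100

    auto vmbr0
    iface vmbr0 inet static
        address 192.168.10.11/24       # placeholder management address
        gateway 192.168.10.1
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0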
My understanding is that if I wanted to use Ceph to prove failover is possible, the nodes need to be basically the same: same CPUs, RAM layout, GPUs, and storage, right? Is there any way to get around the need for identical systems with Ceph installed, just to prove the point that failover is possible?
Would Ceph even be needed if there's interest in using redundant NASes for the VM storage? We're moving everything to fiber, the PM is on board with dual-port 100 Gb fiber cards for each system, and I'm recommending each NAS be flash-based. Is Ceph pointless if the fiber/flash NAS config is used, and will VM failover still be possible? Or is the NAS method a bad idea? If the NAS method is fine, I'd guess I'd only need mirrored boot drives on each node?
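If the NAS route is viable, my mental model is that every node just mounts the same share from the NAS and HA handles restarting VMs on another node; something like this is what I'd expect to run, with the storage ID, server IP, and export path all being placeholders:

    # Sketch: register a shared NFS export from the NAS as VM storage across the cluster
    pvesm add nfs nas-vmstore --server 10.0.100.50 --export /mnt/flash/vmstore --content images,rootdir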
As for clustering, are there any pitfalls to using nodes with different hardware?
And for the networking of the cluster: at the moment, since the network backbone in my lab is only 1 GbE, on each node I have two of the 10 Gb NICs directly connecting the nodes to each other: n1eth0 -> n2eth1, n2eth0 -> n3eth1, n3eth0 -> n1eth1. Nothing is configured yet, just PVE installed on each node. Is this fine? Ideally I'd have a switch between them, but since we're jumping to fiber, buying a bunch of 10 Gb copper SFP modules would be a waste, so direct connections are the only way for now.
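To make that concrete, this is roughly what I was picturing for node 1, with IPs and subnets as placeholders; the other two nodes would get mirror-image configs on their pair of links:

    # /etc/network/interfaces sketch for node 1 in the direct-connect mesh
    # eth0: point-to-point link to node 2, eth1: point-to-point link from node 3
    auto eth0
    iface eth0 inet static
        address 10.10.12.1/24    # n1 <-> n2 link subnet (placeholder)

    auto eth1
    iface eth1 inet static
        address 10.10.31.1/24    # n3 <-> n1 link subnet (placeholder)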
Sorry if this is all over the place; I spewed this out in the order my brain let it out.