Hi,
I am running a 2-node Proxmox cluster (2 x Dell R640, PVE 8.4), with VMs stored on separate shared NFS storage. Due to Microsoft licensing for its Server products, I am limited to 48 cores for the entire cluster.
My goal is to use Ceph (NVMe only) instead of NFS - I have used it in a lab (both Proxmox and cephadm) and I really like the idea of it. I understand I could turn this into a 3-node x 16-core system, but Ceph only really starts to shine at 5+ nodes, and with my constraints I cannot reasonably get to 5 nodes without giving MS more of my money.
I am left with three options, and I would like to know what the community thinks of these.
Note: the existing nodes are overprovisioned in terms of cores and memory at the moment.
Option 1: Use my current 2-node Proxmox cluster for Ceph and add (yet-to-be-understood/yet-to-be-invented) Ceph-only nodes. This raises the question: can I somehow add nodes that aren't full PVE hypervisors, but only contribute Ceph storage to an existing Proxmox cluster? So they don't have "cores that could be used for VMs" in the eyes of a Microsoft audit? (A rough sketch of what this could look like follows below.)
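As far as I know, PVE has no official storage-only node type - every cluster member is a full PVE install - but nothing technically stops you from joining a node to the cluster and running only Ceph services on it, never scheduling VMs there. A minimal sketch, assuming Ceph is already initialized on the cluster and using placeholder IPs/devices (whether an auditor accepts "no VMs ever run here" is a licensing question I can't answer):

```
# On the new node, starting from a stock PVE install (no VMs ever created on it):
pvecm add 192.168.1.11                 # IP of an existing cluster member (placeholder)

# Pull in the Ceph packages (repository choice is an assumption):
pveceph install --repository no-subscription

# Give the node Ceph duties only: a monitor plus one OSD per NVMe device
pveceph mon create
pveceph osd create /dev/nvme0n1
pveceph osd create /dev/nvme1n1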
Option 2: Just start an independent Ceph cluster (cephadm, etc.) on 5 new nodes, not using Proxmox's Ceph features. This is definitely the easier solution to understand (see the bootstrap sketch after the list), BUT:
- it seems a waste not to use those 2 x 8 U.2 NVMe slots in my current Proxmox nodes
- my current UPS is reaching its limit, and I am not sure it can take 5 more nodes, even modest ones. Using the existing nodes would likely be more power efficient: retiring the NFS storage will free up about 2-3 nodes' worth of power, but that does not compensate for adding 5 new nodes.
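For reference, the cephadm route really is short to stand up. Roughly, with placeholder hostnames, IPs, and pool/storage names:

```
# On the first new node:
cephadm bootstrap --mon-ip 10.0.0.11

# Distribute the cluster's SSH key, then register the other hosts:
ssh-copy-id -f -i /etc/ceph/ceph.pub root@ceph-node2
ceph orch host add ceph-node2 10.0.0.12
# ...repeat for nodes 3-5...

# Turn every empty NVMe device into an OSD:
ceph orch apply osd --all-available-devices

# Create a pool for VM disks:
ceph osd pool create vm-pool
rbd pool init vm-pool

# Back on Proxmox, consume it as external RBD storage
# (client keyring goes in /etc/pve/priv/ceph/ceph-ext.keyring):
pvesm add rbd ceph-ext --pool vm-pool --monhost "10.0.0.11 10.0.0.12 10.0.0.13" --content images
```

The upside of this split is that Proxmox just sees a regular external RBD storage, so none of the chicken-and-egg questions from Option 3 below apply.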
Option 3: Add 3 nodes of the smallest NVMe/ECC machines I can find (the Lenovo P320 seems to fit the bill) and use NVMe passthrough to a single VM per Proxmox node (to get to 5 cephadm nodes), not using Proxmox's Ceph features. I know HA won't work with PCIe passthrough, but that's fine, as the HA part of the storage would be handled by Ceph. My open questions about this option:
- Is PCIe passthrough reliable for passing U.2 drives to VMs? Any gotchas?
- Can a Proxmox cluster use a Ceph storage that is partly provisioned by its own VMs? Does it make sense from an HA perspective? When such a VM starts, it would try to use its own storage for the OS... seems weird.
- ...I guess I could easily put the Ceph VM's OS disk on the node's local ZFS, so a virtualized Ceph node would not depend on its own Ceph storage to boot.
- I do feel this solution is full of pitfalls, but then again it seems the best way to reuse the existing nodes' wattage and free U.2 slots. A rough sketch of the setup follows below.
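To make the last two bullets concrete, here is roughly what one virtual cephadm node per R640 could look like - OS disk allocated on the node's local ZFS, U.2 drives handed over as whole PCIe devices. The VMID, sizes, and PCI addresses are made up, and IOMMU has to be enabled (BIOS plus kernel command line) before any of this works:

```
# Kernel side (Intel): add to GRUB_CMDLINE_LINUX_DEFAULT, then update-grub + reboot
#   intel_iommu=on iommu=pt

# Find the NVMe controllers' PCI addresses:
lspci -nn | grep -i nvme

# VM 201: resources and names are placeholders
qm create 201 --name ceph-vm1 --memory 16384 --cores 4 --net0 virtio,bridge=vmbr0

# 32G OS disk on local ZFS, two U.2 drives passed through whole:
qm set 201 --scsi0 local-zfs:32
qm set 201 --hostpci0 0000:5e:00.0
qm set 201 --hostpci1 0000:5f:00.0
```

Inside the guest the passed-through drives should show up as plain /dev/nvme* devices, so cephadm would treat the VM like bare metal; the trade-off, as noted above, is that each such VM is pinned to its host.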