[SOLVED] Proxmox on SATA & Ceph on NVMe - does it make sense?

vertisan

New Member
May 14, 2024
Hi there!
I currently have the following hardware that makes up my Proxmox cluster:
3x Lenovo m720q
- i5-8500T
- 32GB
- SSD M.2: 1TB NVMe (on which the entire Proxmox is installed, along with storage for VMs)
- SSD SATA 2.5": <empty>

Now I would like to go a step further and try to introduce Ceph. The aforementioned Lenovos do have a PCI-Express slot, but I would like to keep it free for a future 2.5GbE NIC, so I was thinking about the following setup:
- the current NVMe drive would be dedicated to Ceph, which would hold all the VMs
- Proxmox itself would be installed on a 2.5" SATA SSD

The question now is: does this make sense? Won't the SATA SSD limit the performance of the entire cluster? In general, the Proxmox cluster is the part of my homelab that does not see much traffic or usage: a K3s cluster, Home Assistant, Jellyfin, Vault, Consul and a few smaller things.
 
I run a 6-node hyper-converged PM+Ceph cluster on similar 1L mini-PCs, so let me drop a couple of specific nuggets for you to consider; feel free to ask any additional questions.

Keep in mind that Ceph is distributed storage, so your networking between nodes is just as important as your local storage. You might want to consider using a pair of 2.5G Ethernet USB adapters on each node (the blue USB 3.0 ports should support full 2.5G throughput...). You then have the option of bonding them using LACP or Open vSwitch to get more throughput and some high availability.

You might also consider leaving PM on your NVMe drive and just dedicating it to running the node. I would use the SATA SSD for Ceph and your VMs, since Ceph writes a -LOT- to the drives you allocate to it and NVMe drives tend to wear out much faster than SATA SSDs. Putting your VMs on Ceph storage also gives you the ability to live-migrate them across nodes, which you may find a nice capability to have. The SATA SSD will be slower than NVMe, but the speed of SATA pairs well with 2.5G Ethernet. Good luck!
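If you go the bonded-USB-NIC route, the network config on each node could look roughly like this (just a sketch: the enx... interface names and the address are placeholders for your actual USB adapters, and 802.3ad/LACP needs a switch that supports it; without one, bond-mode balance-alb is an alternative):

# /etc/network/interfaces (excerpt) - bond two 2.5G USB NICs for the Ceph network
auto bond0
iface bond0 inet static
    address 10.10.10.11/24
    bond-slaves enx00e04c680001 enx00e04c680002
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer2+3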
 
Great, thanks for your opinion!
Just one question: if I use SATA SSDs for Ceph, do you happen to know what the performance looks like? Will VMs with their disks on SATA-SSD-backed Ceph run noticeably slower than those on NVMe?
 
A standard 1Gb/s network for now, for everything.
Then (see the article above) you shouldn't use Ceph at all because the performance will be quite ugly. Instead I would go with storage replication ( https://pve.proxmox.com/wiki/Storage_Replication ) or use a NAS as shared storage (still not great performance but better than with Ceph).
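Setting up a replication job is a one-liner per guest, something like this (VMID 100 and target node "pve2" are placeholders; note that storage replication requires local ZFS storage on both nodes):

# replicate VM 100 to node pve2 every 15 minutes (the default schedule)
pvesr create-local-job 100-0 pve2 --schedule "*/15"
pvesr status    # check the replication state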

If you then want to play around with Ceph, you could install a virtual Proxmox cluster with Ceph enabled inside VMs on one of the Proxmox nodes in your cluster, just for learning. For LXCs the performance impact shouldn't be much of a problem, and for VMs (since it's just a learning environment for understanding Ceph) it should still be tolerable: https://pve.proxmox.com/wiki/Nested_Virtualization
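If you want to try that nested route, you first have to enable nested virtualization for KVM on the physical host and give the nested PVE guests the "host" CPU type; roughly like this on an Intel box (no VMs may be running while the module is reloaded, and <vmid> is a placeholder):

cat /sys/module/kvm_intel/parameters/nested            # should print Y or 1 when enabled
echo "options kvm-intel nested=Y" > /etc/modprobe.d/kvm-intel.conf
modprobe -r kvm_intel && modprobe kvm_intel            # reload the module with nesting on
qm set <vmid> --cpu host                               # expose virtualization to the nested PVE VM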
 
Try to manage your expectations of what kind of performance you can get from Ceph running on the hardware you describe. I run my VMs off enterprise-grade SATA SSDs (though the Ceph WAL is stored on an NVMe device...) and I find the performance acceptable for my needs as a hobbyist. As @gurubert indicated, your 1Gb/s interface on your nodes will/should be used for management purposes only (PM GUI and pulling updates for your nodes+VMs...). Think about how to augment your networking to better accommodate both Proxmox's and Ceph's minimum recommended requirements, since that will likely be your bottleneck given your hardware. If you look around the PM forums at people running Ceph in "production", you might get discouraged by all the mentions of multiple 10/25/100G Ethernet interfaces and all-NVMe enterprise storage. You certainly don't need that to run Ceph, but you should consider the hardware you can provide and adjust your expectations accordingly.
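If you want rough numbers for your own setup instead of forum anecdotes, you can benchmark the Ceph cluster directly from any node with rados bench on a throw-away pool, for example (pool name is arbitrary, and deleting pools may require mon_allow_pool_delete to be set):

ceph osd pool create benchtest 32
rados bench -p benchtest 60 write --no-cleanup    # 60 second write test
rados bench -p benchtest 60 seq                   # 60 second sequential read test
rados -p benchtest cleanup
ceph osd pool delete benchtest benchtest --yes-i-really-really-mean-it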
 
Great, thank you guys for your opinions!
To be clear, here is why I wanted to use Ceph: even though this is a simple homelab, I run a few services that I use for work at my company. These services don't have any HA solution of their own, so I wanted to enable HA for them on the Proxmox side, at the VM level. I also have a single NAS server that I could move the storage to, but that is still a single piece of hardware, which is why I wanted to build on my existing Proxmox quorum using Ceph.
Maybe you can propose another solution for this case, if Ceph is not a good fit for my current hardware?
 
Do I understand correctly that the services you want to host don't support any HA architecture (e.g. active/passive failover, clustering, etc.), and so you want to set up a platform where these services can run continuously by migrating their VMs between PM nodes either manually (i.e. for a PM node reboot) or automatically (i.e. on a PM node failure)? Does that about capture it? BTW, you mentioned this is a homelab setup, but could you please clarify: are these services running "on-site" or "off-site" for your company?
 

I sincerely hope that nothing mission-critical is running in your cluster (e.g. something whose loss or non-availability would bankrupt your company), especially if you need good performance for it. To be frank: in any case I wouldn't use Ceph with such a limited network.

To quote from https://pve.proxmox.com/wiki/Deploy...r#_recommendations_for_a_healthy_ceph_cluster:
If unsure, we recommend using three (physical) separate networks for high-performance setups:
  • one very high bandwidth (25+ Gbps) network for Ceph (internal) cluster traffic.
  • one high bandwidth (10+ Gbps) network for Ceph (public) traffic between the Ceph server and Ceph client storage traffic. Depending on your needs this can also be used to host the virtual guest traffic and the VM live-migration traffic.
  • one medium bandwidth (1 Gbps) exclusive for the latency sensitive corosync cluster communication.

So your bandwidth is definitely way under the recommended minimum.
Storage replication is a different beast: you get a kind of "pseudo-HA". Your VMs/LXCs are replicated to the other nodes in the cluster and will be restarted on them if their host node fails. This comes with a cost though: since the replication is asynchronous, you will lose any data added since the last successful replication. The default schedule runs the replication every 15 minutes, but it can be reduced to as little as one minute. For many (but not all) applications this is more than enough, so if your applications can live with a minimal data loss, this would be my preferred approach.
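For reference, tightening the schedule and handing the failover to the HA stack could look roughly like this (VMID 100 is a placeholder and a replication job for that guest is assumed to already exist):

pvesr update 100-0 --schedule "*/1"        # replicate every minute instead of every 15
ha-manager add vm:100 --state started      # restart the VM on another node if its host fails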

But even for that the usual recommendation for the cluster network (having a dedicated network just for corosync communication) holds true: https://pve.proxmox.com/wiki/Cluster_Manager#pvecm_cluster_network

You might also want another dedicated network link for the actual replication/migration traffic (which can also serve as a redundant link for corosync in case the main cluster network fails) to get sufficient performance. Together this might be enough for your use case.
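As a sketch, such a dedicated migration/replication network can be pinned in /etc/pve/datacenter.cfg (the subnet is a placeholder for whatever link you dedicate to it):

# /etc/pve/datacenter.cfg
migration: secure,network=10.20.20.0/24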
 