Hi all,
I’m running a 3-node Proxmox homelab cluster with Ceph for VM storage. Each node has two 800GB Intel enterprise SSDs for OSD data and a single 512GB consumer NVMe drive that holds the DB/WAL for both OSDs on that node. I'm benchmarking the cluster and seeing low IOPS and high latency, especially under 4K random workloads. I suspect the consumer NVMe is the bottleneck and would like to replace it with an enterprise NVMe (likely one with higher sustained write performance and a higher DWPD rating).
Before I go ahead, I want to:
- Get community input on whether this could significantly improve performance.
- Confirm the best way to replace the DB/WAL NVMe without breaking the cluster. My current plan:
  - One node at a time: stop the OSDs that use the DB/WAL device, zap them, shut the node down, replace the NVMe, and recreate the OSDs with the new DB/WAL target.
  - Monitor the rebalance and let it finish before moving on to the next node.
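
To make the plan concrete, here is a rough dry-run sketch in Python of the per-node sequence I have in mind. It only builds and prints the commands and runs nothing. The OSD IDs, device paths and both function names are placeholders. Marking the OSDs out before stopping them, and using `pveceph osd destroy --cleanup` as the "zap" step, are my assumptions about how this maps onto the Proxmox tooling, so please correct me if the order is wrong.

```python
from typing import Dict, List


def replacement_plan(osds: Dict[int, str], new_db_dev: str) -> List[str]:
    """Build the command list for swapping the DB/WAL device on ONE node.

    osds maps OSD id -> data device (placeholder values in the demo below).
    Nothing is executed here; the commands are only returned for review.
    """
    plan = ["ceph health  # proceed only if this reports HEALTH_OK"]
    for osd_id in sorted(osds):
        plan += [
            f"ceph osd out {osd_id}",
            f"systemctl stop ceph-osd@{osd_id}.service",
            # Assumption: --cleanup stands in for the manual "zap" step.
            f"pveceph osd destroy {osd_id} --cleanup",
        ]
    plan.append("# power off the node, swap the NVMe, boot again")
    for osd_id in sorted(osds):
        # Recreate each OSD on its old data disk with the new DB/WAL device.
        plan.append(f"pveceph osd create {osds[osd_id]} --db_dev {new_db_dev}")
    plan.append("ceph -s  # let the rebalance finish before the next node")
    return plan


def is_healthy(ceph_health_output: str) -> bool:
    """True if the text printed by `ceph health` starts with HEALTH_OK."""
    return ceph_health_output.strip().startswith("HEALTH_OK")


if __name__ == "__main__":
    # Placeholder OSD ids and device paths for a single node.
    for line in replacement_plan({0: "/dev/sda", 1: "/dev/sdb"}, "/dev/nvme0n1"):
        print(line)
```

I would review the printed list for one node, run it by hand, and only move to the next node once `ceph health` is back to HEALTH_OK.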
Thanks in advance!