Ceph DB/WAL on SSD

naltalef

New Member
Oct 6, 2025
Hello.
I'm planning to install PVE on servers that each have one SSD and six HDDs. The idea is to create a Ceph OSD on each HDD and dedicate the SSD to the DB/WAL.
When doing this configuration from the GUI, it doesn't seem to allow putting both the DB and the WAL on the SSD, only one of them.
However, I understand that it is possible using the CLI (I haven't tested it). Is this correct?
If it isn't actually possible to put both the DB and the WAL on the SSD, which of the two would be preferable?
Thank you very much.
Best regards.
Norberto
 
see https://pve.proxmox.com/wiki/Deploy_Hyper-Converged_Ceph_Cluster#pve_ceph_osds
"The WAL is placed with the DB, if not specified separately"

IIRC the default space usage for the DB is 10% (?) of the OSD size, but you can adjust it.
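For what it's worth, a minimal sketch of how that looks on the CLI, assuming the shared SSD shows up as /dev/sdg and the HDDs as /dev/sdb onwards (device names and the size are placeholders, not recommendations):

```bash
# One OSD per HDD, with its DB on the shared SSD; the WAL is placed inside the
# DB partition automatically when no separate --wal_dev is given.
pveceph osd create /dev/sdb --db_dev /dev/sdg --db_dev_size 100

# Only if you really want the WAL split out explicitly (same SSD here):
pveceph osd create /dev/sdc --db_dev /dev/sdg --wal_dev /dev/sdg
```

With six OSDs sharing a 750 GB SSD, that works out to roughly 100 GiB of DB space per OSD.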
Perfect. Understood
Side note: the SSD is a single point of failure for all 6 HDDs.
Wow. Sounds logical, but worrying.
I don't think it's a good idea to use it then. I'll leave it as local storage.

Many thanks for your help
 
In the earlier days, the usual recommendation was one SSD or NVMe per four HDDs for the WAL.
Thanks for your response. Okay. In this case, it would be an SSD for six HDDs.
The problem is that I can't currently modify the existing hardware in each server.
That is, a 750 GB SSD and six 2.2 TB drives.
And I'm concerned about SPOF using one SSD with multiple drives.
I understand that the SSD isn't likely to fail, but if it does, I'll lose the entire Ceph node.
 
Could you please clarify what "not useful" means in this context?
My client is planning to replace a 4-node Dell VxRail cluster with 6 HDDs each. I currently have no way to change the hardware configuration.
The problem is that with the default 3x pool replication, I only get 33% of the raw space, which is less than the 50% they currently have with vSAN.
Hence my idea to use erasure coding. I had thought of k=3 m=1 and failure domain = host.
Thanks in advance
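For reference, here is the rough capacity arithmetic behind those percentages, plus what the EC profile I had in mind would look like (the profile name is just an example; the numbers ignore Ceph's own overhead and the free space needed for recovery):

```bash
# Raw capacity: 4 nodes x 6 x 2.2 TB = ~52.8 TB
#   replicated size=3      -> raw * 1/3  = ~17.6 TB usable (~33%)
#   erasure coded k=3,m=1  -> raw * 3/4  = ~39.6 TB usable (~75%)
#   erasure coded k=2,m=2  -> raw * 2/4  = ~26.4 TB usable (~50%)
#
# Profile I was considering (note: with failure-domain=host, k+m=4 uses all
# four hosts, so there is no spare host to recover onto if one fails):
ceph osd erasure-code-profile set ec-3-1 k=3 m=1 crush-failure-domain=host
ceph osd erasure-code-profile get ec-3-1
```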
 
Another issue might be that EC (like ZFS RAIDZ compared to mirrors) might hurt VM performance compared to the default replicated setup, or am I missing something? I'm aware that in larger clusters (8 nodes and more) the scale-out nature of Ceph mitigates this.
 
With m=1 you have the same redundancy as with size=2 and min_size=1; in other words, you have a RAID5.

You will lose data in this setup.
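A quick way to see that trade-off on an actual EC pool, if I understand the defaults correctly (the pool name is hypothetical):

```bash
# For an EC pool, "size" is k+m and min_size defaults to k+1, so with k=3,m=1
# min_size equals size: one OSD/host down already blocks I/O on that pool, and
# dropping min_size to k gives the same data-loss exposure as size=2/min_size=1.
ceph osd pool get vm-ec-data size
ceph osd pool get vm-ec-data min_size
```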
OK. Understood
You could run with k=2 and m=2, but you will still have to cope with the EC overhead (more CPU and more network communication).
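If it helps, this is roughly how a k=2/m=2 pool would be created with the PVE tooling; I haven't double-checked the option spelling on every version, so treat it as a sketch and see `pveceph pool create --help` (the pool name is a placeholder):

```bash
# Creates an EC data pool (k=2, m=2, failure domain host) plus the replicated
# metadata pool that RBD needs, and adds it as a PVE storage entry.
pveceph pool create vm-ec --erasure-coding k=2,m=2,failure-domain=host --add_storages
```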
OK. Thanks for your suggestion.
The situation is far from ideal, but space is essential. The client is trying to move away from VMware because they don't want to face the licensing costs. Except for a couple of database servers, the rest of the VMs aren't particularly demanding.
I can manage to put the most critical VMs in a replicated pool.
Thank you very much for your suggestions, and have a good weekend.