Ceph DB/WAL on SSD

naltalef

New Member
Oct 6, 2025
Hello.
I'm planning to install PVE on servers that each have one SSD and six HDDs. The idea is to create a Ceph OSD on each HDD and dedicate the SSD to the DB/WAL.
When doing this configuration from the GUI, it doesn't seem to allow putting both the DB and the WAL on the SSD, only one of them.
However, I understand that it is possible using the CLI (I haven't tested it). Is this correct?
If it isn't actually possible to put both the DB and the WAL on the SSD, which of the two would be preferable?
Thank you very much.
Best regards.
Norberto
 
see https://pve.proxmox.com/wiki/Deploy_Hyper-Converged_Ceph_Cluster#pve_ceph_osds
"The WAL is placed with the DB, if not specified separately"

IIRC the default space usage for the DB is 10% (?) of the OSD size, but you can adjust it.
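For what it's worth, a minimal sketch of how that looks on the CLI, assuming the shared SSD shows up as /dev/sdg and the HDDs as /dev/sdb onwards (device names and the size are placeholders, not recommendations):

```bash
# One OSD per HDD, with its DB on the shared SSD; the WAL is placed inside the
# DB partition automatically when no separate --wal_dev is given.
pveceph osd create /dev/sdb --db_dev /dev/sdg --db_dev_size 100

# Only if you really want the WAL split out explicitly (same SSD here):
pveceph osd create /dev/sdc --db_dev /dev/sdg --wal_dev /dev/sdg
```

With six OSDs sharing a 750 GB SSD, that works out to roughly 100 GiB of DB space per OSD.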
Perfect. Understood
Side note: the SSD is a single point of failure for all 6 HDDs.
Wow. Sounds logical, but worrying.
I don't think it's a good idea to use it then. I'll leave it as local storage.

Many thanks for your help
 
In the earlier days, the usual recommendation was one SSD or NVMe per four HDDs for the WAL.
Thanks for your response. Okay. In this case, it would be an SSD for six HDDs.
The problem is that I can't currently modify the existing hardware in each server.
That is, a 750 GB SSD and six 2.2 TB drives.
And I'm concerned about SPOF using one SSD with multiple drives.
I understand that the SSD isn't likely to fail, but if it does, I'll lose the entire Ceph node.
 
Could you please clarify what "not useful" means in this context?
My client is planning to replace a 4-node Dell VxRail cluster with 6 HDDs each. I currently have no way to change the hardware configuration.
The problem is that with the default 3x pool replication, I only get 33% of the raw space, which is less than the 50% they currently have with vSAN.
Hence my idea to use erasure coding. I had thought of k=3 m=1 and failure domain = host.
Thanks in advance
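For reference, here is the rough capacity arithmetic behind those percentages, plus what the EC profile I had in mind would look like (the profile name is just an example; the numbers ignore Ceph's own overhead and the free space needed for recovery):

```bash
# Raw capacity: 4 nodes x 6 x 2.2 TB = ~52.8 TB
#   replicated size=3      -> raw * 1/3  = ~17.6 TB usable (~33%)
#   erasure coded k=3,m=1  -> raw * 3/4  = ~39.6 TB usable (~75%)
#   erasure coded k=2,m=2  -> raw * 2/4  = ~26.4 TB usable (~50%)
#
# Profile I was considering (note: with failure-domain=host, k+m=4 uses all
# four hosts, so there is no spare host to recover onto if one fails):
ceph osd erasure-code-profile set ec-3-1 k=3 m=1 crush-failure-domain=host
ceph osd erasure-code-profile get ec-3-1
```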
 
Another issue might be that EC (like ZFS RAIDZ compared to mirrors) might hurt VM performance compared to the default replicated setup, or am I missing something? I'm aware that in larger clusters (8 nodes and more) the scale-out nature of Ceph mitigates this.
 
With m=1 you have the same redundancy as with size=2 and min_size=1; in other words, you have a RAID5.

You will lose data in this setup.
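A quick way to see that trade-off on an actual EC pool, if I understand the defaults correctly (the pool name is hypothetical):

```bash
# For an EC pool, "size" is k+m and min_size defaults to k+1, so with k=3,m=1
# min_size equals size: one OSD/host down already blocks I/O on that pool, and
# dropping min_size to k gives the same data-loss exposure as size=2/min_size=1.
ceph osd pool get vm-ec-data size
ceph osd pool get vm-ec-data min_size
```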
OK. Understood
You could run with k=2 and m=2, but you will still have to cope with the EC overhead (more CPU and more network communication).
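If it helps, this is roughly how a k=2/m=2 pool would be created with the PVE tooling; I haven't double-checked the option spelling on every version, so treat it as a sketch and see `pveceph pool create --help` (the pool name is a placeholder):

```bash
# Creates an EC data pool (k=2, m=2, failure domain host) plus the replicated
# metadata pool that RBD needs, and adds it as a PVE storage entry.
pveceph pool create vm-ec --erasure-coding k=2,m=2,failure-domain=host --add_storages
```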
OK. Thanks for your suggestion.
The situation is far from ideal, but space is essential. The client is trying to move away from VMware because they don't want to face the licensing costs. Except for a couple of database servers, the rest of the VMs aren't particularly demanding.
I can manage to put the most critical VMs in a replicated pool.
Thank you very much for your suggestions, and have a good weekend.