Ceph block.db and block.wal

JackScreecher

Apr 8, 2022
Hello, I'm looking over the Proxmox documentation for building a Ceph cluster here...

https://pve.proxmox.com/wiki/Deploy_Hyper-Converged_Ceph_Cluster

There is a small section entitled block.db and block.wal which says...

If you want to use a separate DB/WAL device for your OSDs, you can specify it through the -db_dev and -wal_dev options. The WAL is placed with the DB, if not specified separately.
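For anyone reading along, here is a minimal sketch of how the options quoted above fit together on the `pveceph osd create` command line. It only builds and prints the command, it does not run it. The helper name `build_osd_create_cmd` and the device paths `/dev/sdX` and `/dev/sdY` are placeholders I made up for illustration, not anything from the wiki.

```python
import shlex


def build_osd_create_cmd(data_dev, db_dev=None, wal_dev=None):
    """Return the argv list for `pveceph osd create` with optional DB/WAL devices."""
    cmd = ["pveceph", "osd", "create", data_dev]
    if db_dev:
        # Separate DB device; the WAL is placed with the DB unless given separately.
        cmd += ["-db_dev", db_dev]
    if wal_dev:
        cmd += ["-wal_dev", wal_dev]
    return cmd


if __name__ == "__main__":
    # Hypothetical device paths, for illustration only.
    print(shlex.join(build_osd_create_cmd("/dev/sdX", db_dev="/dev/sdY")))
```

Leaving out both optional arguments gives the plain single-device OSD, which is the setup the question is comparing against.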

I was wondering if anyone knows how much of a performance advantage is gained by doing this? The NVMe drives I'm using are Seagate FireCuda 530s, which are PCIe 4.0 drives rated at 7000+ MB/s. Do you think it's still a good idea to have an additional drive for the journal? My Ceph cluster will be dedicated, so it won't be used as local storage. The hypervisors will be separate.
 
Unfortunately, the FireCuda is a consumer product, not an enterprise one, so I guess you won't get much performance out of it.
 
I appreciate what you say, but I think it's the perfect drive for this type of project. This drive uses the high-end Phison PS5018-E18 controller. It's TLC, not QLC, and it has a 5100 TBW endurance rating. It's a very high performance drive. Yes, it may be aimed at the desktop market, but it's still a good choice for a system like this due to the redundancy of Ceph. I'd say these are as good as, if not better than, many enterprise drives on the market right now. As it's PCIe 4, the drive has twice the interface bandwidth of any PCIe 3 drive. It has LDPC, ECC and End-to-End Data Path Protection. Ticks all the boxes for me.

Anyway, my question relates to how necessary it is to use a separate device for DB/WAL and whether this is something people would always recommend.

Perhaps someone with experience of using Proxmox and Ceph could spare a moment to comment. Many thanks.
 
Yes, using DB/WAL is always recommended; the Proxmox GUI automatically reserves a percentage for it.
 
Sorry, my question was whether to use a separate drive for this. I'm looking to hear other people's experience of running DB/WAL on the OSDs themselves compared to using an additional partitioned drive.

It isn't necessary to do it, but the documentation suggests there is a performance benefit, although it isn't clear how much. As I'm using some of the fastest NVMe drives available, capable of 7000 MB/s, I wonder if I really need to run the journal on an additional drive.

I'm hoping to hear someone else's experience of this.
 
Consumer SSDs don't have power loss protection (PLP). Such SSDs are either slow at sync writes, or their controller treats all writes as async internally (which may cause data corruption).

You may achieve better performance if you place the DB/WAL on a drive with much faster low queue depth sync writes (like an Intel Optane NVMe or an NVDIMM/PMEM drive). But Ceph has many bottlenecks, so there is no guarantee. You can test this scenario with a ramdisk if you have sufficient RAM on your servers.
On the other hand, a separate DB/WAL increases complexity and the probability of an OSD failure (a single OSD then depends on multiple drives).
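To get a rough feel for what "low queue depth sync writes" means in practice, here is a minimal sketch that writes small blocks one at a time and forces each to stable storage before issuing the next. It is only an approximation under stated assumptions: it goes through a filesystem rather than a raw device the way BlueStore does, the function name `sync_write_latencies` and the 4 KiB block size are my own choices, and a dedicated benchmark tool such as fio is the better instrument for real numbers.

```python
import os
import tempfile
import time


def sync_write_latencies(path, block_size=4096, count=200):
    """Write `count` blocks one at a time, fsync after each, return per-write seconds."""
    buf = os.urandom(block_size)
    latencies = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for _ in range(count):
            start = time.perf_counter()
            os.write(fd, buf)
            os.fsync(fd)  # wait until the data is reported as on stable storage
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
    return latencies


if __name__ == "__main__":
    # Hypothetical target: a temp directory under the current directory.
    # Change `dir=` to a mount point on the drive you want to test.
    with tempfile.TemporaryDirectory(dir=".") as tmpdir:
        lat = sync_write_latencies(os.path.join(tmpdir, "synctest.bin"))
    avg = sum(lat) / len(lat)
    print(f"average sync write latency: {avg * 1000:.3f} ms")
    if avg > 0:
        print(f"approximate sync write IOPS at queue depth 1: {1 / avg:.0f}")
```

Running it against the consumer NVMe and against the candidate DB/WAL device gives a first hint of whether the second device is actually faster for this write pattern.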
 
Thanks for your replies.

The Ceph documentation does say: "It is only useful to use a WAL / DB device if the device is faster than the primary device (e.g., when it is on an SSD and the primary device is an HDD)."

I'll do some testing.
 
The KC3000 is almost identical to the 530, so I imagine it will be very similar. The Intel D7-P5510 is a good enterprise drive, but it has lower write performance and most likely wouldn't be fast enough to make a difference. I'll do some testing. Thanks for your efforts ness1602 :)
 
Using a separate DB device saved nearly half the IOPS in my previous tests, but IOPS are usually not the bottleneck for SSDs.
So I don't think SSDs need separate DB/WAL devices, which also increase the possibility of failure.
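The failure point can be put into rough numbers. This is a back-of-the-envelope sketch, assuming drives fail independently and with the same probability over some period; the 2% figure is a hypothetical value picked for illustration, not a measured failure rate for any drive mentioned in this thread.

```python
def osd_failure_probability(p_drive, drives_per_osd):
    """Probability that an OSD loses at least one drive it depends on.

    Assumes independent failures with the same probability per drive.
    """
    return 1 - (1 - p_drive) ** drives_per_osd


if __name__ == "__main__":
    p = 0.02  # hypothetical per-drive failure probability over the period
    single = osd_failure_probability(p, 1)  # data and DB/WAL on one drive
    split = osd_failure_probability(p, 2)   # data drive plus separate DB/WAL drive
    print(f"OSD on one drive:  {single:.4f}")
    print(f"OSD on two drives: {split:.4f}")
```

For small probabilities the risk per OSD roughly doubles with a second drive. If several OSDs share one DB/WAL device, losing that device also takes all of those OSDs down at once.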
 
