Ceph storage Journal question

kumarullal

If we use an SSD for journaling, then we get a warning from ceph-disk that the OSD will not be hot-swappable if the journal is not on the same device as the OSD data.
So in other words, in order to replace a failed disk, one has to shut down the Proxmox/Ceph node to replace the failed disk?
I read somewhere that with the new version of Ceph, Firefly (which is what I plan to install on Proxmox VE), there is no need for a separate journal disk. Is this true? If yes, are there any downsides to keeping the journal on the same OSD device? Will this cause any problems?
 
It's not that you cannot hot-swap, but you have to mark the OSD down and out before you remove it. Down and out prevent queuing any further reads/writes on that OSD; see the sketch below.
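A minimal sketch of the down-and-out dance, assuming the failed disk backs osd.3 (the OSD id is hypothetical; check ceph osd tree for your own layout):

Code:
# stop new I/O from being queued to this OSD and let Ceph rebalance around it
ceph osd out 3
# stop the daemon on the node that holds the disk (sysvinit-era syntax)
service ceph stop osd.3
# the disk can now be pulled and replaced

If you reuse the same OSD id after the swap, ceph osd in 3 marks it back in once the daemon is running again.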

Starting from Firefly, Ceph can run without a journal, so it should not be a problem.
 
Hot-swap in this context means that you can remove a disk from Ceph node A and put it into node B (to balance capacities, for instance) and it will continue running there. While nice, I wouldn't consider it a deal-breaker if you lost this capability.


Journals:

At the moment a Ceph write consists of two operations: a write to the journal and a write to the object store. If both land on the same disk, write rates drop. People sometimes simplify the slowdown to 50%, which is an oversimplification, but the drop is very noticeable (more like 30-50%). This is why separate journal disks exist. If you want more than one journal on a single journal disk, it needs to be an SSD so it does not slow things down. A typical layout is sketched below.
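If you do go the dedicated-journal route, the data and journal devices can be given to ceph-disk explicitly. A minimal sketch, assuming /dev/sdb is the data disk and /dev/sdc the journal SSD (both device names are hypothetical; adapt to your hardware):

Code:
# prepare an OSD with data on /dev/sdb and its journal on /dev/sdc
ceph-disk prepare /dev/sdb /dev/sdc
# activate the newly prepared data partition
ceph-disk activate /dev/sdb1

On Proxmox VE, pveceph createosd wraps the same mechanism; check pveceph help on your version for the exact journal option.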


Additionally, like synncom mentioned, the Ceph developers have started experimenting with different object store technologies that don't need two operations per write. While support for this is already in Firefly, it's still experimental and not yet properly documented, so tread carefully (or better yet: avoid it if you value your data).

From the Firefly release notes:

Key/value OSD backend (experimental): An alternative storage backend for Ceph OSD processes that puts all data in a key/value database like leveldb. This provides better performance for workloads dominated by key/value operations (like radosgw bucket indices).
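For the curious: the experimental backend is selected per OSD in ceph.conf. A hedged sketch from the Firefly era (the keyvaluestore-dev value is my reading of the release notes; verify against your version, and do not use it on data you care about):

Code:
# /etc/ceph/ceph.conf -- experimental, Firefly era only
[osd]
osd objectstore = keyvaluestore-dev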
 
I wanna say no, but that depends on whether the single-device setup is "fast enough" for what you want/expect. With SSDs it is very likely that the network will be the bottleneck anyhow. This is something you should benchmark with your actual setup, in my opinion; rados bench (sketched below) is a simple starting point.
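A minimal benchmarking sketch with rados bench, assuming a throwaway pool named testpool exists (pool name and runtime are hypothetical):

Code:
# 60-second write benchmark; benchmark objects are removed when the run finishes
rados bench -p testpool 60 write
# to benchmark reads afterwards, write with --no-cleanup first, then:
# rados bench -p testpool 60 seq

If the network is the bottleneck, write bandwidth will plateau near your link speed.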
 
No; with SSD OSDs, keep the journal on the same SSD too.
Very true.

Also, if the number of OSDs per node is above eight or so, Ceph performs better with journal+OSD on the same HDD.

I am still using Ceph Emperor and don't see myself converting all our Ceph clusters to Firefly this year. I'm letting the other bravehearts out there try it first so I can read their reports. :)
 
