Ceph performance - RAID configuration

Leo David

Well-Known Member
Hi guys,
Firstly, I would like to thank all the people working on a product that you can just deploy and almost forget about, without needing to continuously fix and repair it, while focusing on the upper services (VMs) that the platform helps you roll out.
Secondly, I am sorry if this subject has been discussed before; I might be too lazy to search for the answer.
My question is related to the deployment I am currently involved in, which starts with a set of 4 x Dell PE R640 nodes. They are equipped with a PERC H730P Mini controller with 2 GB of cache, while the data disks are all 1.9 TB 12 Gb/s SAS SSDs (4 per node). It seems that with this card I can configure a mixed mode of RAID1 for the OS disks and passthrough for the rest of the data disks, if needed.
The plan is to use the platform as a very nice hyperconverged setup, with Ceph as the underlying storage. I have some past Ceph experience starting with Jewel, and I have been through a lot of issues regarding disk and controller types when it comes to performance.
Now, given that I am starting with only 4 nodes and all disks are enterprise-grade SSDs:

1. Would it be better to configure each of the OSD disks as:
- RAID0 with cache enabled
- passthrough with cache enabled
- passthrough without cache

2. I have also quoted 2 x write-intensive 400 GB disks per node to be used as journal disks:
- should I take them out and not consider separate journal disks, given that the OSDs are all SAS SSDs?
- if separate journal disks are still recommended, should I put them in a RAID1 array for fault tolerance?

3. Any other recommendations regarding the controller / disks / RAID / passthrough setup?
I first considered the PERC H330, but gave up on it because of its lack of cache and other users' poor performance reports.

Thank you so much, and have a very nice weekend!

Cheers,

Leo
 
1. Would it be better to configure each of the OSD disks as:
- RAID0 with cache enabled
- passthrough with cache enabled
- passthrough without cache
Ceph (or ZFS for that matter) and RAID is a no go.
https://pve.proxmox.com/pve-docs/chapter-pveceph.html#_precondition
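For reference, a minimal sketch of what that looks like on the CLI, assuming the H730P presents the four data SSDs as plain /dev/sdb .. /dev/sde in passthrough/non-RAID mode (device names are placeholders; recent Proxmox VE uses pveceph osd create, older releases had pveceph createosd):

# check that the disks appear as plain block devices, not RAID virtual drives
lsblk -o NAME,MODEL,SIZE,ROTA

# one OSD per raw SSD, repeated on every node
pveceph osd create /dev/sdb
pveceph osd create /dev/sdc
pveceph osd create /dev/sdd
pveceph osd create /dev/sde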

2. I have also quoted 2 x write-intensive 400 GB disks per node to be used as journal disks:
- should I take them out and not consider separate journal disks, given that the OSDs are all SAS SSDs?
- if separate journal disks are still recommended, should I put them in a RAID1 array for fault tolerance?
With Bluestore there is no journal anymore; Bluestore's DB can be located on a separate device. Why RAID something that is already fault-tolerant at that level?
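If the 400 GB write-intensive drives stay in the design, a hedged sketch of placing the Bluestore DB/WAL on one of them (assuming it shows up as /dev/sdf; device names are placeholders, and the exact option names should be checked against man pveceph for your release):

# data on the big SAS SSDs, RocksDB/WAL carved out of the faster drive
pveceph osd create /dev/sdb -db_dev /dev/sdf
pveceph osd create /dev/sdc -db_dev /dev/sdf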

3. Any other recommendations regarding the controller / disks / RAID / passthrough setup?
I first considered the PERC H330, but gave up on it because of its lack of cache and other users' poor performance reports.
Use an HBA and save some hassle.
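A quick sanity check (from smartmontools) that the controller really passes the SSDs through rather than wrapping them in a virtual drive; /dev/sdb is a placeholder:

# a passthrough/HBA-attached disk answers SMART queries directly;
# behind a MegaRAID virtual drive you would typically need -d megaraid,N instead
smartctl -i /dev/sdb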
 
Thank you Alwin,

Ceph (or ZFS for that matter) and RAID is a no go.
- in this case, I assume "passthrough without cache" would be the most appropriate option?

With Bluestore there is no journal anymore; Bluestore's DB can be located on a separate device. Why RAID something that is already fault-tolerant at that level?
- I meant keeping the WAL/DB on the write-intensive disks (indeed, no journal anymore in this version).
- so I guess it would make sense to keep the WAL/DB on a RAID1 array, to avoid losing all the OSDs on a node if the WAL/DB drive fails?



Use an HBA and save some hassle.
- you are perfectly right!
- the only reason for not installing a pure HBA and relying instead on the PERC H730P passthrough capability is to be sure we can give up on Ceph (if needed) and create local RAID storage without a datacenter visit for card replacement / disk rewiring.

Have a nice day,

Leo
 
- in this case, I assume "passthrough without cache" would be the most appropriate option?
HBA, HBA, HBA. ;)

- I meant keeping the WAL/DB on the write-intensive disks (indeed, no journal anymore in this version).
- so I guess it would make sense to keep the WAL/DB on a RAID1 array, to avoid losing all the OSDs on a node if the WAL/DB drive fails?
You cut the performance of the DB disk in half, for data that is already redundant at the cluster level (e.g. 3x replicas). If the DB disk fails, then it failed: replace the disk, re-create the affected OSDs, and Ceph takes care of the recovery.
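As an illustration of that replacement flow, a rough sketch assuming the failed DB drive backed OSDs 0-3 on that node (OSD IDs are placeholders; check the --cleanup option against your pveceph version):

# mark the affected OSDs out and let Ceph re-replicate their data
ceph osd out osd.0 osd.1 osd.2 osd.3

# once recovery is done, stop and destroy each one
systemctl stop ceph-osd@0
pveceph osd destroy 0 --cleanup
# ... repeat for OSDs 1-3, swap the failed drive, then re-create the OSDs with pveceph osd create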

- the only reason for not installing a pure HBA and relying instead on the PERC H730P passthrough capability is to be sure we can give up on Ceph (if needed) and create local RAID storage without a datacenter visit for card replacement / disk rewiring.
There is ZFS as well. ;)
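As a hedged sketch of that fallback, the same four SSDs per node could become a local striped-mirror (RAID10-like) ZFS pool and be added as Proxmox storage; pool name, storage ID and device names are placeholders:

# two mirrored pairs striped together, 4K-sector friendly
zpool create -o ashift=12 tank mirror /dev/sdb /dev/sdc mirror /dev/sdd /dev/sde

# register it as VM disk storage in Proxmox VE
pvesm add zfspool local-zfs -pool tank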
 
