Does a RAID card (JBOD) make a difference in Ceph? It may be so

wahmed

Does a RAID card in JBOD mode make any difference in Ceph or any other storage? From running some benchmarks on Ceph OSDs it appears that it does. Take a look at the image below:

[Attachment: raid-write.PNG - write benchmark comparison]

This is a 6-node Ceph cluster with 4 OSDs in each node. Nodes 17, 18, and 19 have cacheless RAID cards driving their OSDs. Nodes 20, 21, and 22 have no RAID card; their OSDs are connected directly to the motherboard SATA ports. The difference in write performance is noticeable. The whole cluster also got slower after adding the 5th and 6th nodes. All nodes have identical motherboards and CPUs; the only variable is the RAID cards.
I have ordered RAID cards to install in nodes 20, 21, and 22 and will then run the same benchmark to see how the numbers change.
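For anyone who wants to run a comparable write test, a minimal sketch is rados bench (this assumes a throwaway test pool named "bench" already exists; it is not necessarily the benchmark used for the chart above):

    # 60 seconds of 4 MB object writes to the test pool, keeping the objects for a read pass
    rados bench -p bench 60 write --no-cleanup
    # sequential read pass over the objects written above
    rados bench -p bench 60 seq
    # delete the benchmark objects afterwards
    rados -p bench cleanup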

Any thoughts?
 

Hi Wasim,
it depends on the RAID controller. I have seen the same with an Areca RAID controller - its cache was used for writing, so the latency was much better.

The only "problem" is that after an HDD failure you must define the new disk as pass-through.

About onboard SATA: I have noticed better performance with an LSI SAS9201-16i than with the internal SATA, but this may depend on the motherboard.

If you use Intel DC S3700 SSDs as journal disks, you don't need caching RAID cards ;-)

Udo
 
The same observation goes for HBAs. The reason is that an HBA or RAID controller is a dedicated device, so the I/O path does not have to compete with the host CPU for resources.
 
If you use Intel DC S3700 SSDs as journal disks, you don't need caching RAID cards ;-)

I have been trying to avoid SSDs for journals, could you tell? :)
But I think that to get real performance out of Ceph, this is going to be the only viable option.

I can get my hands on a few Intel S3500 120GB SSDs at extremely low prices. From what I can tell from Intel's site, these SSDs do 445 MB/s read and 135 MB/s write, whereas the S3700 100GB does 500 MB/s read and 200 MB/s write, at a higher price of course. Should I just put the thought of using the S3500 aside?
Udo, I believe you are using the S3700 240GB SSD, which does 365 MB/s write.


Good point, mir, about the HBA/RAID CPU usage. I didn't think of it. Using an HBA/RAID card does offload some processing from the CPU.
 
To get an idea of IOPS, I ran some fio tests as described by Mr. Han: http://www.sebastien-han.fr/blog/20...-if-your-ssd-is-suitable-as-a-journal-device/

I had 2 different Kingston SSDs lying around, so I used them (plus a few other drives) to see the difference. The values are from when each device maxed out.

Kingston V300 120GB = 194 IOPS
Kingston KC300 120GB = 58,000 IOPS
OCZ-Agility3 60GB = 46,000 IOPS
Seagate 600 240GB = 2,900 IOPS
My trusty 2TB Seagate OSD = 95 IOPS

What's the IOPS on those S3700s, Udo?
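For anyone wanting to repeat this, the sync-write test from that post boils down to roughly the following fio run (replace /dev/sdX with the drive being tested - it writes directly to the device, so only point it at a disk you can wipe):

    # 4k direct, synchronous sequential writes at queue depth 1 - roughly what a Ceph journal does
    fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k \
        --numjobs=1 --iodepth=1 --runtime=60 --time_based \
        --group_reporting --name=journal-test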
 

Be careful with the S3500s: it's not performance that makes the S3700 worth it, it's the fact that the S3700 uses eMLC and gives you an endurance of 10 DWPD. An SSD used as a journal across multiple HDDs is written to far more heavily than one used as a plain storage device. The S3500 will probably reach its rated write life quicker than you'd think, depending on your workload of course.

Another good option is the Samsung 845DC PRO (not the EVO!), which offers the same endurance as the S3700 but is a newer-generation drive with lower cost and higher performance.
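To put rough numbers on that (both figures below are assumptions for illustration: the S3500 120GB is rated at about 70 TBW, and 10 MB/s is just a guessed average journal load for the OSDs behind one SSD):

    # days until the rated endurance is used up:
    # 70 TBW ~= 70,000,000 MB; 10 MB/s * 86,400 s/day = 864,000 MB/day
    echo "70000000 / (10 * 86400)" | bc    # ~81 days

A 10 DWPD drive of the same capacity is rated for ten full drive writes per day over its warranty period, so the two are not in the same league for journal duty.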
 

Hi,
not everyone agrees with the 845DC: http://www.spinics.net/lists/ceph-users/msg15204.html

Udo
 
Hi Wasim,
the Intel DC S3700 200GB reaches 35,381 IOPS with the fio test.

Udo
 
Got the Kingston SSDs and RAID cards, and I have already set up journals on 3 of the nodes; I am adding OSDs to the 4th node now. I have not run any benchmarks yet. The cluster is recovering with max-backfills and max-recovery both set to 15. I was not sure whether to expect a WOW factor from the SSD journals, but it seems to me recovery should have gone a little faster: it is recovering at 156 MB/s, which is about the same as without the SSDs. Or is it only during actual client writes that the SSD journal speed becomes noticeable?
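For reference, the recovery throttles mentioned above can be changed on a running cluster with injectargs; this is just the generic form, using the same values as in the post:

    # raise backfill/recovery concurrency on all OSDs at runtime
    ceph tell osd.* injectargs '--osd-max-backfills 15 --osd-recovery-max-active 15'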
 
How did you test these devices? I have a development cluster consisting of 3 Proxmox nodes with 3 HDDs + 4 SSDs in each. Ceph is configured so that one SSD is the journal for the HDDs and the other 3 SSDs are in a writeback cache tier. Since it is a development cluster, Kingston V300 SSDs are used to keep costs down, and that drive scores high on Sebastien's blog. The strange thing is that VMs are maxing out at around 100 IOPS for read operations. I would love to repeat your performance tests so I can compare with your findings.
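For context, a writeback cache tier like the one described here is set up with commands along these lines (the pool names "rbd" and "ssd-cache" are placeholders, not my actual pool names):

    # put the SSD pool in front of the base pool as a writeback cache
    ceph osd tier add rbd ssd-cache
    ceph osd tier cache-mode ssd-cache writeback
    ceph osd tier set-overlay rbd ssd-cache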
 
I basically followed the instructions on Sebastien's site - I wanted to follow the exact path. I never clarified it with him, but I think he meant the Kingston KC300 and not the Kingston V300. It has been a while, but I tried quite a few times to squeeze performance out of the Kingston V300 and never got above about 190 IOPS on average. Maybe I did something wrong, but the other SSDs performed higher with the same test.
 
I have tested my V300s using the same methodology presented on Sebastien's blog. I even managed to get slightly higher scores than what is shown on the site. What is interesting, though, is that performance in the VMs is appalling. When I read inside a VM, I get around 30 MB/s regardless of the read-ahead settings; iowait is high in the VM and very low in the cluster (around 2% on each hypervisor). It looks like there is either a high wait time to initiate data reads from the SSDs, or something wrong on the KVM side that prevents it from requesting more than one Ceph block at a time.
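A quick way to separate guest-side limits from cluster-side limits is to run the same kind of fio test inside the VM (the file name and size here are only placeholders):

    # 4k random reads against a file in the guest, bypassing the page cache
    fio --name=vm-randread --filename=/root/fio-testfile --size=4G \
        --rw=randread --bs=4k --direct=1 --ioengine=libaio --iodepth=32 \
        --runtime=60 --time_based --group_reporting

If this reports far more than the ~100 IOPS the applications see, the bottleneck is more likely inside the guest (read-ahead, filesystem, queue depth) than in Ceph itself.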
 
