Ceph vs Sheepdog

leihnix

Member
Mar 22, 2016
Hello all,

Today I was excited to finally see that Hitoshi Mitake announced version 1.0 of Sheepdog.

https://github.com/sheepdog/sheepdog/releases/tag/v1.0

I have been testing Ceph for some time now, and I am not so happy with the performance in a small cluster environment (4 nodes, 8 OSDs), especially when it comes to rebalancing after an OSD failure. I wonder how it will compare to Sheepdog. I am going to set up a standalone Sheepdog cluster in the following weeks and do some testing.

The performance of Sheepdog looks better in some areas, but Ceph has its strengths as well:
http://events.linuxfoundation.jp/sites/events/files/slides/COJ2015_Sheepdog_20150604.pdf

Greetings Leihnix
 
... I am not so happy with the performance in a small cluster environment (4 nodes, 8 OSDs), especially when it comes to rebalancing after an OSD failure.
...
Hi,
you need to do some tuning to avoid a huge performance loss during a rebuild.
Like
Code:
osd max backfills = 1        # max concurrent backfill operations per OSD
osd recovery max active = 1  # max active recovery requests per OSD
osd_op_threads = 4           # threads serving client operations
osd_disk_threads = 1         # disk threads, which are used to perform background disk intensive OSD operations such as scrubbing
there are some settings, which can help.
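If you do not want to restart the OSDs after editing ceph.conf, the first two values can also be injected into the running daemons. A minimal sketch (assuming a Hammer/Jewel-era cluster with admin access from one of the nodes):
Code:
ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'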

Udo
 
Hello Udo,

I got these settings from one of your previous posts, thanks for that, and I have implemented them already. Still, after an OSD fails, lots of my roughly 50 running virtual servers stop responding for around 10 minutes and then slowly come back. Especially the Windows virtual hosts shut down or lose a hard drive in that process. Some Linux machines remount their hard drives read-only, and I have to restart them. I guess 8 OSDs, journaled by SSD, is just not enough for a Ceph cluster.
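For what it's worth, the read-only remounts are typically the guest kernel reacting to I/O errors once its disk timeout expires during recovery. One possible workaround (untested here, and only applicable to guests using (virtio-)SCSI disks) is to raise that timeout inside the Linux guests:
Code:
# inside a Linux guest with a (virtio-)SCSI disk; 180 seconds is just an example value
echo 180 > /sys/block/sda/device/timeout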

 
Hi leihnix,
if each of your HDDs reaches 150 IOPS (which is not bad), then your Ceph cluster can provide at most 8 x 150 = 1200 read IOPS - that is not much for 50 VMs.
If one OSD then dies, the remaining OSDs do a lot of writing (and a lot of reading to feed those writes) and your IOs break down...
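As a rough back-of-the-envelope calculation (assuming 3x replication, which is the usual Ceph default but not stated in this thread):
Code:
8 OSDs x 150 IOPS  = ~1200 read IOPS for the whole cluster
1200 / 3 replicas  = ~400 write IOPS (every write lands on 3 OSDs)
400 / 50 VMs       = ~8 write IOPS per VM, before any recovery traffic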

Perhaps more tuning is possible, but it sounds like you need (many) more OSDs.

Udo
 
Hello Udo,

thank you very much for your reply.

I have a PVE cluster of 7 nodes with around 50 VMs (mixed systems: mail, DNS, ownCloud, web, DB); they are all mid-range Dell 1U rack servers with 4 hard drive bays. So far I have equipped 4 of the seven with 1 hard drive for PVE, 1 SSD, and two 2 TB spindle drives (WD Red 7500K), which gives me 8 OSDs. I have 3 more nodes that I can Ceph-enable, which would give me 6 more OSDs, so 14 OSDs in total.

I am thinking maybe I should get rid of the SSDs and add 7 more hard drives to the cluster to get more IOPS, accepting slower writes because of the missing SSD-backed journaling. That would give me 21 OSDs in total.

If I stay with SSD journaling, I get 14 OSDs in total and hope that will be enough IOPS.
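A rough comparison of the two layouts, assuming ~150 IOPS per spindle and FileStore OSDs, where a journal co-located on the data disk makes every write hit that disk twice (my assumption about the setup, not something stated above):
Code:
14 OSDs + SSD journals : 14 x 150 = ~2100 read IOPS, journal writes absorbed by the SSDs
21 OSDs, no SSD        : 21 x 150 = ~3150 read IOPS, but journal + data on the same
                         spindle roughly halves the effective write IOPS per OSD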

For now I have moved all my VMs away from the Ceph installation to local disks, so I don't get unresponsive systems when a disk fails. Which of course gives me some backup work if a PVE disk fails :).

I have read through the performance options discussed in this forum and followed all the hints. I am doing more manual reading currently... and maybe I will find some better tuning.

The other thought I had was that maybe Sheepdog handles recovery better, and I could switch to Sheepdog.

Have a wonderful Sunday evening!
Leihnix
 
