new proxmox ceph cluster - network speed - ssd...

felipe

Hi, we are already running some Proxmox hosts.
Now it is finally time to get Ceph running in production :) - testing with a simple, small 3-node Ceph cluster went very well...


We will use 3 identical machines, each with 1 Proxmox disk, 3 SSDs for the journals (Firefly without a journal is still experimental only...) and 15 spinners, plus plenty of RAM & CPU. All servers
will have 4x 1G network cards and 2 dual-port 10G cards,
so we will make one bond from the 4x 1G cards for VM traffic,
one bond from two of the 10G ports for OSD traffic with rr,
one bond from the other two 10G ports for the monitors with rr,
connected with link aggregation to two 10G switches - roughly as sketched below.
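In /etc/network/interfaces that would look something like this (interface names, addresses and bond modes are only an assumption of how we would wire it up):

    auto bond0
    iface bond0 inet manual
        bond-slaves eth0 eth1 eth2 eth3    # 4x 1G for VM traffic
        bond-mode 802.3ad                  # LACP towards the switches
        bond-miimon 100

    auto vmbr0
    iface vmbr0 inet static
        address 192.168.10.11
        netmask 255.255.255.0
        bridge_ports bond0
        bridge_stp off
        bridge_fd 0

    auto bond1
    iface bond1 inet static
        address 10.10.10.11                # Ceph OSD/cluster network
        netmask 255.255.255.0
        bond-slaves eth4 eth5              # 2x 10G, round-robin
        bond-mode balance-rr
        bond-miimon 100

    auto bond2
    iface bond2 inet static
        address 10.10.20.11                # Ceph public/monitor network
        netmask 255.255.255.0
        bond-slaves eth6 eth7              # 2x 10G, round-robin
        bond-mode balance-rr
        bond-miimon 100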
Should this provide the best performance?
What size of SSDs do you use?
What kind of SSDs? Which speed worked for you (one SSD per 5 spinners)?
Do you use slightly different SSDs? -> Because under a similar load, identical SSDs would fail at more or less the same time... killing everything. I know this will only happen in 1, 2 or 3 years (when they reach their TBW), so maybe nobody has experience with that yet - but you can at least watch the wear, see below.
Any more suggestions?
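To keep an eye on how close the journal SSDs get to their rated TBW you can poll SMART from time to time; a rough example (the attribute names differ per vendor - Samsung reports Wear_Leveling_Count and Total_LBAs_Written, Intel uses Media_Wearout_Indicator - so adjust accordingly):

    # wear level and total data written for one SSD
    smartctl -A /dev/sdb | egrep -i 'Wear_Leveling_Count|Total_LBAs_Written|Media_Wearout'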


best regards
philipp
 
As the Ceph documentation says (and my experience confirms), you should use only 3 spinners per journal SSD. During recovery, more spinners will literally kill that one SSD, and that would be a huge performance impact for your cluster. Also, if one SSD goes down, all the spinners belonging to it go down with it; you should take that into account. You should also assess your required performance and think about a caching tier. Right now I'm experimenting with the new cache tier feature in Firefly, but since Bobtail I've been using the CacheCade feature of the LSI 2208 controller for the spinners, and it is a huge performance boost. I'm using 256GB SSDs, 4 for CacheCade and 2 for journals in each node. One journal partition is 20GB.
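If you size the journal partitions the same way, the matching bit in ceph.conf looks roughly like this (20GB is simply the value from my setup above; the journal path is only an example):

    [osd]
    # journal size is given in MB, so 20 GB per journal partition
    osd journal size = 20480
    # each OSD can point its journal at its own SSD partition, e.g.
    # osd journal = /dev/disk/by-partlabel/journal-$id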

For 15 spinners I think one 10G link for the cluster network and one 10G link for client communication is enough. I have 6 spinners in each of my 3 servers and I'm not using even half of the bandwidth during a total (whole node) recovery, so I wouldn't bother with bonding, but YMMV depending on your expected workload.
 
So you have 6 SSDs and 6 spinners in one node? That's effectively a 1:1 ratio of SSDs to spinners.

How fast are your SSDs? Maybe with faster ones I could use 4 spinners per SSD? With so many spinners it makes a big difference whether the ratio is 3, 4 or 5:1.
CacheCade is also interesting, and the same goes for the new Firefly cache tier feature.
But with so many spinners I would need a lot of SSDs for cache (4k can be improved quickly, but just the raw throughput of 15 spinners - somewhere around 15 x 100-150 MB/s, so 1.5-2 GB/s - already calls for at least 2-3 SSDs...).
What do you think about having 16 spinners with writeback cache from the RAID controller backed by 4 SSDs, and using a separate pure SSD pool for database servers etc.?
How would you do your Firefly caching? I think I should always calculate in multiples of 3 to get the right number of SSDs...
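For reference, attaching a cache tier in Firefly itself is only a handful of commands; a minimal sketch, assuming pools named hot-ssd and cold-hdd already exist (names and thresholds are placeholders):

    # put the SSD pool in front of the spinner pool as a writeback cache
    ceph osd tier add cold-hdd hot-ssd
    ceph osd tier cache-mode hot-ssd writeback
    ceph osd tier set-overlay cold-hdd hot-ssd
    # the cache tier needs hit set tracking to decide what to promote/evict
    ceph osd pool set hot-ssd hit_set_type bloom
    ceph osd pool set hot-ssd hit_set_count 1
    ceph osd pool set hot-ssd hit_set_period 3600
    # cap the cache pool so it starts flushing/evicting before it fills up
    ceph osd pool set hot-ssd target_max_bytes 200000000000

The hard part is clearly the sizing, not the commands.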
 
Btw. what is your throughput and what is your 4k IO? Just to have some numbers to calculate with.

thank you!
 
Well, you should test the 4-spinner/1-SSD journal setup, it might work with a decent datacenter SSD. I've used the Samsung 830 and 843 series, but they wouldn't last half a year under our load, so I've switched over to the new 843 Datacenter series and have high hopes for it. Right now the 3-spinner/1-SSD setup is working fine and I wouldn't put more load on the SSD. With 15 spinners per host I would try out the cache tier feature, but put those SSDs in dedicated hosts. I couldn't do that, so my test scenario was a bit flawed and I had performance issues; I've postponed the whole concept until I have enough hardware to test it out properly.

Stupid question, but how can I measure my 4k IO?

Running rados bench I can sustain about 600MB/s on the 3-node cluster if every SSD is working fine. As I see it, the bottleneck is still the journal SSDs, and as I said each one is only used by 3 spinners.
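In case you want to reproduce that number, this is roughly the kind of run I mean (pool name and runtime are just placeholders):

    # 60 second write test with the default 4MB objects, keep the objects around
    rados bench -p testpool 60 write --no-cleanup
    # then read them back sequentially
    rados bench -p testpool 60 seq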
 
The new Samsung 843 DC SSD has the same TBW rating for the 120GB and the 256GB disk... so with only 3 journal partitions on it (3 x 20GB = 60GB) it should be enough to use the smaller (and much cheaper) 120GB SSD?
4k IO - inside a virtual machine on Windows with CrystalDiskMark... which is quite interesting too. And on the host level with fio - check on Google, there are a lot of howtos...
Why would you put those SSDs in dedicated hosts?
 

Yep, 120GB is enough if you are using it only for journals. Well, I don't have any Windows VMs on the cluster.

If your cache pool and storage pool OSDs are mixed on the same hosts you'll have a nightmare managing your CRUSH map, so it's better to separate them at the host level (well, if you are using the host level in your CRUSH map... a rough sketch is below). And it is much more flexible. My plan is to use 2U chassis with 12 spinners as cold storage and blade-like machines (SM Twinblade) with 6 SSDs for hot storage (cache). Much cheaper, more maintainable and more flexible to expand in the long run - I could expand the cluster at any given level without much effort.
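To give an idea, putting the SSD hosts under their own CRUSH root and pointing the cache pool at it can be done from the CLI; a minimal sketch with made-up host, pool and rule names:

    # create a separate root for the SSD/cache hosts and move them under it
    ceph osd crush add-bucket ssd-root root
    ceph osd crush move ssd-host1 root=ssd-root
    ceph osd crush move ssd-host2 root=ssd-root
    # a simple replicated rule that only chooses hosts under ssd-root
    ceph osd crush rule create-simple ssd-rule ssd-root host
    # point the cache pool at that rule (look up its id with 'ceph osd crush rule dump')
    ceph osd pool set hot-ssd crush_ruleset 1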
 

Thanks for the tip regarding fio. Is there a job configuration you would like me to test on the cluster?
 
It would be interesting to see how much throughput you get inside a VM, and the 4k numbers in the VM as well, plus the same on the host. A simple fio run like the one sketched below should already give comparable numbers.
Uhh, so many servers and SSDs for the Ceph cluster are not in the budget right at the moment :-(
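Something along these lines, for example (file name, size and runtime are just placeholders; inside a Windows VM, CrystalDiskMark gives roughly comparable 4k figures):

    # 4k random write, direct IO, 60 seconds
    fio --name=4k-randwrite --filename=/tmp/fiotest --size=4G \
        --rw=randwrite --bs=4k --ioengine=libaio --direct=1 \
        --iodepth=32 --numjobs=4 --runtime=60 --time_based --group_reporting
    # sequential throughput with 4M blocks
    fio --name=seq-write --filename=/tmp/fiotest --size=4G \
        --rw=write --bs=4M --ioengine=libaio --direct=1 \
        --iodepth=16 --runtime=60 --time_based --group_reporting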

 
