Tuning performance in VM with scheduler

mir

Hi all,

Inspired by a thread about Ceph and which scheduler to use in a VM, I made a quick test using fio. The results are striking, so I hope others can reproduce them.
My storage is based on ZFS.

Test file used:
Code:
# This job file tries to mimic the Intel IOMeter File Server Access Pattern
[global]
description=Emulation of Intel IOmeter File Server Access Pattern
[iometer]
bssplit=512/10:1k/5:2k/5:4k/60:8k/2:16k/4:32k/4:64k/10
rw=randrw
rwmixread=80
direct=1
size=4g
ioengine=libaio
# IOMeter defines the server loads as the following:
# iodepth=1    Linear
# iodepth=4    Very Light
# iodepth=8    Light
# iodepth=64    Moderate
# iodepth=256    Heavy
iodepth=64

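If you want to reproduce this, something like the following should work inside the VM - a sketch only, assuming the job file above is saved as iometer.fio and the virtual disk is vda (adjust to your device):
Code:
# show the current and available schedulers for the virtual disk
cat /sys/block/vda/queue/scheduler
# switch to noop (takes effect immediately, does not survive a reboot)
echo noop > /sys/block/vda/queue/scheduler
# run the IOMeter-style job from above
fio iometer.fio
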
Results
Code:
                 NFS                       iSCSI
CFQ       r: 4537   w: 1130       r:  6927   w: 1733
NOOP      r: 7484   w: 1874       r: 11454   w: 2874
It seems the scheduler has a big impact on performance when dealing with file systems that have sophisticated native caching.
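If the gain holds up for you, one way to make the scheduler persistent in a Debian guest (assuming the old single-queue block layer and GRUB, which is an assumption on my part) is the elevator= kernel parameter:
Code:
# /etc/default/grub - set the default I/O scheduler for all block devices
GRUB_CMDLINE_LINUX_DEFAULT="quiet elevator=noop"

# then regenerate the GRUB config and reboot
update-grub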
 
I now have a test cluster running on the latest Proxmox nodes, which have Intel SSD drives and are connected with a 10 Gb switch. One pool is Gluster on ZFS based on two striped drives, and the second one is a Ceph pool based on two OSD drives on two servers. The VM is the latest Debian with writeback cache and raw format on Gluster, running on one of the storage nodes.

These are my results

a) Ceph
noop:     r/w 4215 / 1058
deadline: r/w 4212 / 1055
cfq:      r/w 4214 / 1052

b) Gluster
noop:     r/w 4928 / 1235
deadline: r/w 4206 / 1051
cfq:      r/w 5059 / 1262

I also ran a bonnie benchmark, and it showed almost twice the performance for Gluster. Maybe I have some misconfiguration somewhere ...
 
The cache is writeback, as I wrote, and the mount options are as follows:
/dev/disk/by-uuid/e4027256-ecf7-4257-ac5a-30c228d2f74a on / type ext4 (rw,relatime,errors=remount-ro,user_xattr,barrier=1,data=ordered)

I made no changes to the VM except the scheduler.
 
So after setting barrier=0 and rebooting just to be sure, the results on Gluster with CFQ are 4648 and 1161, so it is even lower than with barriers in my case.
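In case anyone wants to repeat that test, a sketch of how barriers can be turned off for the ext4 root (the UUID is the one from the mount output above; remount or reboot afterwards):
Code:
# /etc/fstab - add barrier=0 to the ext4 mount options for /
UUID=e4027256-ecf7-4257-ac5a-30c228d2f74a  /  ext4  errors=remount-ro,barrier=0  0  1

# or change it at runtime without a reboot
mount -o remount,barrier=0 /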
 
How do you get Giant on Proxmox?
Edit /etc/apt/sources.list.d/ceph.list:

deb http://ceph.com/debian-giant wheezy main


#apt-get update
#apt-get dist-upgrade

on each node

then,

/etc/init.d/ceph restart mon
on each monitor node


then

/etc/init.d/ceph restart osd
on each OSD node
 
Thanks spirit, much appreciated. Is it as simple as that to upgrade from Firefly to Giant?
- upgrade
- restart monitors
- restart OSDs

No other procedures/gotchas?

You mentioned "For ceph with ssd drives, you really should try giant.". Is it a similar improvement for spinners + ssd journals?

I'm only running a small setup at the moment: two OSD spinners + two SSD journal partitions on top of ZFS with a ZFS cache. Read performance is good, but write is marginal.
 
NB. Sorry, one last question - is Giant compatible with the Proxmox Ceph management tools and UI?
 
Thanks spirit, much appreciated. Is it as simple as that to upgrade from Firefly to Giant?
- upgrade
- restart monitors
- restart OSDs

No other procedures/gotchas?


No, it's really simple. Just check that the Ceph health is OK (through the GUI or with # ceph -w) between each daemon restart.
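So per node the whole sequence looks roughly like this - just a sketch of what is described above, monitors before OSDs:
Code:
# on each node, after switching ceph.list to debian-giant
apt-get update && apt-get dist-upgrade

# restart the monitor first and wait until the cluster is healthy again
/etc/init.d/ceph restart mon
ceph health

# then restart the OSDs, again checking health in between
/etc/init.d/ceph restart osd
ceph health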


You mentioned "For ceph with ssd drives, you really should try giant.". Is it a similar improvement for spinners + ssd journals?
I'm only running a small setuup at the moment, two osd spinners + two ssd journal partitions on top of ZFS with a ZFS cache. Read performance is good, but write is marginal.


The major improvement with Giant is that the OSD daemons can use more cores to scale, because before there was a big lock. So it only matters when you need a lot of IOs (i.e. with SSDs).
For writes it could maybe help too, because for me the write bottleneck is the CPU on the OSD node.


About ZFS, I'm not sure it's well tested with Ceph. Do you use ZFS only for the OSD spinner? If yes, have you disabled the ZIL on it? (There is already a journal in Ceph.)

 


The major improvement with Giant is that the OSD daemons can use more cores to scale, because before there was a big lock. So it only matters when you need a lot of IOs (i.e. with SSDs).
For writes it could maybe help too, because for me the write bottleneck is the CPU on the OSD node.


So at worst, no worse :) and possibly better. I was also wanting to experiment with cache tiering in Giant, as it seemed more mature.


About ZFS, I'm not sure it's well tested with Ceph. Do you use ZFS only for the OSD spinner?

Yes - only the spinner is managed by ZFS, as a directory mount with the ZFS ARC and an L2ARC. It helps a lot with read performance: 200 MB/s with it, 70 MB/s without.

If yes, have you disabled the ZIL on it? (There is already a journal in Ceph.)

I don't think you can disable the in-memory ZIL in ZFS, but I have disabled the SSD ZIL (SLOG). I use the SSD partition for the Ceph journal.

I had to set it all up manually using ceph-osd, but it's not difficult and a useful learning exercise.
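Roughly, the ZFS side of it looks like this - a sketch only, with example pool and device names (sdb = spinner, sdc1/sdc2 = SSD partitions), not my actual ones:
Code:
# pool on the spinner, SSD partition as L2ARC (cache) - no SLOG added
zpool create osdpool /dev/sdb
zpool add osdpool cache /dev/sdc1

# the OSD data directory lives on the pool
zfs create -o mountpoint=/var/lib/ceph/osd/ceph-0 osdpool/osd-0

# the Ceph journal goes on the second SSD partition, via /etc/ceph/ceph.conf:
#   [osd.0]
#   osd journal = /dev/sdc2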
 
I was also wanting to experiment with cache tiering in Giant, as it seemed more mature.
Yes, it has been improved in Giant, but I think we should wait for the next Hammer release for it to be really good.
Also note that Ceph tiering works with a 4 MB object size (so one small 4k read promotes the full object into the SSD tier), so the ZFS granularity is better for this.
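For reference, the basic commands to put an SSD pool in front of an existing pool as a writeback cache tier look like this (pool names are just examples, and you still need to set the cache sizing / hit-set options on the cache pool):
Code:
ceph osd tier add rbdpool cachepool
ceph osd tier cache-mode cachepool writeback
ceph osd tier set-overlay rbdpool cachepool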
 

Yes, it has been improved in Giant, but I think we should wait for the next Hammer release for it to be really good.

It will be interesting to test with.


Also note that Ceph tiering works with a 4 MB object size (so one small 4k read promotes the full object into the SSD tier), so the ZFS granularity is better for this.

Yah, I have no idea how that would work with VM images.

thanks.
 
