Proxmox Ceph, LAGs and jumbo frames

nethfel

Member
Dec 26, 2014
Hi all,
I have a question I'm hoping someone here has experimented with before I go head first into this.

First, some context: on my Ceph nodes I currently have 3x 1 Gig Ethernet ports in an LACP bond using a layer 2+3 hash policy.

I'm wondering if anyone has seen any benefit from using jumbo frames with Ceph?

If it works well, I might be interested in trying it. The only problem: I'd lose one of the three Ethernet ports in the LAG (the onboard NIC doesn't support an MTU over 1500). So would a higher MTU really benefit me if I lose one of the ports in the LAG?
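
For context, my current bond looks roughly like this in /etc/network/interfaces (interface names, the address and the exact option spellings are from memory, so treat it as a sketch rather than a copy of my config):

Code:
# Ceph storage bond on one node - placeholder interface names and address
auto bond0
iface bond0 inet static
    address 10.10.10.11
    netmask 255.255.255.0
    bond-slaves eth0 eth1 eth2
    bond-mode 802.3ad
    bond-xmit-hash-policy layer2+3
    bond-miimon 100
    # going to jumbo frames would mean dropping the onboard NIC (eth0)
    # from bond-slaves and setting mtu 9000 here and on the switch
    mtu 1500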

Thoughts?
 

I use LACP bonds with two NICs and jumbo frames with Ceph. I put the howto in the Proxmox wiki for Open vSwitch: http://pve.proxmox.com/wiki/Open_vSwitch

Depending on how many physical nodes you have, a third port may not help, because an LACP bond doesn't simply act like one bigger trunk: each connection is hashed onto a single member link, so no single stream can exceed the speed of one NIC. Unless you have more than three nodes, I doubt you'll see a performance difference between three NICs and two.
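
In case it's useful, the core of the Open vSwitch setup I use boils down to something like the following (names are placeholders and the option spellings are from memory - the wiki page above is the authoritative version):

Code:
# /etc/network/interfaces - OVS bridge carrying an LACP bond with jumbo frames
allow-vmbr0 bond0
iface bond0 inet manual
    ovs_bridge vmbr0
    ovs_type OVSBond
    ovs_bonds eth0 eth1
    ovs_options bond_mode=balance-tcp lacp=active
    mtu 9000

auto vmbr0
allow-ovs vmbr0
iface vmbr0 inet manual
    ovs_type OVSBridge
    ovs_ports bond0
    mtu 9000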
 
I currently have 4 OSD nodes with 12 OSDs total (for now), and 2 links per LAG per node. The storage (cluster) network is configured separately from the public network. That took a bit of fiddling, since I had used Ceph within Proxmox to set up my Ceph network, but it was a significant performance boost for me: roughly a 30 MB/s improvement over what I got a week ago with Ceph on a shared public/cluster network. (Public/cluster here refers purely to Ceph traffic: OSD-to-OSD replication on the cluster network versus monitor and client communication on the public network.) See: http://forum.proxmox.com/threads/20804-Need-to-make-sure-I-have-replicas-right-for-PVE-Ceph?p=106137
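
In case it helps anyone trying the same split, it comes down to two settings in ceph.conf (the subnets below are placeholders, not my actual ranges):

Code:
[global]
    # monitor and client traffic stays on the public network
    public network  = 192.168.10.0/24
    # OSD replication and heartbeats move to the separate LACP bond
    cluster network = 192.168.20.0/24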

I ran some numbers tonight against two different pools with different MTU settings, clearing the caches between each run. One pool was set up with a size of 2, the other with 3. The numbers weren't really what I expected; I thought the larger MTU would be more efficient on the storage network.

Code:
LACP Layer 2+3

MTU        2 replicas        3 replicas
1500 AVG   122 MB/s          89.4 MB/s
1500 MAX   164 MB/s          136 MB/s

4000 AVG   119.385 MB/s      84.296 MB/s
4000 MAX   164 MB/s          120 MB/s

9000 AVG   116.697 MB/s      84.477 MB/s
9000 MAX   164 MB/s          84.477 MB/s

It's interesting to see that for me, with the switch I have, the number of nodes, etc., an MTU of 1500 appears to be the best in the rados bench test. Of course, the big question then becomes: since this is just a benchmark rather than real operations, how might real workloads turn out differently?
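
For anyone who wants to reproduce this, the runs were along these lines (pool name and duration here are illustrative, not my exact values):

Code:
# drop the page cache on each node between runs (run as root)
sync; echo 3 > /proc/sys/vm/drop_caches

# 60-second sequential write benchmark against one pool
rados bench -p testpool 60 write --no-cleanup

# remove the benchmark objects afterwards
rados -p testpool cleanup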