Proxmox Ceph, LAGs and jumbo frames

nethfel

Member
Dec 26, 2014
Hi all,
I have a question about something I'm hoping someone has experimented with before I go headfirst into this.

OK - first, on my Ceph nodes I currently have 3x 1 Gbit Ethernet ports in an LACP bond using layer 2+3 hashing.

I'm wondering if anyone has seen any benefit from using jumbo frames with Ceph?

If it works well, I might be interested in trying it. The only problem is that I'd lose one of the three Ethernet ports from the LAG (the onboard Ethernet doesn't support an MTU over 1500) - so would a higher MTU really benefit me if I lose one of the ports in the LAG?

Thoughts?
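For reference, the kind of bond I'm talking about looks roughly like this in /etc/network/interfaces - NIC names and the address are just examples, not my exact config, and the mtu line is what I'd add for jumbo frames:

Code:
# Ceph storage bond - NIC names and address are examples
auto bond0
iface bond0 inet static
    address 10.10.10.2
    netmask 255.255.255.0
    bond-slaves eth1 eth2 eth3
    bond-mode 802.3ad
    bond-miimon 100
    bond-xmit-hash-policy layer2+3
    # uncomment for jumbo frames - every NIC and switch port in the path must support it
    # mtu 9000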
 

I use LACP bonds with 2 NICs and jumbo frames with Ceph. I put the how-to in the Proxmox wiki for Open vSwitch: http://pve.proxmox.com/wiki/Open_vSwitch

Depending on how many physical nodes you have, having 3 ports may not help, because LACP bonds don't just act like a bigger trunk - each individual connection is hashed onto a single link, so one flow never exceeds 1 Gbit. Unless you have more than 3 nodes I doubt you'll see a performance difference with 3 NICs vs 2.
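Roughly what that looks like, trimmed down - NIC names are examples, and the wiki has the full version with the VLAN interfaces and MTU handling details:

Code:
allow-vmbr0 bond0
iface bond0 inet manual
    ovs_type OVSBond
    ovs_bridge vmbr0
    ovs_bonds eth0 eth1
    ovs_options bond_mode=balance-tcp lacp=active other_config:lacp-time=fast
    mtu 9000

auto vmbr0
iface vmbr0 inet manual
    ovs_type OVSBridge
    ovs_ports bond0
    mtu 9000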
 
I currently have 4 OSD nodes with 12 OSDs total (for now), with 2 links per LAG per node. The storage network (cluster network) is configured separately from the public network. That took a bit of fiddling since I used the Ceph tooling within Proxmox to set up my Ceph network, but it was a significant performance booster for me - roughly a 30 MB/s improvement over what I was getting a week ago with Ceph on a shared public/cluster network (public/cluster here referring to Ceph traffic only: OSD replication on one side, monitor and client communication on the other). See: http://forum.proxmox.com/threads/20804-Need-to-make-sure-I-have-replicas-right-for-PVE-Ceph?p=106137
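The relevant bit in ceph.conf ends up looking something like this (the subnets here are examples, not my actual ranges):

Code:
[global]
    # monitor and client traffic
    public network = 192.168.1.0/24
    # OSD replication / heartbeat traffic on the separate LAG
    cluster network = 10.10.10.0/24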

I ran some numbers tonight with different MTU settings against two different pools, clearing the caches between each run. One pool was set up with a size of 2, the other 3. The numbers weren't really what I expected - I thought the larger MTU would be more efficient on the storage network.

Code:
LACP Layer 2+3

MTU         2 rep           3 rep
1500 AVG    122 MB/s        89.4 MB/s
1500 MAX    164 MB/s        136 MB/s
4000 AVG    119.385 MB/s    84.296 MB/s
4000 MAX    164 MB/s        120 MB/s
9000 AVG    116.697 MB/s    84.477 MB/s
9000 MAX    164 MB/s        84.477 MB/s

It's interesting to see that for me - with the switch I have, the number of nodes, etc. - an MTU of 1500 appears to be the best in the rados bench test. Of course, the big question then becomes: since this is just a benchmark rather than real operations, how might real workloads differ?
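For anyone wanting to repeat this, the runs were along these lines - the pool names and the 60-second duration are just examples, not necessarily the exact parameters I used:

Code:
# set the MTU being tested on the storage bond
ip link set bond0 mtu 9000

# drop the page cache on each node between runs
sync; echo 3 > /proc/sys/vm/drop_caches

# write benchmark against the size=2 and size=3 pools
rados bench -p bench2 60 write --no-cleanup
rados bench -p bench3 60 write --no-cleanup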
 
