Recommendations for Ceph on gigabit?

Waschbüsch
Dec 15, 2014
Hi all,

Currently, I have the following setup:
I run a PVE cluster with 3 nodes. Each node has two gigabit network interfaces: one with a public management IP and one on an internal switch with a private IP.
Now, since the public IPs are not on the same network, I created the cluster using the private IPs. The internal network is also used to mount NFS shares.
Thus far everything works really well.
Now, what I would like to know is whether it makes sense to set up Ceph on the three nodes:
  • Given the fact that it is 'only' gigabit ethernet, what will performance be like?
  • Even if it is (which I anticipate) too slow for running VM images, would it be suitable for backup purposes?
  • Do I need a separate (e.g. VLAN) network on top of the existing private LAN?
  • Are there any other caveats I need to watch out for?

Thanks a lot,

Martin
 
Hello Martin,

Given the fact that it is 'only' gigabit ethernet, what will performance be like?

That depends on your situation. 10 Gbit/s is of course better than 1 Gbit/s, but if you don't have too much load it will work over 1 Gbit/s without problems.

Even if it is (which I anticipate) too slow for running VM images, would it be suitable for backup purposes?

Yes.

Do I need a separate (e.g. VLAN) network on top of the existing private LAN?

No. What you may want to consider, though, is a third LAN (and a third NIC in each node) so that Ceph gets a separate physical network of its own (it keeps things clearer, especially when troubleshooting). You need a switch for it too; a simple unmanaged one (under $30) is sufficient.
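Such a dedicated Ceph interface is just a plain static stanza in /etc/network/interfaces on each node, roughly like this (interface name and subnet are only examples):

auto eth2
iface eth2 inet static
        address 10.10.10.1
        netmask 255.255.255.0
        # no gateway needed; node2 gets .2, node3 gets .3, and so on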

Are there any other caveats I need to watch out for?

Use "pveceph" on the command line or the web GUI for setup and changes, but not the standard ceph calls (except for reading information only).
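For example (the network and device below are only illustrations, adjust them to your setup): setup and changes go through pveceph, while the plain ceph tools are fine for reading status.

root@node1:~# pveceph install
root@node1:~# pveceph init --network 10.10.10.0/24
root@node1:~# pveceph createmon
root@node1:~# pveceph createosd /dev/sdb

root@node1:~# ceph -s          # read-only status checks like these are fine
root@node1:~# ceph osd tree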

Kind regards

Mr.Holmes
 
If you are going to buy new NICs for the Ceph storage anyway, you could also consider an InfiniBand-based network. Three NICs and the needed cables will give you 10-20 Gbit/s for around $250. Try searching eBay. InfiniBand is fully supported by the Proxmox kernels and Proxmox/Debian.
 
Hi all,

  • Given the fact that it is 'only' gigabit ethernet, what will performance be like?

We use such a setup for small office clusters. With the NICs bonded, we've found you get close to the theoretical throughput out of the network, so it will definitely work fine if you don't have disk-intensive workloads. IOPS are actually about the same as on our 10G cluster; only the overall throughput is lower. We run with SSDs in the nodes and Open vSwitch, with a Ceph VLAN interface broken out and jumbo frames enabled.

As with anything, it depends on your workload, but we haven't found dual gigabit to be as disappointing as we would have expected.
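For what it's worth, the network part of our /etc/network/interfaces looks roughly like the sketch below (interface names, VLAN tag and addresses are examples; the openvswitch-switch package is required):

auto bond0
iface bond0 inet manual
        ovs_type OVSBond
        ovs_bridge vmbr0
        ovs_bonds eth0 eth1
        ovs_options bond_mode=balance-slb
        mtu 9000

auto vmbr0
iface vmbr0 inet manual
        ovs_type OVSBridge
        ovs_ports bond0 ceph0
        mtu 9000

# internal port carrying the Ceph VLAN, jumbo frames enabled
auto ceph0
iface ceph0 inet static
        ovs_type OVSIntPort
        ovs_bridge vmbr0
        ovs_options tag=100
        address 10.10.10.1
        netmask 255.255.255.0
        mtu 9000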
 
Hi all and thanks for the replies.
After playing around with this a bit, I am quite amazed at how well things worked.
I used the internal gigabit LAN for the Ceph setup and added one OSD per node.
Then I moved a running(!) VM's disk image from local LVM to Ceph and back.
I also tried online changes to the pool layout, e.g. stuff like:

root@node1:~# ceph osd pool set rbd min_size 1 ; ceph osd pool set rbd size 2
root@node1:~# ceph osd pool set rbd size 3 ; ceph osd pool set rbd min_size 2
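(The current values can be checked at any time with read-only calls such as:)

root@node1:~# ceph osd pool get rbd size
root@node1:~# ceph osd pool get rbd min_size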

As well as disabling and re-enabling one node to see whether rebuilding works, e.g.:
root@node1:~# pveceph stop
root@node1:~# pveceph start
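Recovery progress is easy to follow with the usual status commands, for example:

root@node1:~# ceph -w          # live cluster log, shows degraded/recovering PGs
root@node1:~# ceph osd tree    # shows which OSDs are up or down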

Everything worked as advertised and even quite fast. Read speeds were around 80 MB/s and write speeds around 60 MB/s when all three OSDs were online and the size/min_size ratio was 3:2.

In short, this is awesome technology!
Martin
PS: One caveat: adding an LVM logical volume as an OSD requires manual intervention, as the pveceph tool balks when you attempt to do that. Once added by hand, however, things run just fine!
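"By hand" here means roughly the short-form manual OSD procedure from the Ceph documentation; a sketch along these lines (the OSD id, volume group and host name are just examples):

root@node1:~# ceph osd create                      # prints the new OSD id, e.g. 3
root@node1:~# mkfs.xfs /dev/vg0/ceph-osd3
root@node1:~# mkdir -p /var/lib/ceph/osd/ceph-3
root@node1:~# mount /dev/vg0/ceph-osd3 /var/lib/ceph/osd/ceph-3
root@node1:~# ceph-osd -i 3 --mkfs --mkkey
root@node1:~# ceph auth add osd.3 osd 'allow *' mon 'allow profile osd' -i /var/lib/ceph/osd/ceph-3/keyring
root@node1:~# ceph osd crush add osd.3 1.0 host=node1
root@node1:~# service ceph start osd.3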
 
...
In short, this is awesome technology!
+1


Hi Martin.
PS: One caveat: adding an LVM logical volume as an OSD requires manual intervention, as the pveceph tool balks when you attempt to do that. Once added by hand, however, things run just fine!
Hmm, I'm sure that using LVM as an OSD is not the best idea... (performance and so on).
I would advise using full disks.

Udo
 
Sure thing, but if your current setup is a hardware RAID 6 with 4 disks and LVM on top of that...
Since I was just testing, this was MUCH easier than migrating the RAID 6 to RAID 5 and then using the single disk, etc.
If I were to set up a cluster from scratch, I'd certainly set up Ceph using drives directly.

Martin
 
Hello Martin,
Use "pveceph" on the command line or the web GUI for setup and changes, but not the standard ceph calls (except for reading information only).
Mr.Holmes

What did you mean by "not the standard ceph calls"? As far as I know, all Ceph CLI commands work just fine. You don't really have to use pveceph; it's only really needed when initially setting up Ceph on Proxmox nodes.
 
If you are to buy new nics for the ceph storage you could also consider Infiniband based network. 3 nics and the needed cables will give you 10-20 Gb speed for around 250$. Try search ebay. Infiniband is fully supported by the proxmox kernels and proxmox/debian.

Interesting: you seem to imply that you don't need an InfiniBand switch for that setup. How does that work exactly?
Is 3 the maximum number of nodes possible without a switch?
 
Yes, 3 nodes is the maximum without a switch. If you plug in dual-port InfiniBand NICs and connect the nodes in a ring, then if one node goes down, the remaining two nodes can still communicate with each other and will also be notified when the failed node comes up again. This means a quorum can always be established. See the attached picture.


[Attachment: nodes.png]
 
Update:

In case anyone wondered: I ended up using three Intel X540-T1 NICs and a Netgear XS708E switch to move everything to 10GbE. It really rocks!
 
I now have everything running on 1 Gbit/s, but Proxmox shows very high IO delays (the journal is on an SSD).
Throughput maxes out at about 1 Gbit/s, but CPU usage and IO delay are quite high.
Is this normal?

Nodes are 2x Xeon E5335 with 32 GB RAM.


When monitoring my bandwidth with nload, Ceph traffic never reaches 1 Gbit/s (the max is around 600 Mbit/s).
I can easily reach 1 Gbit/s with iperf, so the network interface itself is OK.

I also enabled CephFS and mounted it with ceph-fuse for OpenVZ.
As for RBD performance, I see with the rados benchmark that I reach about 88 MB/s, but sometimes it drops to 0 MB/s (it has timeouts).
Could it be that the journal size on my SSD is too small?
Or that it takes too long to flush the journal?

When checking the performance of CephFS with pveperf, it only gives about 20 fsyncs per second...

The apply latency on the OSD tab in Proxmox shows high values on the nodes using Ceph: >300 ms, sometimes even >1000 ms.
I'm not sure where to look to debug this; I hope someone can shed some light on the matter.
 
Chances are your journal size is too small if you're dropping to 0, especially during bench tests. The default journal size on a base Ceph/Proxmox install is 5 GB. I know I run into trouble with a journal that small on a bonded network; I currently have the journal size set to 12 GB, which seems to keep my throughput at a reasonable rate.

Also be aware that if you're using Ceph via Proxmox with a single gigabit interface, you're sharing that interface for a lot of traffic, both cluster and client, which will reduce your throughput. If you can add an additional Ethernet interface and move your cluster traffic to it (leaving client traffic on the other one), you'll have much better throughput even with the limitation of a single 1 Gbit interface...
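For reference, both of those settings live in ceph.conf (on Proxmox: /etc/pve/ceph.conf); a sketch with example values and subnets:

[global]
        # client traffic
        public network = 192.168.10.0/24
        # OSD replication/heartbeat traffic on its own NIC
        cluster network = 192.168.20.0/24

[osd]
        # journal size in MB, e.g. 12 GB instead of the 5 GB default;
        # an existing journal has to be flushed and recreated for this to take effect
        osd journal size = 12288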
 
Chances are your journal size is too small if you're dropping to 0, especially during bench tests. The default journal size on a base Ceph/Proxmox install is 5 GB. I know I run into trouble with a journal that small on a bonded network; I currently have the journal size set to 12 GB, which seems to keep my throughput at a reasonable rate.

Also be aware that if you're using Ceph via Proxmox with a single gigabit interface, you're sharing that interface for a lot of traffic, both cluster and client, which will reduce your throughput. If you can add an additional Ethernet interface and move your cluster traffic to it (leaving client traffic on the other one), you'll have much better throughput even with the limitation of a single 1 Gbit interface...

Sorry, I do have two 1 Gbit adapters.
One is used for clients and one for the cluster network of Ceph.
I should see if I can increase the journal.
In the meantime, I will try whether setting a lower osd target transaction size helps at all...
 
