That being said, I did so many tests with Proxmox/Ceph that I am not sure I tested exactly this case. I only have 2x1 GbE with bonding, but it should still make a noticeable difference. So give me a few days and I may be able to give you something more conclusive.
So, let me follow through on that promise.
The test bed for the results below is:
Hardware
3x HP ProLiant DL380 G6 servers, each with 2x Xeon X5670 six-core 2.93 GHz
96 GB RAM, 2x P410 RAID controllers with 512 MB battery-backed write cache
each with 6x 2TB 2.5" desktop HDDs, journals on same disks (see below).
each with 8x1 GBit/s ethernet ports operating at maximum speed (tested)
Network
Zyxel 48 port L2 switch with 1Gbit/s per port
separate port-based VLANs for each test with separate LACP groups as necessary.
Ceph RBD pool
size 3 (3x replication)
256 PGs
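In case anyone wants to replicate the pool setup, it boils down to this (pool name "rbd" is just an example):

    # create the benchmark pool with 256 placement groups and 3x replication
    ceph osd pool create rbd 256 256
    ceph osd pool set rbd size 3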
About the OSDs: many other sources and my own tests on different hardware indicate that you can keep your journals on the same spinners if you have disk controllers with write-back cache that properly order your writes. Unfortunately, these P410 controllers do not seem to do well here, although I cannot rule out a misconfiguration yet. As it is, a Ceph OSD benchmark (ceph tell osd.x bench) only yields about 33 MB/s write performance per OSD, which is probably capping the test results in some cases.
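For reference, that benchmark is simply the following, looped over the local OSD IDs (adjust the IDs to your layout; the benchmark I/O is local to each OSD's disk/journal, so the network is not involved):

    # by default this writes 1 GB in 4 MB blocks to the OSD's store
    ceph tell osd.0 bench
    # quick loop over the six OSDs of one node
    for i in 0 1 2 3 4 5; do ceph tell osd.$i bench; done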
If anyone would like to discuss this in detail we can start a different thread. I have an interesting comparison to an Areca controller here.
In all tests, I kept the setup the same and just changed the networks, changing the Ceph monitors and Proxmox storage config as appropriate.
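Concretely, the parts that change between tests are the network definitions in ceph.conf and the monitor addresses Proxmox uses for the RBD storage; roughly like this (all addresses are placeholders for my test VLANs):

    # /etc/ceph/ceph.conf (relevant excerpt)
    [global]
        # ceph-public: monitors, clients and the OSDs' front side
        public network = 10.10.10.0/24
        # ceph-cluster: OSD replication and backfill traffic
        cluster network = 10.10.20.0/24

    # /etc/pve/storage.cfg (RBD storage entry, monitors live in ceph-public)
    rbd: ceph-rbd
        monhost 10.10.10.11 10.10.10.12 10.10.10.13
        pool rbd
        content images
        username admin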
When I write "2gbps", this means 2x1gbps with LACP (bond-xmit_hash_policy layer3+4).
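On the Proxmox nodes such a bond looks roughly like this in /etc/network/interfaces (interface names and addresses are examples; the switch side needs a matching LACP group, see above):

    # 2x 1 Gbit/s LACP bond, e.g. for ceph-cluster
    auto bond0
    iface bond0 inet static
        address 10.10.20.11
        netmask 255.255.255.0
        bond-slaves eth2 eth3
        bond-mode 802.3ad
        bond-miimon 100
        bond-xmit-hash-policy layer3+4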
All network write tests push zeros through netcat over a fully saturated 1 Gbit/s link into a Linux VM (raw throughput tested at 112 MB/s) and from there to a file on its RBD-backed disk.
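A minimal sketch of that kind of test (hostnames, ports and paths are examples, not my exact commands):

    # inside the VM: write whatever comes in to a file on the RBD-backed disk
    # (-l -p is netcat-traditional syntax; the OpenBSD variant uses just -l 5000)
    nc -l -p 5000 > /mnt/rbd-disk/zeros.bin

    # on an external sender attached to the 1 Gbit/s client network
    dd if=/dev/zero bs=1M count=20000 | nc <vm-ip> 5000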
first test: 1gbps proxmox-public, 2gbps ceph-public, 2gbps ceph-cluster
- RADOS bench 60s write: 94.5 MB/s, read: 312 MB/s
- network write through VM: 99.5 MB/s
second test: 1gbps proxmox-public, 1gbps ceph-public, 2gbps ceph-cluster
- RADOS bench 60s write: 106.6 MB/s, read: 171.3 MB/s
- network write through VM: 90.2 MB/s
third test: 1gbps proxmox-public and ceph-public, 2gbps ceph-cluster
- RADOS bench 60s write: 116.8 MB/s, read: 163.3 MB/s
- network write through VM: 77.9 MB/s
fourth test: 1gbps proxmox-public, 4gbps ceph-public and ceph-cluster
- RADOS bench 60s write: 115.2 MB/s, read: 329.0 MB/s
- network write through VM: 76.9 MB/s
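For completeness, the RADOS numbers above come from the standard benchmark, along these lines (pool name as an example):

    # 60 s write benchmark; keep the objects so the read test has data
    rados bench -p rbd 60 write --no-cleanup
    # 60 s sequential read benchmark against those objects
    rados bench -p rbd 60 seq
    # remove the benchmark objects afterwards
    rados -p rbd cleanup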
Unfortunately, the RADOS write benchmark is pretty inconclusive; I suspect my slow OSDs are capping and distorting the results. In another test cluster with a similar network setup to the first test, but more capable hard disk controllers, it goes above 160 MB/s write speed.
Still, when ceph-public and the VM client traffic share the same network, there is a clear tendency towards slowdown. Keep in mind that my traffic was mostly uni-directional; the effect would probably be worse in a real-world scenario with more bi-directional load.
An interesting observation is test 4: it shows the limits of LACP and Ceph in small clusters. Ceph over LACP can in fact exceed the throughput of a single Ethernet link (not obvious here due to the OSD performance cap), but the low network/VM throughput shows it comes nowhere near utilizing all 4 links. This is because LACP's layer3+4 policy pins each individual TCP flow to a single slave link, so any single stream is limited to 1 Gbit/s; the aggregate only helps with many concurrent flows between many hosts. With more hosts this would probably change, but here the extra links are clearly of little use.
In conclusion: the Proxmox wiki is most certainly right: if you have a 10 Gbit/s interface, put ceph-public and ceph-cluster on the same interface.
However, if you are fooling around with LACP in small clusters, separate your networks as much as possible, which includes keeping your VM client traffic out of them.
P.S.: sorry for using 2 different forum accounts. I work at different institutions and probably need to consolidate them.