A little networking advice. Public/Cluster/Ceph

jkirker

I'm currently running 3 networks.

I've got public: 1x1G
xxx.xxx.xxx.xxx

Private/Cluster: 1x1G
10.100.10.x

Ceph: (2x10G on each host node and 4x10G on each Ceph node - all bonded)
172.16.0.x

Everything is going great with Ceph as far as I can see, and everything is working great with the cluster. However, when I clone a VM that lives on the Ceph storage, it takes 2 hours to clone a 500GB VM.

It doesn't seem right to me that it should take this long, so I'm wondering whether the copy is happening over the Private/Cluster network.

If it is, should I just migrate my Private/Cluster network and combine it with the Ceph network to take advantage of the larger pipe?

Any suggestions and thoughts would be greatly appreciated.
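One quick way to see which link the clone traffic actually takes is to watch the interface counters on the source node while a clone runs; the interface names below are only examples, adjust them to your actual bonds/bridges:

# run on the Proxmox node doing the clone
watch -n1 'cat /proc/net/dev'
# or a live per-connection view on the suspected interface:
iftop -i bond0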
 
Hi,

I wouldn't mix networks.

The fastest way would be to use rbd copy.
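For example, copying an image directly at the RBD level (pool and image names are placeholders):

rbd cp ceph-pool/vm-100-disk-1 ceph-pool/vm-101-disk-1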
 
OK... Thanks again Wolf.

Rather than bonding 2 10G's on each host node would you recommend that I dedicate 1x10G for the cluster and 1x10G for Ceph?

I'm not running any Ceph OSDs on the host nodes.

Because the speed is quite painful.

I'm at 22% after an hour with a 450G VM using the backup scheduler... :(

So I've gotta be doing this wrong. Snapshot Mode, Priority 7, LZO
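For reference, those settings correspond roughly to a vzdump call like this; the VMID and storage name are placeholders, and I'm assuming the GUI's Priority maps to I/O priority (ionice):

vzdump 100 --mode snapshot --compress lzo --ionice 7 --storage backup-store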
 
Are you talking about a copy or a backup? Those are different things.
 
Cluster network.
Ignore what I wrote, it is not true in your setup.

If you make a full clone or copy, the qemu rbd driver is used, so you are only using the Ceph network.

Backup is the same on the source side; on the destination side it depends on where you store it.
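You can check which networks Ceph itself uses in ceph.conf. Note that "cluster network" there means Ceph's OSD replication network, not the Proxmox/corosync cluster. With the 172.16.0.x subnet from above it would look roughly like this (adjust the netmask if yours differs):

[global]
    public network = 172.16.0.0/24
    cluster network = 172.16.0.0/24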

So I've gotta be doing this wrong. Snapshot Mode, Priority 7, LZO
That looks OK to me, but how fast is your Ceph at sequential writes inside the VM?
Where do you store the backup?
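A raw benchmark against the cluster, run from any node with a ceph.conf, gives a useful baseline; the pool name is a placeholder:

# 60-second sequential write, then a read pass, then clean up
rados bench -p rbd 60 write --no-cleanup
rados bench -p rbd 60 seq
rados -p rbd cleanup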
 
From inside a VM with the drive on Ceph:
hdparm -t /dev/vda1 - 72 MB/sec

Moving the VM to a local spinner:
hdparm -t /dev/vda1 - 126 MB/sec

Moving the VM to a local SSD:
hdparm -t /dev/vda1 - 336 MB/sec

Doing an hdparm -t directly on the local SSD /dev/sdb1 on the host node I'm getting 497 MB/sec

Shouldn't the hdparm result be faster than 72 MB/s over Ceph, given it's the only VM on Ceph with no activity but the test, and two Ceph nodes with 8 or 9 OSDs each? On a dual 10G connection?

I've tested two machines point-to-point over the 2x10G network and I can get full throughput directly. Again, perhaps I have something misconfigured, I don't know, but it's super frustrating.
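Worth noting: hdparm -t only measures buffered sequential reads. A sequential write test inside the VM lines up better with clone/backup behaviour, for example with fio (file path and size are placeholders):

fio --name=seqwrite --filename=/root/fiotest --rw=write --bs=4M --size=2G --ioengine=libaio --direct=1 --numjobs=1
rm /root/fiotest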

Thanks again for all your thoughtful advice.
 
I think you don't have enough OSDs, or they are too slow.

Because when I calculate with your benchmark and your disk size, it takes over 4 hours.

Your bottleneck is not the network. With such a small setup you would have to use an SSD-only cluster.
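To see whether individual OSDs are the limit rather than the network, these are worth a look:

ceph osd perf    # per-OSD commit/apply latency in ms; one slow disk drags the whole pool
ceph osd tree    # how the OSDs are spread across your nodes
ceph -s          # overall health, plus any recovery/backfill eating bandwidth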
 
Thanks Wolfgang...

I've read in the Ceph docs that the minimum recommended OSD count is 11, and performance should increase from there. How many OSDs would you recommend as a minimum? I can add another JBOD hardware node and fill the bays with drives, but I want to make sure that I'll see a performance increase before spending the money and effort.

If you feel that it'll make the difference I'm looking for I'll take that step.

I just figured I'd get pretty good performance with 16 spinners. In a RAID 6 config with 11 7200 RPM SAS spinners I'm guessing I'd get throughput of at least 600-800 MB/s, and as much as 1800 MB/s in RAID 0.

If I only had the budget to go all SSD... :)

With my current hardware, would you recommend that I stop using Ceph for the time being and just turn my JBOD boxes into RAID 5 boxes shared over NFS?
 
I would never use Ceph with fewer than 3 nodes, because every OSD needs CPU power and a real cluster has a minimum of 3 nodes. Ceph is a cluster filesystem.

I'm not sure you will be happy with more spinning OSDs.
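As a rough back-of-envelope, assuming 2 replicas and journals co-located on the spinners, and taking the ~120 MB/s you measured on a local spinner:

16 OSDs x ~120 MB/s raw           = ~1920 MB/s
/ 2 (replication)                 = ~960 MB/s
/ 2 (journal + data double write) = ~480 MB/s aggregate, best case

A single client stream typically sees well under that aggregate, which is why RAID throughput numbers don't translate directly.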
 
Just to confirm,

When you speak of 3 Ceph nodes, you are referring to 3 Ceph nodes holding OSDs, yes?

Currently I have 3 low-power monitoring nodes, 3 front-end web servers (dual 8-core E5s each), and 2x 12-bay JBOD nodes for Ceph with 8 spinners each.

So I have more than 3 nodes - however I only have 2 nodes dedicated to Ceph while the other systems are monitoring and pulling data from it.
 
Now I'm confused.

Are your OSDs and MONs on different servers?
How many MONs do you have, and how many OSDs in total?
On how many servers are the OSDs?
Are the Ceph servers running Proxmox VE?
 
Yes.

2 OSD nodes (2 JBOD servers w/ 8 spinners each for Ceph - capacity for 12 drives) w/ 4x10G each (overkill)
3 Compute Nodes (1x256SSD boot/swap, 1x1TB SSD Local Storage, 1x3TB Local Backup - web servers)
3 Monitor Nodes (Old spare parts servers brought back in)

I read somewhere that the OSD nodes should not have the monitors on them as it could cause internal flapping and some other issues. Not sure if I read that on Proxmox or Ceph forums.

So I have a total of 7 servers. The plan was to expand compute nodes and JBOD servers as needed when more resources are necessary.
 
I think I'm making some progress. While I'm not getting blazing-fast speeds, I am getting 300-400 MB/s with Ceph, which is an improvement.

I noticed this presentation, where he suggests 1 journal SSD per 3 OSDs. So I'm going to try 2 journal SSDs per 4 OSDs.
http://slides.com/sebastienhan/ceph-performance-and-benchmarking#/27
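For reference, with filestore OSDs of that era the journal device is passed when preparing the OSD; the device names below are placeholders:

# data disk first, journal device second (a journal partition gets created on the SSD)
ceph-disk prepare /dev/sdf /dev/sdb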

Things still seem slow moving disks from local SSD to Ceph - slower than I feel they should.

Also, one thing that is interesting: when I run the following on the host vs. inside a KVM guest that is on the same drive, the speeds are very different. Does KVM with CentOS have that much overhead that it would impact IOPS? Or is there anything else that might be limiting it?

ProxMox host - Local SSD:
root@mhosttest:/home/vms# time sh -c "dd if=/dev/zero of=ddfile bs=8k count=250000 && sync"; rm ddfile
2048000000 bytes (2.0 GB) copied, 1.37107 s, 1.5 GB/s
real 0m5.304s
user 0m0.044s
sys 0m1.336s

KVM machine living on Local SSD w/ virtio0: (same SSD as above)
[root@kvmtest home]# time sh -c "dd if=/dev/zero of=ddfile bs=8k count=250000 && sync"; rm ddfile
250000+0 records in
250000+0 records out
2048000000 bytes (2.0 GB) copied, 3.61053 s, 567 MB/s
real 0m4.398s
user 0m0.046s
sys 0m3.798s
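Part of that gap is the page cache: dd without a sync or direct flag largely measures how fast zeros go into RAM. Something like this is closer to the real device speed (the test file path is a placeholder):

dd if=/dev/zero of=ddfile bs=8k count=250000 oflag=direct
# or include the final flush in the timing:
dd if=/dev/zero of=ddfile bs=8k count=250000 conv=fdatasync
rm ddfile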
 
