[SOLVED] Ceph Public/Cluster Networks another question

RobFantini

We have had this in our ceph.conf for a few years:
Code:
public_network = 10.11.12.0/24
cluster_network = 10.11.12.0/24

PVE VMs run on 10.1.10.0/24 on a separate pair of switches in an LACP bond. corosync.conf uses two other NICs and switches for cluster communication.

Having read forum posts, the PVE docs, and https://docs.ceph.com/en/latest/rados/configuration/network-config-ref/ , it seems that using separate networks for public and cluster does not make much of a difference for our 7-node cluster. However, I am not an expert at Ceph or clustering, so I do not know the correct choice.

Question: should we just use the PVE VM network 10.1.10.0/24 as the Ceph public network address?
 
So, having thought about it, Ceph probably uses those addresses to decide which network carries which traffic. I will set the public network to the VM network range 10.1.0.0/16.
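For reference, that change would look roughly like this in ceph.conf (just a sketch of the plan described above; it assumes the cluster_network entry stays on the existing 10.11.12.0/24 range):
Code:
# public (client/monitor) traffic moves to the VM network range
public_network = 10.1.0.0/16
# OSD replication stays on the existing 10.11.12.0/24 range
cluster_network = 10.11.12.0/24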

So, another question:
should PBS access the Ceph storage using the public or the cluster network?
 
So there is no way to answer my question without a lot more info on our setup.

Currently our public / VM network is 1G and the Ceph cluster network is 10G, so we'll use the 10G network for PBS and do backups after office hours.

Later we'll have a 10G public / VM network and a 40G Ceph cluster network; we'd use the 10G for PBS.
 
The public network carries traffic from VM to OSD and from VM to monitor.
The private (cluster) network is for replication between OSDs (optional; if it is not defined, the public network is used).

So, if you have only one 10G network, simply use it as the public network.

If you use 1G for your public network, you can't read/write at more than 1G, and the OSDs will not replicate any faster for having a 10G cluster network.
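For example, with a single 10G network the ceph.conf network section reduces to something like this (a sketch reusing the 10.11.12.0/24 range from the first post; cluster_network is simply left out, so replication falls back to the public network):
Code:
[global]
    # one 10G network carries both client I/O and OSD replication
    public_network = 10.11.12.0/24
    # no cluster_network line needed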
 
That makes sense, thank you.
 
Hi, please, if I understand this correctly, is my setup wrong?

Code:
[global]
     auth_client_required = cephx
     auth_cluster_required = cephx
     auth_service_required = cephx
     cluster_network = 172.30.255.2/24
     fsid = 6f7535ba-40b5-493d-84ce-63548cbf502e
     mon_allow_pool_delete = true
     mon_host = 172.30.250.1 172.30.250.2 172.30.250.3
     osd_pool_default_min_size = 2
     osd_pool_default_size = 2
     public_network = 172.30.250.1/26

cluster_network = 172.30.255.2/24 = 10 Gbit/s interface
public_network = 172.30.250.1/26 = 1 Gbit/s interface
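One way to double-check which addresses the OSD daemons actually bound to (front = public, back = cluster) is the OSD metadata, roughly like this (a sketch; osd.0 is just an example id):
Code:
# "back_addr" should fall within 172.30.255.0/24, "front_addr" within the 172.30.250.x range
ceph osd metadata 0 | grep -E '"(front|back)_addr"'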

I'm having an I/O issue on the Ceph disks, where I max out at about 250 MB/s read/write. I have SSDs as OSDs (Intel DC4500 960GB). When I benchmarked the SSD disks used as OSDs directly, I got about 500 MB/s, not only 250 MB/s, so is my Ceph setup incorrect?
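(Side note: a quick way to see whether the 1 Gbit public link itself is the limit is an iperf3 run between two nodes, sketched below on the assumption that 172.30.250.2 from mon_host is pve2; the cluster-side address is a placeholder, and iperf3 has to be installed on both nodes.)
Code:
# on pve2: start a listener
iperf3 -s
# on pve1: test the public (1 Gbit/s) path via pve2's monitor address
iperf3 -c 172.30.250.2
# repeat against pve2's address in the 172.30.255.0/24 cluster range
iperf3 -c <pve2-cluster-ip>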

Here are my results via the rados tool:
Code:
root@pve1:~# rados -p ceph-rbd bench 10 write --no-cleanup
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 10 seconds or 0 objects
Object prefix: benchmark_data_pve1_3440803
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0       0         0         0         0         0           -           0
    1      16        82        66   263.966       264    0.542006    0.180852
    2      16       137       121   241.968       220     0.55921    0.229871
    3      16       211       195   259.965       296   0.0349484    0.231927
    4      16       277       261   260.965       264    0.538848    0.230039
    5      16       336       320   255.963       236   0.0300209    0.238909
    6      16       387       371   247.297       204    0.569416    0.247073
    7      16       435       419   239.392       192    0.483429    0.256821
    8      16       492       476   237.963       228    0.523084    0.259917
    9      16       554       538   239.074       248    0.439085    0.260591
   10      16       604       588   235.164       200   0.0236146    0.264923
Total time run:         10.5448
Total writes made:      605
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     229.497
Stddev Bandwidth:       33.1354
Max bandwidth (MB/sec): 296
Min bandwidth (MB/sec): 192
Average IOPS:           57
Stddev IOPS:            8.28385
Max IOPS:               74
Min IOPS:               48
Average Latency(s):     0.274623
Stddev Latency(s):      0.252756
Max latency(s):         1.1047
Min latency(s):         0.0189343
root@pve1:~# rados -p ceph-rbd bench 10 seq
hints = 1
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0       0         0         0         0         0           -           0
    1      15        80        65   259.813       260   0.0199865    0.179958
    2      16       136       120   239.893       220   0.0197613    0.229199
    3      16       210       194   258.577       296    0.499108    0.228889
    4      16       276       260   259.921       264    0.533227    0.230233
    5      16       335       319   255.129       236    0.548045    0.238376
    6      16       386       370   246.603       204   0.0174453    0.247193
    7      16       434       418   238.799       192    0.496285    0.256982
    8      15       488       473   236.345       220   0.0237839    0.259635
    9      16       550       534    237.19       244    0.591687    0.260797
   10      16       597       581    232.27       188   0.0183163    0.266789
Total time run:       10.5106
Total reads made:     598
Read size:            4194304
Object size:          4194304
Bandwidth (MB/sec):   227.58
Average IOPS:         56
Stddev IOPS:          8.60814
Max IOPS:             74
Min IOPS:             47
Average Latency(s):   0.278147
Max latency(s):       1.00734
Min latency(s):       0.0173812
root@pve1:~# ceph -s
  cluster:
    id:     6f7535ba-40b5-493d-84ce-63548cbf502e
    health: HEALTH_WARN
            mon pve2 is low on available space
 
  services:
    mon: 3 daemons, quorum pve1,pve2,pve3 (age 12h)
    mgr: pve2(active, since 16h), standbys: pve1, pve3
    osd: 2 osds: 2 up (since 5d), 2 in (since 8d)
 
  data:
    pools:   1 pools, 128 pgs
    objects: 148.86k objects, 581 GiB
    usage:   1.1 TiB used, 655 GiB / 1.7 TiB avail
    pgs:     128 active+clean
 
  io:
    client:   1.3 KiB/s rd, 14 KiB/s wr, 0 op/s rd, 3 op/s wr

Thanks for help.
 
