Adding cluster_network to an existing all-public_network configuration

liszca

Since new hardware has arrived, I wanted to configure a separate network for the OSDs.

It's 4 hosts, each with one OSD:

Code:
 # ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME         STATUS  REWEIGHT  PRI-AFF
-1         3.72595  root default                              
-3         0.93149      host aegaeon                          
 0    ssd  0.93149          osd.0         up   1.00000  1.00000
-5         0.93149      host anthe                            
 1    ssd  0.93149          osd.1         up   1.00000  1.00000
-7         0.93149      host atlas                            
 2    ssd  0.93149          osd.2         up   1.00000  1.00000
-9         0.93149      host calypso                          
 3    ssd  0.93149          osd.3         up   1.00000  1.00000

So I configured the extra Ethernet interfaces with static IPs 10.1.0.10..13 for the hosts aegaeon, anthe, atlas and calypso.
Then I changed /etc/ceph/ceph.conf to cluster_network 10.1.0.10/24, followed by a restart: systemctl restart ceph*

After host 10.1.0.10 came back I checked for cluster_network:
ceph config set global cluster_network
This gave me the expected result, so I ran systemctl restart ceph* on every other node.
But somehow the cluster didn't come back together by itself and kept complaining about slow ops.
Exact message:
"oldest one blocked for 221 sec, mon.aegaeon has slow ops"

Is my approach wrong, or do I have to set the cluster_network differently from how I did it?
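
A minimal sketch of how the effective settings can be checked, assuming the monitors are still reachable (illustrative commands, not the ones from my post above):

Code:
# value of cluster_network as stored in the monitors' config database
ceph config get mon cluster_network

# front (public) and back (cluster) addresses that osd.0 actually bound to
ceph osd metadata 0 | grep -E 'front_addr|back_addr'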
 
The services might need to be recreated, as they might still run only with public IPs in Ceph. What network bandwidth do you have in the Ceph networks? Separating Ceph networks usually helps the most when you are on low bandwidth and have more than 3 hosts. The cluster network helps the most during recovery. I would destroy one of the mons (if you have 4) and then create a new one, delete the next one, recreate it, etc. until you're finished.
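
A rough sketch of that rolling recreation on Proxmox VE, assuming one mon per node and using aegaeon only as an example host name:

Code:
# on the node whose monitor should be recreated (example: aegaeon)
pveceph mon destroy aegaeon
pveceph mon create
# wait until "ceph -s" shows the mon back in quorum before moving on to the next node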

Can you share your /etc/network/interfaces file? Having only one OSD per host might not help that much. It's likely that you're limiting the performance with the single OSD.
 
The services might need to be recreated, as they might still run only with public IPs in Ceph. What network bandwidth do you have in the Ceph networks? Separating Ceph networks usually helps the most when you are on low bandwidth and have more than 3 hosts. The cluster network helps the most during recovery. I would destroy one of the mons (if you have 4) and then create a new one, delete the next one, recreate it, etc. until you're finished.
I planned for 2.5 Gb, but 3 of the 5 USB 2.5 Gb adapters were not able to operate correctly.

After managing to set it up by removing the faulty USB Ethernet adapters it is running, but I am using 1 Gb Ethernet.
Recovery doesn't even seem to use the full bandwidth of 1 Gb; I am curious whether the remaining USB Ethernet adapter is still faulty.

In case somebody is interested in the hardware:
https://geizhals.de/inter-tech-argus-it-732-lan-adapter-88885593-a2750561.html
What I noticed on the broken ones:
  • They only managed to negotiate 1 Gb (see the speed check sketched below)
  • They didn't get as warm as the working ones, which I think is normal in 1 Gb mode
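
A quick sketch of how the negotiated speed can be checked, assuming ethtool is installed and using aegaeon's interface name from the configs below:

Code:
# check what link speed and duplex the USB adapter actually negotiated
ethtool enx00e04c680029 | grep -E 'Speed|Duplex'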

Can you share your /etc/network/interfaces file? Having only one OSD per host might not help that much. It's likely that you're limiting the performance with the single OSD.
I have multiple different network configurations:

Host: Aegaeon
Code:
auto lo
iface lo inet loopback

iface enxf4b52021da43 inet manual
    ethernet-wol g

auto enx00e04c680029
iface enx00e04c680029 inet static
    address 10.1.0.10/24

auto vmbr0
iface vmbr0 inet static
    address 192.168.0.10/24
    gateway 192.168.0.1
    bridge-ports enxf4b52021da43
    bridge-stp off
    bridge-fd 0

Host: Anthe
Code:
auto lo
iface lo inet loopback

iface enx6045cba2e668 inet manual
    ethernet-wol g

iface enx001f2955f0d4 inet manual

auto enx001f2955f0d5
iface enx001f2955f0d5 inet static
    address 10.1.0.11/24

auto vmbr0
iface vmbr0 inet static
    address 192.168.0.11/24
    gateway 192.168.0.1
    bridge-ports enx6045cba2e668
    bridge-stp off
    bridge-fd 0

Host: Atlas
Code:
Powered off for testing recovery speed


Host: Calypso
Code:
auto enx00e04c680053
iface enx00e04c680053 inet static
    address 10.1.0.13/24
    ethernet-wol g

iface enxf4b520183dac inet manual

auto vmbr0
iface vmbr0 inet static
    address 192.168.0.13/24
    gateway 192.168.0.1
    bridge-ports enxf4b520183dac
    bridge-stp off
    bridge-fd 0
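
As a sanity check (a sketch, run from aegaeon for example), the cluster-network addresses from these configs should reach each other:

Code:
# ping the cluster-network IPs of the other nodes from 10.1.0.10
for ip in 10.1.0.11 10.1.0.12 10.1.0.13; do ping -c 2 "$ip"; done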


And the Ceph config; in addition, I also added explicit public network addresses for the OSDs:


Code:
[global]
     auth_client_required = cephx
     auth_cluster_required = cephx
     auth_service_required = cephx
     cluster_network = 10.1.0.0/24
     #cluster_network = 192.168.0.0/24
     fsid = ddfe12d5-782f-4028-b499-71f3e6763d8a
     mon_allow_pool_delete = true
     mon_host = 192.168.0.10 192.168.0.11 192.168.0.12 192.168.0.13
     ms_bind_ipv4 = true
     ms_bind_ipv6 = false
     osd_pool_default_min_size = 2
     osd_pool_default_size = 3
     public_network = 192.168.0.0/24

[client]
     keyring = /etc/pve/priv/$cluster.$name.keyring

[mds]
     keyring = /var/lib/ceph/mds/ceph-$id/keyring

[mds.aegaeon]
     host = aegaeon
     mds_standby_for_name = pve

[mds.anthe]
     host = anthe
     mds_standby_for_name = pve

[mds.atlas]
     host = atlas
     mds_standby_for_name = pve

[mds.calypso]
     host = calypso
     mds_standby_for_name = pve

[mon.aegaeon]
     public_addr = 192.168.0.10

[mon.anthe]
     public_addr = 192.168.0.11

[mon.atlas]
     public_addr = 192.168.0.12

[mon.calypso]
     public_addr = 192.168.0.13

[osd]
    public_network = 192.168.0.0/24
    cluster_network = 10.1.0.0/24

[osd.0]
    host = aegaeon
    public_addr = 192.168.0.10/24
    cluster_addr = 10.1.0.10/24

[osd.1]
    host = anthe
    public_addr = 192.168.0.11/24
    cluster_addr = 10.1.0.11/24

[osd.2]
    host = atlas
    public_addr = 192.168.0.12/24
    cluster_addr = 10.1.0.12/24

[osd.3]
    host = calypso
    public_addr = 192.168.0.13/24
    cluster_addr = 10.1.0.13/24
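
The daemons only pick up address changes from ceph.conf on restart; a minimal sketch of what to run on each node after editing the file:

Code:
# restart all Ceph OSD daemons on this node so they re-read /etc/ceph/ceph.conf
systemctl restart ceph-osd.target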
 