Unable to change ceph networks

mart.v · Apr 21, 2024

Hi guys,

I am trying to change the Ceph networks. Until now I had a single subnet (172.16.254.0/24) for both the public and the cluster network. The goal is this configuration:

Code:

cluster_network = 10.10.112.0/24
public_network = 10.10.111.0/24

I have followed this tutorial: https://forum.proxmox.com/threads/ceph-changing-public-network.119116/#post-614241

Now I have successfully changed the public network - so all clients are using the new IP range, as well as MON, MGR and MDS.

The problem appears when I restart the first OSD. It shows as up, but it is marked down again after a few seconds. I can see that the service is running and bound to the correct IP, but the other OSDs report it as failed:

Code:

2024-04-21T22:19:02.241484+0200 mon.node13 (mon.0) 9815 : cluster [DBG] osd.14 reported failed by osd.19
2024-04-21T22:19:02.527605+0200 mon.node13 (mon.0) 9818 : cluster [DBG] osd.14 reported failed by osd.15
2024-04-21T22:19:02.660661+0200 mon.node13 (mon.0) 9819 : cluster [DBG] osd.14 reported failed by osd.24
2024-04-21T22:19:02.803907+0200 mon.node13 (mon.0) 9820 : cluster [DBG] osd.14 reported failed by osd.58
2024-04-21T22:19:02.901519+0200 mon.node13 (mon.0) 9823 : cluster [DBG] osd.14 reported failed by osd.67
2024-04-21T22:19:03.462197+0200 mon.node13 (mon.0) 9831 : cluster [DBG] osd.14 reported failed by osd.32
2024-04-21T22:19:03.589426+0200 mon.node13 (mon.0) 9832 : cluster [DBG] osd.14 reported failed by osd.89
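
To see how many distinct OSDs report the failure, the monitor log lines can be summarized with a short script. This is only a sketch: the regular expression assumes the exact "reported failed by" wording shown in the log above, and `failure_reporters` is a hypothetical helper name, not part of any Ceph tool.

```python
import re

# Matches cluster-log lines of the form shown above, e.g.
#   "... cluster [DBG] osd.14 reported failed by osd.19"
FAILED_RE = re.compile(r"osd\.(\d+) reported failed by osd\.(\d+)")


def failure_reporters(log_text):
    """Map each reported OSD id to the sorted list of OSD ids that reported it."""
    reporters = {}
    for match in FAILED_RE.finditer(log_text):
        target, reporter = int(match.group(1)), int(match.group(2))
        reporters.setdefault(target, set()).add(reporter)
    return {target: sorted(ids) for target, ids in reporters.items()}


# Three of the log lines from this post, used as sample input.
sample = """\
2024-04-21T22:19:02.241484+0200 mon.node13 (mon.0) 9815 : cluster [DBG] osd.14 reported failed by osd.19
2024-04-21T22:19:02.527605+0200 mon.node13 (mon.0) 9818 : cluster [DBG] osd.14 reported failed by osd.15
2024-04-21T22:19:02.660661+0200 mon.node13 (mon.0) 9819 : cluster [DBG] osd.14 reported failed by osd.24
"""

print(failure_reporters(sample))  # {14: [15, 19, 24]}
```

If the reporters span many different hosts, as they do here, the problem is more likely on the side of the restarted OSD than on any single peer.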

I have checked with telnet that the hosts running those OSDs can reach osd.14 on its ports.
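
The same kind of reachability check can be scripted instead of using telnet by hand. The following is a minimal sketch using only the Python standard library; `can_connect` is a hypothetical helper name, and it only tests whether a TCP connection can be opened, nothing more.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False
```

For example, `can_connect("172.16.254.201", 6802)` would test the first `hb_back_addr` port from the OSD metadata below. Since the metadata lists separate front, back and heartbeat addresses, the check has to be repeated for each address and port, from each peer host.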

Code:

"osd":{
         "hb_front_addr":"[v2:10.10.111.201:6802/464238,v1:10.10.111.201:6803/464238]",
         "front_addr":"[v2:10.10.111.201:6800/464238,v1:10.10.111.201:6801/464238]",
         "hb_back_addr":"[v2:172.16.254.201:6802/464238,v1:172.16.254.201:6803/464238]",
         "back_addr":"[v2:172.16.254.201:6800/464238,v1:172.16.254.201:6801/464238]",
      }
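
To make it easier to see which subnet each of these addresses belongs to, they can be classified with Python's `ipaddress` module. This is a sketch only: the subnets and address strings are copied from this post, while the labels and the `classify` helper are made up for illustration.

```python
import ipaddress
import re

# Subnets mentioned in this thread; the labels are illustrative.
NETWORKS = {
    "old shared network": ipaddress.ip_network("172.16.254.0/24"),
    "new public_network": ipaddress.ip_network("10.10.111.0/24"),
    "new cluster_network": ipaddress.ip_network("10.10.112.0/24"),
}

# Address fields copied from the OSD metadata above.
OSD_ADDRS = {
    "hb_front_addr": "[v2:10.10.111.201:6802/464238,v1:10.10.111.201:6803/464238]",
    "front_addr": "[v2:10.10.111.201:6800/464238,v1:10.10.111.201:6801/464238]",
    "hb_back_addr": "[v2:172.16.254.201:6802/464238,v1:172.16.254.201:6803/464238]",
    "back_addr": "[v2:172.16.254.201:6800/464238,v1:172.16.254.201:6801/464238]",
}

# Extracts the IPv4 part of entries such as "v2:10.10.111.201:6802/464238".
ADDR_RE = re.compile(r"v[12]:(\d+\.\d+\.\d+\.\d+):\d+")


def classify(addr_field):
    """Return the sorted network labels that the IPs in one address field fall into."""
    labels = set()
    for ip_text in ADDR_RE.findall(addr_field):
        ip = ipaddress.ip_address(ip_text)
        matches = [name for name, net in NETWORKS.items() if ip in net]
        labels.update(matches or ["unknown"])
    return sorted(labels)


for field, value in OSD_ADDRS.items():
    print(f"{field}: {classify(value)}")
```

For the metadata pasted above, this reports the front addresses on 10.10.111.0/24 and the back addresses on 172.16.254.0/24; none of the four fields is on 10.10.112.0/24.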

When I roll back the changes and restart the OSD with the old cluster network, everything is fine. The cluster is running Ceph 16.2.15.

Any ideas what could be wrong?

Moayad · Apr 25, 2024

Hi,

mart.v said:
Now I have successfully changed the public network - so all clients are using the new IP range, as well as MON, MGR and MDS.

Did you restart all the OSDs and MONs after the IP change? Could you please also share the contents of /etc/pve/ceph.conf and the output of the `pveceph status` command?

mart.v · Apr 26, 2024

Moayad said:
Did you restart all the OSDs and MONs after the IP change? Could you please also share the contents of /etc/pve/ceph.conf and the output of the `pveceph status` command?

Thank you for your reply. Yes, I did restart everything.

There is some progress now. As I mentioned in the first post, I was able to change the public network to 10.10.111.0/24, but I was unable to change the cluster network to 10.10.112.0/24.

BUT when I changed the cluster network from 172.16.254.0/24 to 10.10.111.0/24 (the same as the new public network), it worked. I have restarted every service and everything is running smoothly.

I am still unable to change the cluster network from 10.10.111.0/24 to 10.10.112.0/24 to get separate networks. I encounter the very same error:

Code:

cluster [DBG] osd.14 reported failed by osd.19

Ceph status:

Code:

# pveceph status
  cluster:
    id:     ecc963a4-009f-4236-87fe-e672a7cb5d49
    health: HEALTH_OK


  services:
    mon: 5 daemons, quorum node13,node99,node98,node1,node97 (age 3h)
    mgr: node16(active, since 46h), standbys: node17
    mds: 1/1 daemons up, 1 standby
    osd: 84 osds: 84 up (since 3h), 84 in (since 4d)


  data:
    volumes: 1/1 healthy
    pools:   7 pools, 1985 pgs
    objects: 43.90M objects, 49 TiB
    usage:   130 TiB used, 70 TiB / 200 TiB avail
    pgs:     1984 active+clean
             1    active+clean+scrubbing+deep


  io:
    client:   34 MiB/s rd, 76 MiB/s wr, 1.34k op/s rd, 6.18k op/s wr
