Hi guys,
I am trying to change my Ceph networks. Until now I have had a single subnet (172.16.254.0/24) for both the public and the cluster network. The goal is to have this configuration:
Code:
cluster_network = 10.10.112.0/24
public_network = 10.10.111.0/24
I followed this tutorial: https://forum.proxmox.com/threads/ceph-changing-public-network.119116/#post-614241
I have successfully changed the public network, so all clients, as well as the MON, MGR and MDS daemons, are now using the new IP range.
The problem appears when I try to restart the first OSD. It shows as UP, but it is marked down again after a few seconds. I can see that the service is running and bound to the correct IP, but the other OSDs still report it as dead:
Code:
2024-04-21T22:19:02.241484+0200 mon.node13 (mon.0) 9815 : cluster [DBG] osd.14 reported failed by osd.19
2024-04-21T22:19:02.527605+0200 mon.node13 (mon.0) 9818 : cluster [DBG] osd.14 reported failed by osd.15
2024-04-21T22:19:02.660661+0200 mon.node13 (mon.0) 9819 : cluster [DBG] osd.14 reported failed by osd.24
2024-04-21T22:19:02.803907+0200 mon.node13 (mon.0) 9820 : cluster [DBG] osd.14 reported failed by osd.58
2024-04-21T22:19:02.901519+0200 mon.node13 (mon.0) 9823 : cluster [DBG] osd.14 reported failed by osd.67
2024-04-21T22:19:03.462197+0200 mon.node13 (mon.0) 9831 : cluster [DBG] osd.14 reported failed by osd.32
2024-04-21T22:19:03.589426+0200 mon.node13 (mon.0) 9832 : cluster [DBG] osd.14 reported failed by osd.89
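To see at a glance how many distinct OSDs are reporting the failure, the log excerpt can be tallied with a short script. This is a minimal Python sketch, not part of the tutorial; the function name `failure_reporters` and the regular expression are my own, and it assumes the monitor log lines keep the "osd.X reported failed by osd.Y" wording shown above.

```python
import re
from collections import defaultdict

# Matches the "osd.X reported failed by osd.Y" part of a monitor log line.
# Assumption: the wording is the same as in the excerpt above.
FAILURE_RE = re.compile(r"osd\.(\d+) reported failed by osd\.(\d+)")


def failure_reporters(lines):
    """Map each failed OSD id to the sorted list of OSD ids that reported it."""
    reporters = defaultdict(set)
    for line in lines:
        match = FAILURE_RE.search(line)
        if match:
            failed, reporter = int(match.group(1)), int(match.group(2))
            reporters[failed].add(reporter)
    return {osd: sorted(ids) for osd, ids in reporters.items()}


# The log lines quoted in this post.
SAMPLE_LOG = """\
2024-04-21T22:19:02.241484+0200 mon.node13 (mon.0) 9815 : cluster [DBG] osd.14 reported failed by osd.19
2024-04-21T22:19:02.527605+0200 mon.node13 (mon.0) 9818 : cluster [DBG] osd.14 reported failed by osd.15
2024-04-21T22:19:02.660661+0200 mon.node13 (mon.0) 9819 : cluster [DBG] osd.14 reported failed by osd.24
2024-04-21T22:19:02.803907+0200 mon.node13 (mon.0) 9820 : cluster [DBG] osd.14 reported failed by osd.58
2024-04-21T22:19:02.901519+0200 mon.node13 (mon.0) 9823 : cluster [DBG] osd.14 reported failed by osd.67
2024-04-21T22:19:03.462197+0200 mon.node13 (mon.0) 9831 : cluster [DBG] osd.14 reported failed by osd.32
2024-04-21T22:19:03.589426+0200 mon.node13 (mon.0) 9832 : cluster [DBG] osd.14 reported failed by osd.89
"""

if __name__ == "__main__":
    for osd, ids in failure_reporters(SAMPLE_LOG.splitlines()).items():
        print(f"osd.{osd} reported failed by {len(ids)} OSDs: {ids}")
```

In the excerpt, seven different OSDs report osd.14 as failed within about 1.5 seconds, so this does not look like a problem between one single pair of OSDs.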
I have checked with telnet that the hosts running those OSDs can reach osd.14 on its ports.
Code:
"osd":{
"hb_front_addr":"[v2:10.10.111.201:6802/464238,v1:10.10.111.201:6803/464238]",
"front_addr":"[v2:10.10.111.201:6800/464238,v1:10.10.111.201:6801/464238]",
"hb_back_addr":"[v2:172.16.254.201:6802/464238,v1:172.16.254.201:6803/464238]",
"back_addr":"[v2:172.16.254.201:6800/464238,v1:172.16.254.201:6801/464238]",
}
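The addresses above can be compared against the target subnets programmatically. The following is a minimal Python sketch using only the standard `ipaddress` module. It assumes that the front addresses are meant to be in `public_network` and the back addresses in `cluster_network`; the names `EXPECTED` and `mismatched_fields` are hypothetical helpers, not part of any Ceph tooling.

```python
import ipaddress
import re

# Target networks from the configuration in this post.
PUBLIC_NETWORK = ipaddress.ip_network("10.10.111.0/24")
CLUSTER_NETWORK = ipaddress.ip_network("10.10.112.0/24")

# Address fields of osd.14 as quoted above.
OSD_ADDRS = {
    "hb_front_addr": "[v2:10.10.111.201:6802/464238,v1:10.10.111.201:6803/464238]",
    "front_addr": "[v2:10.10.111.201:6800/464238,v1:10.10.111.201:6801/464238]",
    "hb_back_addr": "[v2:172.16.254.201:6802/464238,v1:172.16.254.201:6803/464238]",
    "back_addr": "[v2:172.16.254.201:6800/464238,v1:172.16.254.201:6801/464238]",
}

# Assumption: front = public network, back = cluster network.
EXPECTED = {
    "hb_front_addr": PUBLIC_NETWORK,
    "front_addr": PUBLIC_NETWORK,
    "hb_back_addr": CLUSTER_NETWORK,
    "back_addr": CLUSTER_NETWORK,
}

# Extracts the IPv4 part of entries such as "v2:10.10.111.201:6802/464238".
ADDR_RE = re.compile(r"v[12]:(\d+\.\d+\.\d+\.\d+):\d+/\d+")


def mismatched_fields(addrs, expected):
    """Return {field: [ips]} for addresses outside the expected network."""
    bad = {}
    for field, value in addrs.items():
        ips = {ipaddress.ip_address(ip) for ip in ADDR_RE.findall(value)}
        wrong = sorted(str(ip) for ip in ips if ip not in expected[field])
        if wrong:
            bad[field] = wrong
    return bad


if __name__ == "__main__":
    for field, ips in mismatched_fields(OSD_ADDRS, EXPECTED).items():
        print(f"{field}: {ips} not in {EXPECTED[field]}")
```

For the values quoted here, the sketch flags `hb_back_addr` and `back_addr`, which are still in 172.16.254.0/24 rather than 10.10.112.0/24. That may or may not be related to the failure reports, but it is worth checking first.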
When I roll back the changes and restart the OSD with the old cluster network, everything is fine. I am running Ceph 16.2.15.
Any ideas what could be wrong?