[SOLVED] ceph problems moving cluster to new subnet

kyriazis

Oct 28, 2019
Hello..

We have a Proxmox cluster that moved sites, and we cannot bring the old subnet along to the new site.

So far, I've done the following:
- Brought up the cluster on a private network not connected to the (new) site network
- Let ceph settle
- Followed online tutorials to change the proxmox cluster to the new subnet IP addresses (basically changed /etc/hosts and /etc/network/interfaces and rebooted all nodes), while still not connected to the site network
- I am planning on connecting the site network, but I was thinking it's safer to do that after ceph has recovered; so I haven't done that yet.

Proxmox cluster is OK and has quorum. However, since this cluster is not connected to the site network I don't have GUI access, just console access through BMC.

My problem is with ceph. Since corosync has quorum, I modified /etc/pve/ceph.conf and changed all references of the old subnet to the new subnet, and rebooted all nodes. I am at a stage, though, where no nodes can connect to any monitors, and the mon log files complain that they can't reach the other mons through the old IP address, even though ceph.conf points to the new IP address.

I tried destroying a monitor using pveceph, but the command times out.
I tried creating a new monitor on a different node, but I get an error that it cannot connect to ceph cluster despite configured monitors.

What is the right way to configure the ceph cluster to the new network?

Thank you!
 
Manually update Ceph monitor IPs using ceph mon set-addrs, restart all monitors, and verify with ceph -s.
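
A minimal sketch of that suggestion, with caveats: the monitor name pve1 and the address 198.51.100.11 are made-up placeholders, ports 3300 and 6789 are the usual msgr2/msgr1 defaults, and the command can only succeed while the monitors still have quorum. With DRY_RUN=1 the helper only prints the command instead of running it.

```shell
#!/bin/sh
# Sketch only: the monitor name and IP in the example are hypothetical.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# set_mon_addr <mon-name> <new-ip>
# Builds the v2/v1 address vector and hands it to "ceph mon set-addrs".
set_mon_addr() {
    run ceph mon set-addrs "$1" "[v2:$2:3300,v1:$2:6789]"
}

# Example (needs a responding monitor quorum):
#   set_mon_addr pve1 198.51.100.11
#   ceph -s
```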
That doesn't work because no monitor is responding: "ceph -s" hangs, and no ceph command responds.

However, I found a solution, and I'm posting it here to help other people who may run into this:

I added the "old" IP address as a 2nd IP address on the monitor nodes. Then the monitors were able to see each other, and "ceph -s" worked.
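
A minimal sketch of that step, for reference. The address 192.0.2.11/24 and the interface vmbr0 are placeholders, not the values from this cluster; substitute each monitor node's old address and whichever interface carries the Ceph traffic. With DRY_RUN=1 the helper only prints the command.

```shell
#!/bin/sh
# Sketch only: the address and interface in the example are hypothetical.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# add_old_ip <old-address/prefix> <interface>
# Adds the old monitor IP as an extra address next to the new one.
add_old_ip() {
    run ip addr add "$1" dev "$2"
}

# Example (as root, on each monitor node):
#   add_old_ip 192.0.2.11/24 vmbr0
#   ip -br addr show dev vmbr0   # both addresses should be listed
#   ceph -s                      # should answer once the mons see each other
```

An address added this way does not survive a reboot, which suits this case because the old IP is only needed until the monitors have been re-created.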

After that I was able to use pveceph to delete and re-create the monitors one by one. I didn't have to mess around with the manager(s) and/or the MDSs; the cluster just came up after monitor quorum was established with the new IP addresses.
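
Roughly, the per-monitor step could look like the sketch below. It assumes a recent Proxmox VE where the subcommands are "pveceph mon destroy" and "pveceph mon create" (older releases used "pveceph destroymon" and "pveceph createmon"). The monitor ID pve1 and the address 198.51.100.11 are placeholders, and passing --mon-address is optional and shown only as an assumption about how one might pin the new IP.

```shell
#!/bin/sh
# Sketch only: the monitor ID and address in the example are hypothetical.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# recreate_mon <mon-id> <new-ip>
# Destroys one monitor and creates it again on the new address.
# The create step is skipped if the destroy step fails.
recreate_mon() {
    run pveceph mon destroy "$1" &&
        run pveceph mon create --mon-address "$2"
}

# Example, on the node hosting the monitor, one node at a time:
#   recreate_mon pve1 198.51.100.11
#   ceph mon dump        # the re-created mon should show the new address
#   ceph quorum_status   # confirm quorum before moving to the next node
```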

There were some small issues with time sync, since I am not connected to an external network with an NTP server, but apart from that, it just worked.
 
I'm glad the issue is resolved!