Migrating a Proxmox HA Cluster with Ceph to a New IP Subnet (Reup)

TechByGiusi

Apr 26, 2025

Hey there,


I am in the process of migrating my entire cluster, consisting of three nodes, to a new subnet.


Old addresses:
10.10.20.11/24, 10.10.20.12/24, 10.10.20.13/24


New addresses:
10.10.0.10/24, 10.10.0.11/24, 10.10.0.12/24


I have already updated all necessary files, following a guide I found on Reddit:
https://www.reddit.com/r/Proxmox/comments/10s48vm/how_can_i_change_the_ip_of_node_in_a_cluster/


The commands I used are:

Code:
# Stop the cluster services
systemctl stop pve-cluster
systemctl stop corosync

# Mount the filesystem locally
pmxcfs -l

# Edit the network interfaces file to have the new IP information
# Be sure to replace both the address and gateway
nano /etc/network/interfaces

# Replace any host entries with the new IP addresses
nano /etc/hosts

# Change the DNS server as necessary
nano /etc/resolv.conf

# Edit the corosync file and replace the old IPs with the new IPs for all hosts
# Replace each old IP with its new one (Ctrl+\ in nano); the host part shifts: .20.11 -> .0.10, .20.12 -> .0.11, .20.13 -> .0.12
# BE SURE TO INCREMENT THE config_version: x LINE BY ONE TO ENSURE THE CONFIG IS NOT OVERWRITTEN
nano /etc/pve/corosync.conf

# Edit the known hosts file to have the correct IPs
# Replace each old IP with its new one (Ctrl+\ in nano); the host part shifts: .20.11 -> .0.10, .20.12 -> .0.11, .20.13 -> .0.12
nano /etc/pve/priv/known_hosts

# If using ceph, edit the ceph configuration file to reflect the new network
# (thanks u/FortunatelyLethal)
# Replace each old IP with its new one (Ctrl+\ in nano); the host part shifts: .20.11 -> .0.10, .20.12 -> .0.11, .20.13 -> .0.12
nano /etc/ceph/ceph.conf

# If you want to be granular... fix the IP in /etc/issue
nano /etc/issue

# Verify there aren't any stragglers with the old IP hanging around
cd /etc
grep -R '10\.10\.20\.' *
cd /var
grep -R '10\.10\.20\.' *

# Reboot the system to cleanly restart all the networking and services
reboot

# Referenced pages:
# - https://forum.proxmox.com/threads/change-cluster-nodes-ip-addresses.33406/
# - https://pve.proxmox.com/wiki/Cluster_Manager#_remove_a_cluster_node

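A plain prefix swap (10.10.20. -> 10.10.0.), as in the Reddit guide, is not enough for this cluster: it would turn 10.10.20.11 into 10.10.0.11, while bad is supposed to become 10.10.0.10. A minimal sketch of an explicit per-host mapping, assuming GNU sed (for the `\b` word boundary); `remap_ips` is a hypothetical helper name. It only filters stdin to stdout, so nothing is edited in place.

```shell
#!/usr/bin/env bash
# Sketch: map each old node IP to its new one explicitly (GNU sed assumed).
# remap_ips is a hypothetical helper; it reads stdin and writes stdout only.
remap_ips() {
  # \b keeps e.g. 10.10.20.110 from matching the .11 rule.
  # The expressions only match 10.10.20.x, so a replaced address
  # is never rewritten a second time by a later expression.
  sed -e 's/\b10\.10\.20\.11\b/10.10.0.10/g' \
      -e 's/\b10\.10\.20\.12\b/10.10.0.11/g' \
      -e 's/\b10\.10\.20\.13\b/10.10.0.12/g' \
      -e 's#\b10\.10\.20\.0/24\b#10.10.0.0/24#g'
}

# Example: write to a new file and review the diff before copying it back
#   remap_ips < /etc/pve/corosync.conf > /root/corosync.conf.new
#   diff -u /etc/pve/corosync.conf /root/corosync.conf.new
```

Other addresses in the old subnet (for example the gateway) are deliberately not covered and still need to be changed by hand, and config_version in corosync.conf still has to be incremented manually.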
Here is my updated ceph.conf:

Code:
[global]
        auth_client_required = cephx
        auth_cluster_required = cephx
        auth_service_required = cephx
        cluster_network = 10.10.0.0/24
        fsid = 32c6fb6a-7e96-4863-9ac7-91b270e521e7
        mon_allow_pool_delete = true
        mon_host = 10.10.0.10 10.10.0.11 10.10.0.12
        ms_bind_ipv4 = true
        ms_bind_ipv6 = false
        osd_pool_default_min_size = 2
        osd_pool_default_size = 3
        public_network = 10.10.0.0/24

[client]
        keyring = /etc/pve/priv/$cluster.$name.keyring

[client.crash]
        keyring = /etc/pve/ceph/$cluster.$name.keyring

[mon.bad]
        public_addr = 10.10.0.10

[mon.team]
        public_addr = 10.10.0.11

[mon.work]
        public_addr = 10.10.0.12
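Ceph expects the monitors to sit on the public network, so one quick sanity check on the edited file is to test every `public_addr` against `public_network`. A small read-only bash sketch; the function names `ip_to_int`, `ip_in_cidr` and `check_mon_addrs` are hypothetical, and it assumes a single `public_network` line as in the config above.

```shell
#!/usr/bin/env bash
# Convert a dotted IPv4 address to an integer (no leading-zero octets assumed).
ip_to_int() {
  local IFS=.
  local -a o
  read -r -a o <<< "$1"
  echo $(( (o[0] << 24) | (o[1] << 16) | (o[2] << 8) | o[3] ))
}

# Succeed (exit 0) if IP $1 lies inside CIDR $2, e.g. 10.10.0.0/24.
ip_in_cidr() {
  local ip=$1 net=${2%/*} bits=${2#*/}
  local mask=$(( bits == 0 ? 0 : (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  (( ( $(ip_to_int "$ip") & mask ) == ( $(ip_to_int "$net") & mask ) ))
}

# Print each public_addr with its status; return 1 if any is outside public_network.
check_mon_addrs() {
  local conf=$1 net addr rc=0
  net=$(sed -n 's/^[[:space:]]*public_network[[:space:]]*=[[:space:]]*//p' "$conf")
  while read -r addr; do
    if ip_in_cidr "$addr" "$net"; then
      echo "ok       $addr in $net"
    else
      echo "OUTSIDE  $addr not in $net"
      rc=1
    fi
  done < <(sed -n 's/^[[:space:]]*public_addr[[:space:]]*=[[:space:]]*//p' "$conf")
  return "$rc"
}

# Usage: check_mon_addrs /etc/ceph/ceph.conf
```

For the ceph.conf above all three entries should pass; this only confirms the file is consistent, not what the monitors actually use at runtime.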

Now the Ceph cluster won't start. I think the monitors are the cause: they are still referring to the old IPs.

Code:
Apr 26 11:10:51 bad systemd[1]: ceph-mon@bad.service: Failed with result 'exit-code'.
Apr 26 11:11:02 bad systemd[1]: ceph-mon@bad.service: Scheduled restart job, restart counter is at 6.
Apr 26 11:11:02 bad systemd[1]: Stopped ceph-mon@bad.service - Ceph cluster monitor daemon.
Apr 26 11:11:02 bad systemd[1]: ceph-mon@bad.service: Start request repeated too quickly.
Apr 26 11:11:02 bad systemd[1]: ceph-mon@bad.service: Failed with result 'exit-code'.
Apr 26 11:11:02 bad systemd[1]: Failed to start ceph-mon@bad.service - Ceph cluster monitor daemon.
-- Boot 0ae17430686c49fa89deba25114aef35 --
Apr 26 11:27:27 bad systemd[1]: Started ceph-mon@bad.service - Ceph cluster monitor daemon.
Apr 26 11:27:27 bad ceph-mon[1084]: 2025-04-26T11:27:27.428+0200 78b258e90d40 -1  Processor -- bind unable to bind to v2:10.10.20.11:3300/0: (99) Cannot assign requested address
Apr 26 11:27:27 bad ceph-mon[1084]: 2025-04-26T11:27:27.428+0200 78b258e90d40 -1  Processor -- bind was unable to bind. Trying again in 5 seconds
Apr 26 11:27:32 bad ceph-mon[1084]: 2025-04-26T11:27:32.429+0200 78b258e90d40 -1  Processor -- bind unable to bind to v2:10.10.20.11:3300/0: (99) Cannot assign requested address
Apr 26 11:27:32 bad ceph-mon[1084]: 2025-04-26T11:27:32.429+0200 78b258e90d40 -1  Processor -- bind was unable to bind. Trying again in 5 seconds
Apr 26 11:27:37 bad ceph-mon[1084]: 2025-04-26T11:27:37.430+0200 78b258e90d40 -1  Processor -- bind unable to bind to v2:10.10.20.11:3300/0: (99) Cannot assign requested address
Apr 26 11:27:37 bad ceph-mon[1084]: 2025-04-26T11:27:37.430+0200 78b258e90d40 -1  Processor -- bind was unable to bind after 3 attempts: (99) Cannot assign requested address
Apr 26 11:27:37 bad ceph-mon[1084]: 2025-04-26T11:27:37.430+0200 78b258e90d40 -1 unable to bind monitor to [v2:10.10.20.11:3300/0,v1:10.10.20.11:6789/0]
Apr 26 11:27:37 bad systemd[1]: ceph-mon@bad.service: Main process exited, code=exited, status=1/FAILURE
Apr 26 11:27:37 bad systemd[1]: ceph-mon@bad.service: Failed with result 'exit-code'.
Apr 26 11:27:47 bad systemd[1]: ceph-mon@bad.service: Scheduled restart job, restart counter is at 1.
Apr 26 11:27:47 bad systemd[1]: Stopped ceph-mon@bad.service - Ceph cluster monitor daemon.
Apr 26 11:27:47 bad systemd[1]: Started ceph-mon@bad.service - Ceph cluster monitor daemon.
Apr 26 11:27:47 bad ceph-mon[1357]: 2025-04-26T11:27:47.569+0200 770e920d9d40 -1  Processor -- bind unable to bind to v2:10.10.20.11:3300/0: (99) Cannot assign requested address
Apr 26 11:27:47 bad ceph-mon[1357]: 2025-04-26T11:27:47.569+0200 770e920d9d40 -1  Processor -- bind was unable to bind. Trying again in 5 seconds
Apr 26 11:27:52 bad ceph-mon[1357]: 2025-04-26T11:27:52.570+0200 770e920d9d40 -1  Processor -- bind unable to bind to v2:10.10.20.11:3300/0: (99) Cannot assign requested address
Apr 26 11:27:52 bad ceph-mon[1357]: 2025-04-26T11:27:52.570+0200 770e920d9d40 -1  Processor -- bind was unable to bind. Trying again in 5 seconds
Apr 26 11:27:57 bad ceph-mon[1357]: 2025-04-26T11:27:57.570+0200 770e920d9d40 -1  Processor -- bind unable to bind to v2:10.10.20.11:3300/0: (99) Cannot assign requested address
Apr 26 11:27:57 bad ceph-mon[1357]: 2025-04-26T11:27:57.570+0200 770e920d9d40 -1  Processor -- bind was unable to bind after 3 attempts: (99) Cannot assign requested address
Apr 26 11:27:57 bad ceph-mon[1357]: 2025-04-26T11:27:57.570+0200 770e920d9d40 -1 unable to bind monitor to [v2:10.10.20.11:3300/0,v1:10.10.20.11:6789/0]
Apr 26 11:27:57 bad systemd[1]: ceph-mon@bad.service: Main process exited, code=exited, status=1/FAILURE
Apr 26 11:27:57 bad systemd[1]: ceph-mon@bad.service: Failed with result 'exit-code'.
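Error 99 ("Cannot assign requested address", `EADDRNOTAVAIL`) means the daemon asked to bind an IP that is not configured on any interface of the host, here the old 10.10.20.11. Since ceph.conf already lists only 10.10.0.x addresses, the old one has to come from somewhere else, which fits the monmap output further down. A small sketch to list the addresses a monitor tried to bind, reading journal text on stdin; `mon_bind_addrs` is a hypothetical name and the pattern is based only on the log lines shown above.

```shell
#!/usr/bin/env bash
# Print the distinct addresses ceph-mon failed to bind, from journal text on stdin.
mon_bind_addrs() {
  grep -oE 'unable to bind to v[12]:[0-9.]+:[0-9]+' \
    | sed 's/^unable to bind to //' \
    | sort -u
}

# Usage on the node (current boot only):
#   journalctl -b -u ceph-mon@bad | mon_bind_addrs
# Compare with the IPv4 addresses actually configured:
#   ip -br -4 addr
```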

The monitors show up in the GUI, but the managers do not.


I should also mention that with the following command:

Code:
monmaptool --print /tmp/monmap

I can list all monitors. However, they still show the old addresses, and all three nodes are still listed unchanged:

Code:
root@bad:~# monmaptool --print /tmp/monmap
monmaptool: monmap file /tmp/monmap
epoch 3
fsid 32c6fb6a-7e96-4863-9ac7-91b270e521e7
last_changed 2024-09-02T20:43:23.417612+0200
created 2024-09-02T19:57:18.809703+0200
min_mon_release 18 (reef)
election_strategy: 1
0: [v2:10.10.20.11:3300/0,v1:10.10.20.11:6789/0] mon.bad
1: [v2:10.10.20.12:3300/0,v1:10.10.20.12:6789/0] mon.team
2: [v2:10.10.20.13:3300/0,v1:10.10.20.13:6789/0] mon.work
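Ceph keeps the monitor addresses in the monmap inside each monitor's store, so editing ceph.conf alone does not change them. The Ceph documentation on adding/removing monitors describes an advanced method for changing monitor IPs: extract the monmap from a stopped monitor (`ceph mon getmap` needs quorum, which is gone here), remove and re-add the monitors with `monmaptool`, then inject the edited map into every monitor. The bash sketch below only PRINTS those steps for this cluster so they can be reviewed first; it runs nothing. The backup paths and the `chown` step are my own assumptions: the Ceph systemd units run ceph-mon as the `ceph` user, so files written while injecting as root may need their owner reset. The monmap rewrite does not touch the OSDs, and backing up each monitor store first keeps the way back open.

```shell
#!/usr/bin/env bash
# Sketch only: PRINTS the monmap rewrite steps for review, runs nothing.
# Mon names and new addresses are copied from the ceph.conf in this post;
# 3300 (msgr2) and 6789 (msgr1) are Ceph's default monitor ports.
print_monmap_fix() {
  local map=/tmp/monmap
  local -a names=(bad team work)
  local -a addrs=(10.10.0.10 10.10.0.11 10.10.0.12)
  local i

  echo "# 1. On every node: stop the monitor and back up its store"
  for i in "${!names[@]}"; do
    echo "systemctl stop ceph-mon@${names[i]}  # on ${names[i]}"
    echo "cp -a /var/lib/ceph/mon/ceph-${names[i]} /root/ceph-mon-${names[i]}.bak  # on ${names[i]}"
  done

  echo "# 2. On bad: extract the current map and swap the addresses"
  echo "ceph-mon -i bad --extract-monmap $map"
  echo "monmaptool --rm bad --rm team --rm work $map"
  for i in "${!names[@]}"; do
    # Quoted so the shell does not treat [...] as a glob pattern
    echo "monmaptool --addv ${names[i]} '[v2:${addrs[i]}:3300,v1:${addrs[i]}:6789]' $map"
  done
  echo "monmaptool --print $map"

  echo "# 3. Copy $map to team and work, then inject it on each node"
  for i in "${!names[@]}"; do
    echo "ceph-mon -i ${names[i]} --inject-monmap $map  # on ${names[i]}"
    echo "chown -R ceph:ceph /var/lib/ceph/mon/ceph-${names[i]}  # on ${names[i]}"
    echo "systemctl start ceph-mon@${names[i]}  # on ${names[i]}"
  done
}

print_monmap_fix
```

Checking the `monmaptool --print` output against the new addresses before injecting anything is the main safety point; if it looks wrong, the extracted map can simply be discarded.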

Now I want to fix this without losing any data. If there is no solution, I hope I can at least roll back to the old setup.
What should I do next?

Thx
Giusi