[SOLVED] Migrating Proxmox HA-Cluster with Ceph to new IP Subnet

TechByGiusi
Apr 26, 2025

Hey there,


I am in the process of migrating my entire cluster, consisting of three nodes, to a new subnet.


Old addresses:
10.10.20.11/24, 10.10.20.12/24, 10.10.20.13/24


New addresses:
10.10.0.10/24, 10.10.0.11/24, 10.10.0.12/24


I have already updated all necessary files, following a guide I found on Reddit:
https://www.reddit.com/r/Proxmox/comments/10s48vm/how_can_i_change_the_ip_of_node_in_a_cluster/


The commands I used are:

Code:
# Stop the cluster services
systemctl stop pve-cluster
systemctl stop corosync

# Mount the filesystem locally
pmxcfs -l

# Edit the network interfaces file to have the new IP information
# Be sure to replace both the address and gateway
nano /etc/network/interfaces

# Replace any host entries with the new IP addresses
nano /etc/hosts

# Change the DNS server as necessary
nano /etc/resolv.conf

# Edit the corosync file and replace the old IPs with the new IPs for all hosts
# :%s/10\.10\.20\./10.10.0./g   <- vi command to replace all instances
# BE SURE TO INCREMENT THE config_version: x LINE BY ONE TO ENSURE THE CONFIG IS NOT OVERWRITTEN
nano /etc/pve/corosync.conf

# Edit the known hosts file to have the correct IPs
# :%s/10\.10\.20\./10.10.0./g   <- vi command to replace all instances
nano /etc/pve/priv/known_hosts

# If using ceph, edit the ceph configuration file to reflect the new network
# (thanks u/FortunatelyLethal)
# :%s/10\.10\.20\./10.10.0./g   <- vi command to replace all instances
nano /etc/ceph/ceph.conf

# If you want to be granular... fix the IP in /etc/issue
nano /etc/issue

# Verify there aren't any stragglers with the old IP hanging around
cd /etc
grep -R '10\.10\.20\.' *
cd /var
grep -R '10\.10\.20\.' *

# Reboot the system to cleanly restart all the networking and services
reboot

# Referenced pages:
# - https://forum.proxmox.com/threads/change-cluster-nodes-ip-addresses.33406/
# - https://pve.proxmox.com/wiki/Cluster_Manager#_remove_a_cluster_node
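
After the reboot, a few standard checks confirm that the new addresses are actually in use and that corosync re-formed quorum (not part of the guide above, just the usual Proxmox/corosync commands):

Code:
# Confirm the new 10.10.0.x address is configured
ip -4 addr show

# Check Proxmox cluster quorum and the ring addresses
pvecm status

# Check corosync link status on each node
corosync-cfgtool -s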

Here is my updated ceph.conf:

Code:
[global]
        auth_client_required = cephx
        auth_cluster_required = cephx
        auth_service_required = cephx
        cluster_network = 10.10.0.0/24
        fsid = 32c6fb6a-7e96-4863-9ac7-91b270e521e7
        mon_allow_pool_delete = true
        mon_host = 10.10.0.10 10.10.0.11 10.10.0.12
        ms_bind_ipv4 = true
        ms_bind_ipv6 = false
        osd_pool_default_min_size = 2
        osd_pool_default_size = 3
        public_network = 10.10.0.0/24

[client]
        keyring = /etc/pve/priv/$cluster.$name.keyring

[client.crash]
        keyring = /etc/pve/ceph/$cluster.$name.keyring

[mon.bad]
        public_addr = 10.10.0.10

[mon.team]
        public_addr = 10.10.0.11

[mon.work]
        public_addr = 10.10.0.12

Now I have the problem that the Ceph cluster isn't starting. I think it is because of the monitors: they are still referring to the old IPs.

Code:
Apr 26 11:10:51 bad systemd[1]: ceph-mon@bad.service: Failed with result 'exit-code'.
Apr 26 11:11:02 bad systemd[1]: ceph-mon@bad.service: Scheduled restart job, restart counter is at 6.
Apr 26 11:11:02 bad systemd[1]: Stopped ceph-mon@bad.service - Ceph cluster monitor daemon.
Apr 26 11:11:02 bad systemd[1]: ceph-mon@bad.service: Start request repeated too quickly.
Apr 26 11:11:02 bad systemd[1]: ceph-mon@bad.service: Failed with result 'exit-code'.
Apr 26 11:11:02 bad systemd[1]: Failed to start ceph-mon@bad.service - Ceph cluster monitor daemon.
-- Boot 0ae17430686c49fa89deba25114aef35 --
Apr 26 11:27:27 bad systemd[1]: Started ceph-mon@bad.service - Ceph cluster monitor daemon.
Apr 26 11:27:27 bad ceph-mon[1084]: 2025-04-26T11:27:27.428+0200 78b258e90d40 -1  Processor -- bind unable to bind to v2:10.10.20.11:3300/0: (99) Cannot assign requested address
Apr 26 11:27:27 bad ceph-mon[1084]: 2025-04-26T11:27:27.428+0200 78b258e90d40 -1  Processor -- bind was unable to bind. Trying again in 5 seconds
Apr 26 11:27:32 bad ceph-mon[1084]: 2025-04-26T11:27:32.429+0200 78b258e90d40 -1  Processor -- bind unable to bind to v2:10.10.20.11:3300/0: (99) Cannot assign requested address
Apr 26 11:27:32 bad ceph-mon[1084]: 2025-04-26T11:27:32.429+0200 78b258e90d40 -1  Processor -- bind was unable to bind. Trying again in 5 seconds
Apr 26 11:27:37 bad ceph-mon[1084]: 2025-04-26T11:27:37.430+0200 78b258e90d40 -1  Processor -- bind unable to bind to v2:10.10.20.11:3300/0: (99) Cannot assign requested address
Apr 26 11:27:37 bad ceph-mon[1084]: 2025-04-26T11:27:37.430+0200 78b258e90d40 -1  Processor -- bind was unable to bind after 3 attempts: (99) Cannot assign requested address
Apr 26 11:27:37 bad ceph-mon[1084]: 2025-04-26T11:27:37.430+0200 78b258e90d40 -1 unable to bind monitor to [v2:10.10.20.11:3300/0,v1:10.10.20.11:6789/0]
Apr 26 11:27:37 bad systemd[1]: ceph-mon@bad.service: Main process exited, code=exited, status=1/FAILURE
Apr 26 11:27:37 bad systemd[1]: ceph-mon@bad.service: Failed with result 'exit-code'.
Apr 26 11:27:47 bad systemd[1]: ceph-mon@bad.service: Scheduled restart job, restart counter is at 1.
Apr 26 11:27:47 bad systemd[1]: Stopped ceph-mon@bad.service - Ceph cluster monitor daemon.
Apr 26 11:27:47 bad systemd[1]: Started ceph-mon@bad.service - Ceph cluster monitor daemon.
Apr 26 11:27:47 bad ceph-mon[1357]: 2025-04-26T11:27:47.569+0200 770e920d9d40 -1  Processor -- bind unable to bind to v2:10.10.20.11:3300/0: (99) Cannot assign requested address
Apr 26 11:27:47 bad ceph-mon[1357]: 2025-04-26T11:27:47.569+0200 770e920d9d40 -1  Processor -- bind was unable to bind. Trying again in 5 seconds
Apr 26 11:27:52 bad ceph-mon[1357]: 2025-04-26T11:27:52.570+0200 770e920d9d40 -1  Processor -- bind unable to bind to v2:10.10.20.11:3300/0: (99) Cannot assign requested address
Apr 26 11:27:52 bad ceph-mon[1357]: 2025-04-26T11:27:52.570+0200 770e920d9d40 -1  Processor -- bind was unable to bind. Trying again in 5 seconds
Apr 26 11:27:57 bad ceph-mon[1357]: 2025-04-26T11:27:57.570+0200 770e920d9d40 -1  Processor -- bind unable to bind to v2:10.10.20.11:3300/0: (99) Cannot assign requested address
Apr 26 11:27:57 bad ceph-mon[1357]: 2025-04-26T11:27:57.570+0200 770e920d9d40 -1  Processor -- bind was unable to bind after 3 attempts: (99) Cannot assign requested address
Apr 26 11:27:57 bad ceph-mon[1357]: 2025-04-26T11:27:57.570+0200 770e920d9d40 -1 unable to bind monitor to [v2:10.10.20.11:3300/0,v1:10.10.20.11:6789/0]
Apr 26 11:27:57 bad systemd[1]: ceph-mon@bad.service: Main process exited, code=exited, status=1/FAILURE
Apr 26 11:27:57 bad systemd[1]: ceph-mon@bad.service: Failed with result 'exit-code'.

The monitors are showing up in the GUI, but the managers are not.

[Screenshot: Ceph monitor and manager status in the Proxmox GUI]

Now I want to fix this without losing all my data. If there is no solution, I hope I can at least roll back.
What should I do next?

Thx
Giusi
 
PS: I just wanted to add that with the following command:

Code:
monmaptool --print /tmp/monmap

I can list all the monitors. However, they still show the old addresses, and the node names are unchanged:

Code:
root@bad:~# monmaptool --print /tmp/monmap
monmaptool: monmap file /tmp/monmap
epoch 3
fsid 32c6fb6a-7e96-4863-9ac7-91b270e521e7
last_changed 2024-09-02T20:43:23.417612+0200
created 2024-09-02T19:57:18.809703+0200
min_mon_release 18 (reef)
election_strategy: 1
0: [v2:10.10.20.11:3300/0,v1:10.10.20.11:6789/0] mon.bad
1: [v2:10.10.20.12:3300/0,v1:10.10.20.12:6789/0] mon.team
2: [v2:10.10.20.13:3300/0,v1:10.10.20.13:6789/0] mon.work
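
For reference, a monmap like this can be extracted from a (stopped) monitor with standard Ceph tooling, for example:

Code:
# Stop the local monitor, then dump its current monmap to a file
systemctl stop ceph-mon@bad
ceph-mon -i bad --extract-monmap /tmp/monmap
monmaptool --print /tmp/monmap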
 
You cannot just change the IPs in the ceph.conf file.

You need to deploy new MONs first that listen to the new IP addresses. These addresses are recorded in the mon map as you already have seen.

The IPs in ceph.conf are just for all the other processes talking to the Ceph cluster.
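
A rough sketch of that approach, one monitor at a time so quorum is never lost (assuming the monitors are managed with pveceph):

Code:
# On the node whose monitor should move to the new IP:
# destroy the old monitor (keep the other monitors running for quorum)
pveceph mon destroy <nodename>

# recreate it; it binds to an address from the configured public_network
pveceph mon create

# verify the new address appears in the mon map before moving on
ceph mon dump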
 
First of all, thank you for the clarification!
Would you maybe be able to recommend a way to do this properly, or is there a nice step-by-step guide you would suggest to accomplish it?

If I destroy a monitor via the GUI and try to create a new one, I get this error:

[Screenshot of the error message]
(Tested with the last node; work currently has 10.10.0.12/24.)
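
Possibly relevant: pveceph mon create can also be given the address explicitly, which might avoid the network-detection error (assuming the option is available in this Proxmox VE version):

Code:
# Pin the new monitor to the node's new address explicitly
pveceph mon create --mon-address 10.10.0.12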
 
I reverted all nodes back to the old network first. Then I followed this process:

Code:
# Edit the network interfaces file to have the new IP information
# Be sure to replace both the address and gateway
nano /etc/network/interfaces

# Replace any host entries with the new IP addresses
nano /etc/hosts

# Change the DNS server as necessary
nano /etc/resolv.conf

# Edit the corosync file and replace the old IPs with the new IPs for all hosts
# -> Change only the IP from the node you are currently working on
nano /etc/pve/corosync.conf

# Edit the known hosts file to have the correct IPs (only needed on the run where you change the master node's IP)
# -> Insert the new IP of the Master node
nano /etc/pve/priv/known_hosts

# If using ceph, edit the ceph configuration file to reflect the new network
# -> change cluster_network to 10.10.20.0/24,10.10.0.0/24
# -> change public_network to 10.10.20.0/24,10.10.0.0/24
nano /etc/ceph/ceph.conf

# Reboot the system to cleanly restart all the networking and services
reboot

# Delete the monitor of the node whose IP was changed (via the GUI)

# Edit ceph.conf
nano /etc/pve/ceph.conf
# -> temporarily change public_network to 10.10.0.0/24 only

# Create the MON
pveceph mon create

# Edit ceph.conf back to normal
nano /etc/pve/ceph.conf
# -> change public_network back to 10.10.20.0/24,10.10.0.0/24
# -> make sure you have this section:

[mon.work]
public_addr = 10.10.0.12

# -> Repeat the whole process two more times (if you have 3 nodes)
# -> After the last one the public_network and cluster_network can be changed to only 10.10.0.0/24

# Reboot the system to cleanly restart all the networking and services
reboot

This approach allowed the cluster to stay operational without any downtime.
I’m currently verifying that everything is still working correctly. So far, I’ve done three runs, starting with the last node and finishing with the master node.
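
For the verification itself, the usual status commands should be enough, e.g.:

Code:
# Overall Ceph health, quorum and manager status
ceph -s

# The mon map should now only contain 10.10.0.x addresses
ceph mon dump

# Proxmox cluster membership and quorum
pvecm status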
 