Ceph GUI error: timeout and unable to connect to monitor after changing all cluster nodes to new IPs

aluisell

Hi,

I had to change the public IP of the nodes in the cluster (3 nodes) to a new network, moving from 192.168.7.x to 192.168.29.x.
To do that, after the initial simple change of the node network IPs, I followed this guide and its steps:
https://bookstack.dismyserver.net/b...ge-the-ip-address-of-a-proxmox-clustered-node
To summarize it, I ran the following commands:

# Stop the cluster services
systemctl stop pve-cluster
systemctl stop corosync

# Mount the filesystem locally
pmxcfs -l

# Edit the network interfaces file to have the new IP information
# Be sure to replace both the address and gateway
vi /etc/network/interfaces

# Replace any host entries with the new IP addresses
vi /etc/hosts

# Change the DNS server as necessary
vi /etc/resolv.conf

# Edit the corosync file and replace the old IPs with the new IPs for all hosts
# :%s/192\.168\.7\./192.168.29./g <- vi command to replace all instances
# BE SURE TO INCREMENT THE config_version: x LINE BY ONE TO ENSURE THE CONFIG IS NOT OVERWRITTEN
vi /etc/pve/corosync.conf

# Edit the known hosts file to have the correct IPs
# :%s/192\.168\.7\./192.168.29./g <- vi command to replace all instances
vi /etc/pve/priv/known_hosts

# If using ceph, edit the ceph configuration file to reflect the new network
# :%s/192\.168\.7\./192.168.29./g <- vi command to replace all instances
vi /etc/ceph/ceph.conf
vi /etc/issue

# Reboot the system to cleanly restart all the networking and services
reboot
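
After the reboot, something like the following should confirm that corosync and the PVE cluster picked up the new addresses (standard commands, not part of the guide above):

# Check quorum and member addresses; the 192.168.29.x IPs should be listed
pvecm status
# Show the local corosync link status and bound address
corosync-cfgtool -s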

After this, I got the cluster working and the GUI working.
What is still not working is the Ceph part of the GUI.

I did some more searching on the web and then checked the monitor service:

root@pve3:~# systemctl status ceph-mon@pve3
ceph-mon@pve3.service - Ceph cluster monitor daemon
Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; preset: enabled)
Drop-In: /usr/lib/systemd/system/ceph-mon@.service.d
└─ceph-after-pve-cluster.conf
Active: active (running) since Fri 2024-01-05 19:28:56 CET; 1s ago
Main PID: 7206 (ceph-mon)
Tasks: 13
Memory: 13.9M
CPU: 65ms
CGroup: /system.slice/system-ceph\x2dmon.slice/ceph-mon@pve3.service
└─7206 /usr/bin/ceph-mon -f --cluster ceph --id pve3 --setuser ceph --setgroup ceph

Jan 05 19:28:56 pve3 systemd[1]: Started ceph-mon@pve3.service - Ceph cluster monitor daemon.
Jan 05 19:28:56 pve3 ceph-mon[7206]: 2024-01-05T19:28:56.305+0100 7fca24969d40 -1 Processor -- bind unable to bind to v2:192.168.7.42:3300/0: (99) Cannot assign requested >
Jan 05 19:28:56 pve3 ceph-mon[7206]: 2024-01-05T19:28:56.305+0100 7fca24969d40 -1 Processor -- bind was unable to bind. Trying again in 5 seconds

The monitor is still bound to and trying to connect to the old IP.
root@pve3:~# pveceph mon destroy pve3
got timeout

root@pve3:~# pveceph mon create
Could not connect to ceph cluster despite configured monitors

root@pve1:~# ceph -s
2024-01-05T19:24:57.366+0100 7fe5bdc6d6c0 0 monclient(hunting): authenticate timed out after 300
2024-01-05T19:29:57.367+0100 7fe5bdc6d6c0 0 monclient(hunting): authenticate timed out after 300
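
For reference, the mismatch can be made visible with a couple of quick checks (assuming the default ceph.conf path and the standard mon ports 3300/6789):

# mon_host in ceph.conf already lists the new 192.168.29.x addresses...
grep mon_host /etc/ceph/ceph.conf
# ...but nothing is listening on the mon ports, because the daemon keeps
# trying to bind the old 192.168.7.x address stored in its internal monmap
ss -tlnp | grep -E '3300|6789'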

I also checked the OSDs and the target, which seem to be fine (below is the status from node pve3).

root@pve3:~# systemctl status ceph-osd@2.service
ceph-osd@2.service - Ceph object storage daemon osd.2
Loaded: loaded (/lib/systemd/system/ceph-osd@.service; enabled-runtime; preset: enabled)
Drop-In: /usr/lib/systemd/system/ceph-osd@.service.d
└─ceph-after-pve-cluster.conf
Active: active (running) since Fri 2024-01-05 19:51:08 CET; 14min ago
Process: 15460 ExecStartPre=/usr/libexec/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id 2 (code=exited, status=0/SUCCESS)
Main PID: 15464 (ceph-osd)
Tasks: 8
Memory: 11.9M
CPU: 2.689s
CGroup: /system.slice/system-ceph\x2dosd.slice/ceph-osd@2.service
└─15464 /usr/bin/ceph-osd -f --cluster ceph --id 2 --setuser ceph --setgroup ceph

Jan 05 19:51:08 pve3 systemd[1]: Starting ceph-osd@2.service - Ceph object storage daemon osd.2...
Jan 05 19:51:08 pve3 systemd[1]: Started ceph-osd@2.service - Ceph object storage daemon osd.2.
root@pve3:~# systemctl status ceph-osd
ceph-osd@2.service ceph-osd.target
root@pve3:~# systemctl status ceph-osd.target
● ceph-osd.target - ceph target allowing to start/stop all ceph-osd@.service instances at once
Loaded: loaded (/lib/systemd/system/ceph-osd.target; enabled; preset: enabled)
Active: active since Fri 2024-01-05 19:51:08 CET; 14min ago

Jan 05 19:51:08 pve3 systemd[1]: Reached target ceph-osd.target - ceph target allowing to start/stop all ceph-osd@.service instances at once.


Is there anything I can do to change the binding? Thanks for your help.
 
How to change the mon IPs is described in the Ceph documentation, including the note that existing mons are not designed to have their IP changed.

See if you can at least straighten things out a bit with the instructions: https://docs.ceph.com/en/latest/rados/operations/add-or-rm-mons/#changing-a-monitor-s-ip-address

I don't blame you for following instructions found on the internet instead of checking the official documentation for the products involved. I think the resulting mess is lesson enough not to just carry out whatever you find.
 
Thanks for your help @sb-jw

I had a few more issues because I was not able to connect to the Ceph cluster at all... basically that is not possible while the monitors are down.
In the end I got it working again by following the steps described below... a bit different from what is described in the Ceph article, which was nevertheless fundamental to finding the solution.

It would be great to have the procedure for changing the IPs of a hyperconverged cluster with Ceph in the PVE manual...
as well as the instructions for updating the Ceph cluster in such a case...



#check cluster status and check monitor status
pvecm status
systemctl status ceph-mon@nodename
#stopping monitor on nodename
systemctl stop ceph-mon@nodename

#create folder to write monitor map
mkdir /monmap
#monitor map extraction from nodename
ceph-mon -i {mon id, usually nodename} --extract-monmap /folder/filename-for-monitor-map-as-your-preference
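#concrete example for node pve3, using the /monmap folder created above (the file name here is my own choice):
#  ceph-mon -i pve3 --extract-monmap /monmap/monmap.bin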

#to visualize the mon map; it should show various info, see the example below
monmaptool --print /folder/filename-for-monitor-map-used-before

#epoch 6
#fsid 394429c3-28cb-457b-93af-2ff25058634e
#last_changed 2023-11-24T08:44:25.309865+0100
#created 2022-12-19T16:23:57.106805+0100
#min_mon_release 18 (reef)
#election_strategy: 1
#0: v2:192.168.29.40:3300/0 mon.pve1
#1: v2:192.168.29.41:3300/0 mon.pve2
#2: v2:192.168.29.42:3300/0 mon.pve3

#the IP cannot be changed in place; the mon info needs to be removed first and then added again with the new info, using the following commands
#to remove the monitor info
monmaptool --rm {nodename1 in my case pve1} --rm {nodename2 in my case pve2} --rm {nodename3 in my case pve3} /folder/filename-for-monitor-map-used-before
#to add the new monitor info
monmaptool --add {node name:tcp port as it was in the original file example pve1 192.168.29.40:3300} --add pve2 192.168.29.41:3300 --add pve3 192.168.29.42:3300 /folder/filename-for-monitor-map-used-before
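
#optional sanity check: print the map again and confirm it now lists the new 192.168.29.x addresses
monmaptool --print /folder/filename-for-monitor-map-used-before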

#now the new map file needs to be injected into the monitor
ceph-mon -i {mon id, usually nodename} --inject-monmap /folder/filename-for-monitor-map-used-before
#now it is possible to start the node's monitor again
systemctl start ceph-mon@nodename
#you can also check the mon status again
systemctl status ceph-mon@nodename
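
#once the monitors on all nodes have been fixed and started again, the cluster should answer; for example:
ceph mon dump
ceph -s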
 
