Fixing broken cluster

jlgarnier

Hi Community,

I have two small servers in my homelab: one at 192.168.1.100, the other at 192.168.1.110. I had a network issue and decided to set a static DHCP address for the second one, 192.168.1.120. Unfortunately, this broke the cluster: the UI returns the error message "hostname lookup 'LAB-server5' failed - failed to get address info for: LAB-server5: Name or service not known (500)" and freezes. I now need to manually edit the cluster config file to update the IP address of the second server.

I've read that I could edit the corosync.conf file (https://pve.proxmox.com/wiki/Cluster_Manager#pvecm_edit_corosync_conf), but it seems I don't have the required privileges, although I SSHed as 'root'...

Can anyone tell me what the appropriate procedure is to modify this file? Assuming this is the right file, of course...

Thanks in advance for any help!
 
Done, but this is obviously not enough... How can I edit the so-called "cluster config file" to indicate that Server5 has moved to 192.168.1.120? The procedure listed in the wiki doesn't work because any write in /etc/pve is rejected (Permission denied, both as root and as www-data)...

I can't use pvecm either as it doesn't find server5:
Bash:
root@LAB-server1:~:$ pvecm nodes

Membership information
----------------------
    Nodeid      Votes Name
         1          1 LAB-server1 (local)
so I can't just
Code:
pvecm delnode LAB-server5
...

Thanks in advance for any help!
 
PVE blocks changes to the cluster file system (/etc/pve is read-only while the node has no quorum). To work around that:
1. Stop the cluster services on the node you're changing (Server5)
Code:
systemctl stop pve-cluster
systemctl stop corosync
2. Mount pmxcfs in local mode (bypasses the cluster lock):
Code:
pmxcfs -l
3. Edit /etc/pve/corosync.conf: update the IP address (ring0_addr) and increase config_version (just add 1 to the value, so the other node will pick up the newer config)
4. Restart everything and regenerate certs:
Code:
killall pmxcfs
systemctl start corosync
systemctl start pve-cluster
pvecm updatecerts --force
5. Double-check /etc/hosts and /etc/network/interfaces for any leftovers of the old address
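
For step 3, here is a rough sketch of what the edit amounts to, in case it helps to see it spelled out. The function name update_corosync_conf is made up for illustration, and it assumes the usual layout where each node block lists "name:" before "ring0_addr:". It only prints the modified config to stdout, so nothing is touched until you have reviewed the result and copied it into place yourself.

```shell
#!/bin/sh
# Hypothetical helper (not part of PVE): print a copy of a corosync.conf with
#   - ring0_addr replaced for the node whose "name:" matches NODE
#   - config_version incremented by 1
# Assumes "name:" comes before "ring0_addr:" inside each node block.
# Usage: update_corosync_conf FILE NODE NEW_IP
update_corosync_conf() {
    awk -v node="$2" -v ip="$3" '
        # Remember which node block we are currently in.
        $1 == "name:" { current = $2 }
        # A closing brace ends the current block.
        $1 == "}"     { current = "" }
        # Rewrite the address only for the matching node, keeping indentation.
        $1 == "ring0_addr:" && current == node {
            sub(/ring0_addr:.*/, "ring0_addr: " ip)
        }
        # Bump the version so the other node accepts this config as newer.
        $1 == "config_version:" {
            v = $2 + 1
            sub(/config_version:.*/, "config_version: " v)
        }
        { print }
    ' "$1"
}

# Example (the output path is just an illustration):
# update_corosync_conf /etc/pve/corosync.conf LAB-server5 192.168.1.120 > /root/corosync.conf.new
```

Diff the output against the original before replacing anything; editing the file by hand as described in step 3 works just as well.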
 
Hi @psalkiewicz and thanks for this detailed procedure!

Everything went fine until I entered the
Code:
pvecm updatecerts --force
command, which timed out ("got timeout when trying to ensure cluster certificates and base file hierarchy is set up - no quorum (yet) or hung pmxcfs?").

Is there anything I must do on the second server? Should I edit the config file on Server1 too?

Thanks in advance for your help!
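
Since the timeout message mentions "no quorum (yet)", one way to narrow it down is to check whether the node currently reports quorum before retrying. The small helper below is only a sketch: is_quorate is a made-up name, and it assumes the output of pvecm status contains a line of the form "Quorate: Yes", which is worth confirming on your own version.

```shell
#!/bin/sh
# Hypothetical helper: reads `pvecm status` output on stdin and succeeds
# only if it contains a "Quorate: Yes" line (assumed output format).
is_quorate() {
    grep -Eq '^Quorate:[[:space:]]+Yes'
}

# Example: only retry the certificate update once the node is quorate.
# pvecm status | is_quorate && pvecm updatecerts --force
```

If the check fails, the two nodes are most likely not seeing each other yet, so it makes sense to look at the corosync state on both servers before running updatecerts again.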