Change hostname killed my cluster

Moz

Member
Dec 15, 2022
Hello,
I made a mistake: I changed hostnames in my cluster without realizing it would break so many things.

I changed the hostname from Ansible with
Code:
- name: Set a hostname
  become: true
  ansible.builtin.hostname:
    name: "{{ local_dns }}" # changing mox01.ether-source.fr to cyclops-alpha.ether-source.fr
Code:
root@cyclops-alpha:~# pvecm add cyclops-alpha.ether-source.fr
ipcc_send_rec[1] failed: Connection refused
ipcc_send_rec[2] failed: Connection refused
ipcc_send_rec[3] failed: Connection refused
Unable to load access control list: Connection refused




Then I changed the hosts file with this template
Code:
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

{% for host in groups['init'] %}
{{ hostvars[host]['ansible_default_ipv4']['address'] }} {{ hostvars[host]['local_dns']}}
{% endfor %}

Well, it's not working anymore, and I would like to recover the nodes.
I read some tutorials and tried many things, but I think I need some help :(
Code:
● pveproxy.service - PVE API Proxy Server
     Loaded: loaded (/lib/systemd/system/pveproxy.service; enabled; vendor preset: enabled)
     Active: active (running) since Thu 2022-12-15 23:06:49 CET; 18min ago
    Process: 1210 ExecStartPre=/usr/bin/pvecm updatecerts --silent (code=exited, status=111)
    Process: 1211 ExecStart=/usr/bin/pveproxy start (code=exited, status=0/SUCCESS)
   Main PID: 1212 (pveproxy)
      Tasks: 4 (limit: 76987)
     Memory: 133.9M
        CPU: 15.954s
     CGroup: /system.slice/pveproxy.service
             ├─1212 pveproxy
             ├─2080 pveproxy worker
             ├─2081 pveproxy worker
             └─2082 pveproxy worker

Dec 15 23:24:59 cyclops-alpha.ether-source.fr pveproxy[1212]: starting 2 worker(s)
Dec 15 23:24:59 cyclops-alpha.ether-source.fr pveproxy[1212]: worker 2080 started
Dec 15 23:24:59 cyclops-alpha.ether-source.fr pveproxy[1212]: worker 2081 started
Dec 15 23:24:59 cyclops-alpha.ether-source.fr pveproxy[2080]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key)>
Dec 15 23:24:59 cyclops-alpha.ether-source.fr pveproxy[2081]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key)>
Dec 15 23:24:59 cyclops-alpha.ether-source.fr pveproxy[2079]: worker exit
Dec 15 23:24:59 cyclops-alpha.ether-source.fr pveproxy[1212]: worker 2079 finished
Dec 15 23:24:59 cyclops-alpha.ether-source.fr pveproxy[1212]: starting 1 worker(s)
Dec 15 23:24:59 cyclops-alpha.ether-source.fr pveproxy[1212]: worker 2082 started
Dec 15 23:24:59 cyclops-alpha.ether-source.fr pveproxy[2082]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key)


Code:
root@cyclops-alpha:~# hostname
cyclops-alpha.ether-source.fr
root@cyclops-alpha:~# cat /etc/hostname
cyclops-alpha.ether-source.fr
root@cyclops-alpha:~# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.1.15 cyclops-alpha cyclops-alpha.ether-source.fr


192.168.1.23 cerberus-alpha.ether-source.fr
192.168.1.189 cerberus-beta.ether-source.fr
192.168.1.161 cerberus-gamma.ether-source.fr
192.168.1.3 mermaid-alpha.ether-source.fr
192.168.1.106 minotor-alpha.ether-source.fr
192.168.1.43 basilisk.ether-source.fr

192.168.1.144 cyclops-beta.ether-source.fr
192.168.1.42 cyclops-gamma.ether-source.fr
192.168.1.138 cyclops-epsilon.ether-source.fr
192.168.1.172 cyclops-zeta.ether-source.fr
192.168.1.198 cyclops-eta.ether-source.fr
192.168.1.113 cyclops-theta.ether-source.fr
192.168.1.41 centaurs-alpha.ether-source.fr
I don't know what I still need to fix to bring it back up. Any clues are welcome.
 
Moz said: I read some tutorials and tried many things, but I think I need some help
The best approach would be to back out your changes and restore the configuration you saved before you started.

Beyond that, I would recommend reading through this discussion: https://askubuntu.com/questions/863132/should-one-use-fqdn-in-etc-hostname-instead-of-hostname. It highlights a few critical configuration errors in what you posted. There may be more across the nodes that you have not shown.

Other resources:
https://pve.proxmox.com/wiki/Renaming_a_PVE_node
https://forum.proxmox.com/threads/proxmox-rename-node-big-problem.69149/
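
As a quick check related to the first link, here is a minimal sketch that reports whether a name is a short hostname or an FQDN. The function name hostname_kind is made up for illustration, and it assumes the Debian convention (discussed in that link) that /etc/hostname holds the short name while the FQDN comes from /etc/hosts.

```shell
# hostname_kind NAME
# Prints "fqdn" if NAME contains a dot, otherwise "short".
# (hypothetical helper, not part of any Proxmox tooling)
hostname_kind() {
    case $1 in
        *.*) echo fqdn ;;
        *)   echo short ;;
    esac
}
```

On a node it could be run as: hostname_kind "$(cat /etc/hostname)". In the output posted above, /etc/hostname contains cyclops-alpha.ether-source.fr, which this would report as fqdn.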



@bbgeek17 Thanks for the reply!
I tried to roll back to the previous hostname, without success. I will try on another node (I actually have 7).
I will just change the hostname back to the previous one to see what happens.
I will read all of those links this evening, thanks a lot.
 
Well, it seems that the command

Code:
hostnamectl set-hostname mox03.ether-source.fr
cannot fix the problem on its own (mox03.ether-source.fr is the original name).
I had totally rewritten /etc/hosts with Ansible, and I'm pretty sure the original had mandatory entries in it.

EDIT:

With a new /etc/hosts


Code:
127.0.0.1 localhost.localdomain localhost
192.168.1.42 mox03.ether-source.fr mox03

# The following lines are desirable for IPv6 capable hosts

::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
It's working!
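
For anyone who wants to check their own nodes before touching hostnames, here is a rough sketch of a test for the line that made the difference above: a non-loopback address followed by both the FQDN and the short name. The function name check_hosts_entry is my own invention, and the rule it encodes is only my understanding of what Proxmox expects (the node name must resolve to a non-loopback IP), not an official check.

```shell
# check_hosts_entry FILE FQDN
# Returns 0 if FILE has a non-loopback line listing both FQDN and its
# short name (the part before the first dot), otherwise returns 1.
# (hypothetical helper; the rule is an assumption based on this thread)
check_hosts_entry() {
    file=$1
    fqdn=$2
    short=${fqdn%%.*}
    awk -v fqdn="$fqdn" -v short="$short" '
        { sub(/#.*/, "") }                      # drop comments
        NF < 2 { next }                         # skip blank or address-only lines
        $1 ~ /^127\./ || $1 == "::1" { next }   # loopback entries do not count
        {
            has_fqdn = 0; has_short = 0
            for (i = 2; i <= NF; i++) {
                if ($i == fqdn)  has_fqdn = 1
                if ($i == short) has_short = 1
            }
            if (has_fqdn && has_short) found = 1
        }
        END { exit found ? 0 : 1 }
    ' "$file"
}
```

Example: check_hosts_entry /etc/hosts mox03.ether-source.fr && echo "hosts entry looks fine". With the Ansible template from my first post, the other nodes only got "IP FQDN" lines without the short name, so this check would fail for them.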