I reinstalled a node in the cluster and now the cluster is messy

Hi folks,
We have a cluster consisting of 14 nodes.
At different times, and for different reasons, I had to remove 3 of these nodes, reinstall Proxmox on them, and bring them back into the cluster.

Now the issue is that none of my nodes can SSH to the nodes which have been re-installed.
They find a duplicate fingerprint in their known_hosts.
I have run "pvecm updatecerts" on the reinstalled nodes, but nothing changed.
The only workaround is to manually remove those lines from the other nodes' known_hosts files. But if any one of them gets rebooted, the problem is back again.
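
For reference, this is roughly what I do by hand on each node to clear a stale entry (the IP is just an example from my node list below; I am assuming the cluster-wide file is /etc/pve/priv/known_hosts, which /etc/ssh/ssh_known_hosts points to on my nodes):

# drop the old key for a reinstalled node from the cluster-wide file
ssh-keygen -R 172.27.3.102 -f /etc/pve/priv/known_hosts
# also clear root's per-user file, in case an entry ended up there
ssh-keygen -R 172.27.3.102 -f /root/.ssh/known_hosts
# accept the new fingerprint on the next connection
ssh 172.27.3.102 true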

Another issue is that whenever I add a new node, my other nodes see it as offline and cannot migrate any VM to it, and I have to restart the corosync and pve-cluster services. That scares me a lot, because during the restart the node appears as offline in the GUI, and I don't know whether the VMs will experience any interruption or not.
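
Concretely, what I end up running on the node that shows up as offline is something like this (as far as I understand, restarting these two services does not touch the running guest processes themselves, but that is exactly the part I would like confirmed):

# restart the cluster stack on the affected node
systemctl restart corosync
systemctl restart pve-cluster
# then check membership again
pvecm status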

So, could you help me fix the cluster state?
 
good morning,
https://pve.proxmox.com/wiki/Cluster_Manager#_remove_a_cluster_node

- After powering off the node hp4, we can safely remove it from the cluster:
hp1# pvecm delnode hp4

pvecm status

- If, for whatever reason, you want this server to join the same cluster again, you have to:

  • reinstall Proxmox VE on it from scratch
  • then join it, as explained in the previous section.
After removal of the node, its SSH fingerprint will still reside in the known_hosts of the other nodes. If you receive an SSH error after rejoining a node with the same IP or hostname, run pvecm updatecerts once on the re-added node to update its fingerprint cluster wide.
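
Put together, the whole cycle looks roughly like this (hp1/hp4 are the placeholder node names from the wiki article, not your nodes; <IP-of-existing-node> stands for any node that is already in the cluster):

# on a remaining node, after hp4 has been powered off
hp1# pvecm delnode hp4
hp1# pvecm status

# on hp4, after reinstalling Proxmox VE from scratch
hp4# pvecm add <IP-of-existing-node>

# on the re-added node, refresh the SSH fingerprints cluster wide
hp4# pvecm updatecerts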
 
Yes, I have read that and followed it exactly as instructed.
As I mentioned in my post, I even ran "pvecm updatecerts" on the nodes, but the issue is still there. That's why I'm sharing it here.
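
To show what I mean, this is roughly how I have been checking it (the IP is just one of the reinstalled nodes as an example; as far as I know "pvecm updatecerts" rewrites /etc/pve/priv/known_hosts, so I looked at root's per-user file as well):

# ssh from any other node fails with a changed/duplicate host key warning
ssh 172.27.3.104

# see which known_hosts files still hold an old key for that node
ssh-keygen -F 172.27.3.104 -f /etc/pve/priv/known_hosts
ssh-keygen -F 172.27.3.104 -f /root/.ssh/known_hosts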
 
hi,

can you post the output of pvecm status and pveversion -v?
 
# pvecm status
Quorum information
------------------
Date: Thu Jan 2 17:43:42 2020
Quorum provider: corosync_votequorum
Nodes: 13
Node ID: 0x00000006
Ring ID: 3/2800
Quorate: Yes

Votequorum information
----------------------
Expected votes: 13
Highest expected: 13
Total votes: 13
Quorum: 7
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000003 1 172.27.3.10
0x00000001 1 172.27.3.11
0x00000002 1 172.27.3.12
0x00000004 1 172.27.3.13
0x00000005 1 172.27.3.14
0x00000006 1 172.27.3.15 (local)
0x00000007 1 172.27.3.16
0x00000009 1 172.27.3.17
0x0000000a 1 172.27.3.18
0x0000000b 1 172.27.3.19
0x00000008 1 172.27.3.101
0x0000000c 1 172.27.3.102
0x0000000d 1 172.27.3.104


# pveversion -v
proxmox-ve: 5.4-2 (running kernel: 4.15.18-24-pve)
pve-manager: 5.4-13 (running version: 5.4-13/aee6f0ec)
pve-kernel-4.15: 5.4-12
pve-kernel-4.15.18-24-pve: 4.15.18-52
pve-kernel-4.15.18-21-pve: 4.15.18-48
ceph: 12.2.12-pve1
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: not correctly installed
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-12
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-56
libpve-guest-common-perl: 2.0-20
libpve-http-server-perl: 2.0-14
libpve-storage-perl: 5.0-44
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.1.0-7
lxcfs: 3.0.3-pve1
novnc-pve: 1.0.0-3
proxmox-widget-toolkit: 1.0-28
pve-cluster: 5.0-38
pve-container: 2.0-41
pve-docs: 5.4-2
pve-edk2-firmware: 1.20190312-1
pve-firewall: 3.0-22
pve-firmware: 2.0-7
pve-ha-manager: 2.0-9
pve-i18n: 1.1-4
pve-libspice-server1: 0.14.1-2
pve-qemu-kvm: 3.0.1-4
pve-xtermjs: 3.12.0-1
qemu-server: 5.0-54
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
 
do all nodes have the same pve versions?
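
If it helps, a quick way to compare them is something along these lines, run from one node (this assumes root SSH between the nodes works, which is exactly what is broken towards the reinstalled ones, so check those locally instead; adjust the IP list to your cluster):

# sketch: print the installed versions on a few nodes
for ip in 172.27.3.10 172.27.3.11 172.27.3.15; do
    echo "== $ip =="
    ssh root@$ip pveversion
done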
 
