[SOLVED] [PX6] Adding Node to cluster failed

Kephin

Renowned Member
Apr 21, 2015
I tried adding a node to our existing cluster of currently four machines.
We did this through the GUI.
The GUI on the new node stopped responding while it was restarting pve-cluster (or something like that; I didn't grab a screenshot).
The GUI didn't come back. The server is still reachable over SSH.

The node has been restarted, but that didn't fix it.

The node seems to have been added, at least visually, to the existing cluster, but the cluster shows the new node as offline. Edit (17:47): this is only true for the node that I used to add it.
The cluster GUI is also extremely slow, or even times out, until I stop corosync on the new, offending node.

I also had headaches adding the third and fourth machines to the cluster in, as far as I remember, a similar fashion, but after running pvecm updatecerts on those machines they came online and have worked flawlessly since (at least as far as I've noticed).
This server refuses to run that command due to lack of quorum.
Pveproxy also complained in the logs about not finding a certificate, so I looked deeper and found a topic on this forum suggesting that lowering the quorum to 1 and then running the same command with -f might help.
This seems to have worked, and pveproxy also seems happy now, but the server still doesn't come online within the cluster.
(I rebooted after this.)
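For reference, this is roughly what those steps looked like on the new node. pvecm expected and pvecm updatecerts --force are standard pvecm options, but treat this as a sketch of the workaround from that forum topic rather than an official procedure (the pveproxy/pvedaemon restart at the end is my own addition):
Code:
# on the new node, which has no quorum on its own
pvecm expected 1                       # temporarily lower the expected votes so /etc/pve becomes writable
pvecm updatecerts -f                   # force regeneration of the node certificates
systemctl restart pveproxy pvedaemon   # pick up the regenerated certificates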
The GUI of the new node does work now, but after logging in it only gives the error: Connection error 401: permission denied - invalid PVE ticket.

Corosync on the offending node logs these lines, seemingly indefinitely:
Code:
Oct 05 17:23:35 PX6PVE5 corosync[9250]:   [TOTEM ] A new membership (5.4377) was formed. Members
Oct 05 17:23:35 PX6PVE5 corosync[9250]:   [QUORUM] Members[1]: 5
Oct 05 17:23:35 PX6PVE5 corosync[9250]:   [MAIN  ] Completed service synchronization, ready to provide service.
Oct 05 17:23:42 PX6PVE5 corosync[9250]:   [TOTEM ] A new membership (5.438b) was formed. Members
Oct 05 17:23:42 PX6PVE5 corosync[9250]:   [QUORUM] Members[1]: 5
Oct 05 17:23:42 PX6PVE5 corosync[9250]:   [MAIN  ] Completed service synchronization, ready to provide service.
Oct 05 17:23:49 PX6PVE5 corosync[9250]:   [TOTEM ] A new membership (5.439f) was formed. Members
Oct 05 17:23:49 PX6PVE5 corosync[9250]:   [QUORUM] Members[1]: 5
Oct 05 17:23:49 PX6PVE5 corosync[9250]:   [MAIN  ] Completed service synchronization, ready to provide service.


Corosync on the node that I used to add the new node logs this:

Code:
Oct 05 17:24:07 px6pve1 corosync[32918]:   [TOTEM ] Token has not been received in 4516 ms
Oct 05 17:24:10 px6pve1 corosync[32918]:   [TOTEM ] A new membership (1.43db) was formed. Members
Oct 05 17:24:12 px6pve1 corosync[32918]:   [TOTEM ] Token has not been received in 2213 ms
Oct 05 17:24:15 px6pve1 corosync[32918]:   [TOTEM ] Token has not been received in 4515 ms
Oct 05 17:24:17 px6pve1 corosync[32918]:   [TOTEM ] A new membership (1.43ef) was formed. Members

Also, this looks weird to me:

Code:
Oct 05 17:24:17 px6pve1 corosync[32918]:   [QUORUM] Members[4]: 1 2 3 4

vs on the new node:

Code:
Oct 05 17:23:42 PX6PVE5 corosync[9250]:   [QUORUM] Members[1]: 5

It seems to see only itself?
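To dig into why the new node only sees itself, something like the following, run on both the new node and an existing member, might help. pvecm status and corosync-cfgtool -s are standard tools; the checksum line is just a sanity check that all nodes actually share the same corosync.conf:
Code:
pvecm status                  # quorum state and the members this node knows about
corosync-cfgtool -s           # knet link status towards the other nodes
md5sum /etc/pve/corosync.conf /etc/corosync/corosync.conf   # compare against the other nodes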

They are on the same network.

The only performance-related thing I can think of is that the server is currently going through its RAID parity procedure and is therefore somewhat slower.

I'm running out of ideas for things to try and would appreciate some pointers.
 
I'd like to add that this is a standard cluster. I haven't configured any HA, and the software is used as it installs from the ISO, without any real customisations.
 
I've found out that latency may cause problems. Even though I haven't separated corosync traffic from the rest, the cluster has its own switch for interconnectivity, and pings between the nodes are reliably below 0.260 ms.
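If latency were the culprit, the Proxmox docs suggest measuring it with omping running on all nodes at the same time. A sketch, assuming omping is installed everywhere and using our node names as placeholders (only px6pve1 and PX6PVE5 appear in the logs above; the others are assumed):
Code:
apt install omping
# start this on every node simultaneously and compare the reported latency/loss
omping -c 10000 -i 0.001 -F -q px6pve1 px6pve2 px6pve3 px6pve4 PX6PVE5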
Furthermore, here is the pveversion -v output of a working node vs. the new node:

Code:
proxmox-ve: 6.2-1 (running kernel: 5.4.44-2-pve)
pve-manager: 6.2-10 (running version: 6.2-10/a20769ed)
pve-kernel-5.4: 6.2-4
pve-kernel-helper: 6.2-4
pve-kernel-5.4.44-2-pve: 5.4.44-2
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.4
libpve-access-control: 6.1-2
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.1-5
libpve-guest-common-perl: 3.1-1
libpve-http-server-perl: 3.0-6
libpve-storage-perl: 6.2-5
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.2-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-9
pve-cluster: 6.1-8
pve-container: 3.1-11
pve-docs: 6.2-5
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-2
pve-firmware: 3.1-1
pve-ha-manager: 3.0-9
pve-i18n: 2.1-3
pve-qemu-kvm: 5.0.0-11
pve-xtermjs: 4.3.0-1
qemu-server: 6.2-10
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.4-pve1


Code:
pveversion -v
proxmox-ve: 6.2-2 (running kernel: 5.4.65-1-pve)
pve-manager: 6.2-12 (running version: 6.2-12/b287dd27)
pve-kernel-5.4: 6.2-7
pve-kernel-helper: 6.2-7
pve-kernel-5.4.65-1-pve: 5.4.65-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.5
libpve-access-control: 6.1-2
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.2-2
libpve-guest-common-perl: 3.1-3
libpve-http-server-perl: 3.0-6
libpve-storage-perl: 6.2-8
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-backup-client: 0.9.0-2
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-12
pve-cluster: 6.1-8
pve-container: 3.2-2
pve-docs: 6.2-6
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-1
pve-qemu-kvm: 5.1.0-2
pve-xtermjs: 4.7.0-2
qemu-server: 6.2-14
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.4-pve1
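Worth noting in the comparison above: the existing node runs slightly older packages than the new one (e.g. pve-manager 6.2-10 vs 6.2-12, libpve-common-perl 6.1-5 vs 6.2-2). Bringing the existing members up to date first would at least rule out a version mismatch; a minimal sketch, assuming the usual apt workflow with a valid Proxmox repository configured:
Code:
apt update
apt full-upgrade    # bring pve-manager and friends on the existing nodes to the same level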
 
We've chosen to re-install this node, as I'm running out of time and patience to troubleshoot this further.
Gave it a new name and a new IP.
Removed the old node from the cluster.
Added this new node, also fully updated, to the same existing "not quite yet updated" node in our cluster, but this time using the CLI.
The node was added without any problem...
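For completeness, the CLI procedure was roughly this; PX6PVE5 is the old node name from the logs, and the IP of the existing member is a placeholder:
Code:
# on a remaining cluster member: remove the old, broken node
pvecm delnode PX6PVE5
# on the freshly installed node: join the cluster via an existing member's IP
pvecm add 192.0.2.11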
 
