Proxmox: adding a node to the cluster fails

celtar

Hi,
we have a cluster, and normally we add a new node by running pvecm add "IP address of a cluster node" on the new node.
Unfortunately, this does not work with the new node vms26.

As you can see in picture 2, the new node is now node 1 (the old node 1 was EOL and has been deleted), and the cluster is not functional (see picture).
- pvecm status # on one of the cluster nodes: does not show the new node
- pvecm nodes # does not show the new node (vms26)
- pvecm status # on vms26 (the new node): shows "activity blocked"

If we delete the new node vms26, the cluster (see picture 2) becomes fully functional again (we tried this twice). If we reinstall Proxmox and add the new node again, the cluster fails again.
The switch is a 12-port Cisco 10 Gbit switch; the servers use Intel X550-T2 network cards only.
Proxmox version: pve-manager/7.1-5/6fe299a0

Any ideas?
Thanks and best regards
John


# pvecm status
Cluster information
-------------------
Name:             viaclst01
Config Version:   30
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Mon Feb 21 12:01:26 2022
Quorum provider:  corosync_votequorum
Nodes:            5
Node ID:          0x00000002
Ring ID:          2.239f
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   6
Highest expected: 6
Total votes:      5
Quorum:           4
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000002          1 10.79.100.70 (local)
0x00000003          1 10.79.100.72
0x00000004          1 10.79.100.128
0x00000005          1 10.79.100.132
0x00000006          1 10.79.100.134
----
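The vote numbers in this output are consistent with a plain majority rule: with 6 expected votes the quorum is 6 // 2 + 1 = 4, so the 5 votes present keep the cluster quorate even though one expected vote (presumably the newly added vms26) is missing. The following is only a minimal sketch of that arithmetic, assuming the default votequorum majority rule with no special options (such as two_node or last_man_standing) in effect; the function names are hypothetical and not part of any Proxmox or corosync API.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Smallest number of votes that forms a strict majority."""
    return expected_votes // 2 + 1


def is_quorate(total_votes: int, expected_votes: int) -> bool:
    """True if the votes currently present reach the majority threshold."""
    return total_votes >= quorum_threshold(expected_votes)


# Values taken from the pvecm status output above.
print(quorum_threshold(6))   # threshold for 6 expected votes
print(is_quorate(5, 6))      # 5 of 6 votes present
```

The same rule gives a quorum of 3 for a four-node cluster with one vote per node.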
 

Attachments

  • IMG_20220221_115613.jpg
  • 2022-02-21 11_57_12-vms20 - Proxmox Virtual Environment - Vivaldi.png
Hello,


Did you get any error while running the pvecm add <node IP> command?

Did you see anything interesting in the syslog/journalctl?

Can vms26 ping the other nodes?
 
Hi Moayad,

thanks for your help. My colleague and I took another, closer look today. The current status is as follows (Murphy's law is everywhere and we learn every day :) ).

We suspect the following 3 possible causes for the error when adding the node:

1. An error occurred with pvecm add new-node-id. Two-factor authentication was enabled for the user root, and we got the error message "500 Use of Inherited Autoload for non-method PVE::API2::FA....". We have now disabled 2FA for root and will try again.

2. The Intel X550-T2 network card caused problems on some ports of the Cisco switch, showing DUP! pings. We have updated the firmware of the network card and will replace the card as a precaution (a new one has just been ordered). On the first 10 Gbit switch (where the other nodes are connected) there were DUP! errors; on a port of another stacked Cisco switch there were none and everything is OK. After replacing the card, we will check whether the errors are still present. Of course, we have checked /etc/network/interfaces.

3. The SSL certificates (pve-ssl.pem) differed across all nodes, and some files were missing on the new node (pve-ssl.key under /etc/pve/nodes/<node>). Following another forum thread, https://forum.proxmox.com/threads/don’t-show-cluster-in-gui.56653/, we resynced and aligned the keys.
I am not even sure why the cluster is still OK when we delete the new node. Maybe the SSL keys went wrong earlier, but we do not know when this happened.

Our plan:
1. Replace the Intel X550-T2 network card and check whether the DUP! pings are gone. If not, further troubleshooting is required.
2. Try again to bring the new node, with the new network card, into the cluster.
3. If necessary, delete the node and reinstall it (is it possible to delete it and add it again?).
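To make the "are the DUP! pings gone" check in step 1 repeatable, the ping output can be scanned for duplicate replies. This is a minimal sketch, assuming Linux iputils ping output in which a duplicate reply line ends with "(DUP!)"; the sample text, the address 192.0.2.10 and the function name are made up for illustration and are not taken from the cluster above.

```python
import re

# Sequence number field as printed by iputils ping, e.g. "icmp_seq=3".
SEQ = re.compile(r"icmp_seq=(\d+)")


def count_duplicates(ping_output: str) -> dict:
    """Map each ICMP sequence number to its number of duplicate replies."""
    dups = {}
    for line in ping_output.splitlines():
        if "(DUP!)" not in line:
            continue  # normal reply or summary line
        match = SEQ.search(line)
        if match:
            seq = int(match.group(1))
            dups[seq] = dups.get(seq, 0) + 1
    return dups


# Hypothetical sample output, for illustration only.
SAMPLE = """\
64 bytes from 192.0.2.10: icmp_seq=1 ttl=64 time=0.21 ms
64 bytes from 192.0.2.10: icmp_seq=1 ttl=64 time=0.23 ms (DUP!)
64 bytes from 192.0.2.10: icmp_seq=2 ttl=64 time=0.20 ms
64 bytes from 192.0.2.10: icmp_seq=3 ttl=64 time=0.19 ms
64 bytes from 192.0.2.10: icmp_seq=3 ttl=64 time=0.25 ms (DUP!)
"""

print(count_duplicates(SAMPLE))
```

An empty result after the card has been replaced would suggest that the duplicates are gone on that port; it says nothing about other ports or switches.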

We will report the result.

Thanks and best regards
John

PS: Unfortunately, on our first try the cluster and the VMs went down completely, because we forgot to install nfs-client on the new node and we use NFS shared storage. Maybe in the near future we will change our storage policy to Ceph (we are not sure whether it is worth it). Other tuning/best-practice tasks (separate networks for the storage share and the cluster network) may follow. By the way, we have been satisfied with the configuration for years, and our cluster is performant enough for our needs.
 
Hi,


1. An error occurred with pvecm add new-node-id. Two-factor authentication was enabled for the user root, and we got the error message "500 Use of Inherited Autoload for non-method PVE::API2::FA....". We have now disabled 2FA for root and will try again.
You can add a node to a cluster using the pvecm add <cluster IP> CLI command; it will ask for the OTP code.

3. If necessary, delete the node and reinstall it (is it possible to delete it and add it again?).
It's not recommended to re-join the same node to the cluster after you have removed it, as mentioned in our documentation [0].


[0] https://pve.proxmox.com/pve-docs/chapter-pvecm.html#_remove_a_cluster_node
 
Hi,
we set up another test cluster with 4 PCs (1 Gbit Cisco switch only) and also got errors with the 4th node. We added this 4th node with pvecm add "ClusterIP".

All nodes are on the same update level. We also tried pvecm updatecerts -f (that sometimes helps, but only for a while).

From time to time we get errors like these:
1. Opening a console from the web GUI (of a VM, not the host):
Code:
Host key verification failed.
TASK ERROR: Failed to run vncproxy.

2. On first contact with the 4th node, the server wants to add the SSH key (we do not remember this happening with any of the older cluster nodes).

3. Sometimes, when we try to switch via SSH from another host to the 4th host, the (e.g.) 2nd node wants to add the SSH key.
This can be reproduced by, for example, shutting down another node.

4. We see that node 4 is just NR (I do not really understand whether this is a problem, because on our live cluster we use an earlier version).

Code:
root@vserver05:~# pvecm status
Cluster information
-------------------
Name:             ccluster01
Config Version:   9
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Fri Mar 18 13:43:42 2022
Quorum provider:  corosync_votequorum
Nodes:            4
Node ID:          0x00000004
Ring ID:          1.3646
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   4
Highest expected: 4
Total votes:      4
Quorum:           3
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes    Qdevice Name
0x00000001          1  NA,NV,NMW 192.168.102.116
0x00000002          1  NA,NV,NMW 192.168.102.117
0x00000003          1  NA,NV,NMW 192.168.102.119
0x00000004          1         NR 192.168.102.121 (local)
0x00000000          0            Qdevice (votes 0)
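For reading the Qdevice column in the table above, the flags can be decoded mechanically. This is only a sketch: the flag meanings are paraphrased from the legend in the corosync-quorumtool(8) man page as best understood here, and the names parse_membership, Member and FLAG_MEANINGS are hypothetical helpers, not part of pvecm or corosync. It does not say whether NR is the cause of the errors described in this post.

```python
from typing import NamedTuple

# Paraphrased from the corosync-quorumtool(8) flag legend (an assumption).
FLAG_MEANINGS = {
    "A": "qdevice alive",
    "NA": "qdevice not alive",
    "V": "qdevice vote cast",
    "NV": "no qdevice vote cast",
    "MW": "master_wins set",
    "NMW": "master_wins not set",
    "NR": "not registered with a qdevice",
}


class Member(NamedTuple):
    node_id: int
    votes: int
    flags: tuple
    name: str
    local: bool


def parse_membership(text: str) -> list:
    """Parse node rows of the 'Membership information' table."""
    members = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 4 or not parts[0].startswith("0x"):
            continue  # header, separator or blank line
        flags = tuple(parts[2].split(","))
        if not all(flag in FLAG_MEANINGS for flag in flags):
            continue  # e.g. the "Qdevice (votes 0)" summary row
        members.append(
            Member(int(parts[0], 16), int(parts[1]), flags, parts[3], "(local)" in parts)
        )
    return members


# Rows copied from the pvecm status output above.
SAMPLE = """\
    Nodeid      Votes    Qdevice Name
0x00000001          1  NA,NV,NMW 192.168.102.116
0x00000002          1  NA,NV,NMW 192.168.102.117
0x00000003          1  NA,NV,NMW 192.168.102.119
0x00000004          1         NR 192.168.102.121 (local)
0x00000000          0            Qdevice (votes 0)
"""

for member in parse_membership(SAMPLE):
    meanings = ", ".join(FLAG_MEANINGS[flag] for flag in member.flags)
    print(f"{member.name}: {meanings}")
```

Filtering the parsed rows for "NR" singles out the one node whose Qdevice status differs from the others, which makes it easy to compare against the output of the remaining nodes.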

5. We get errors when we open a VM console (on host 4), such as "Uncaught TypeError: Cannot read properties of undefined (reading sendCredentials)" (see attached screenshot).

Any ideas?

Thanks and best regards
John
 

Attachments

  • 2022-03-18 13_45_45-vserver02 - Proxmox Virtual Environment - Vivaldi.png
