Manually copy PVE files

Duktek

Active Member
Jan 16, 2020
Hi,

I have a cluster with 15 nodes and want to add one more node to it. The join fails because the cluster cannot form a quorum.

Jan 14 22:30:57 new-node corosync[1170558]: warning [MAIN ] Totem is unable to form a cluster because of an operating system or network fault (reason: totem is continuously in gather state). The most common cause of this message is that the local firewall is configured improperly.

After it failed I checked again: on every node the corosync config file already lists new-node as a member, but the folder /etc/pve/nodes/new-node/ is empty.

Bluntly, I ran pmxcfs -l and copied all files inside /etc/pve from another node to new-node. Then I killed pmxcfs and started pve-cluster again on new-node.

New-node now has quorum with all nodes in the cluster. I then tried pvecm updatecerts, but suddenly a kernel hang was reported in the syslog on new-node.

I know this is a weird question, but is it possible to manually copy the files like that?
 
Duktek said: "Bluntly, I ran pmxcfs -l and copied all files inside /etc/pve from another node to new-node. Then I killed pmxcfs and started pve-cluster again on new-node."

This should be avoided if at all possible. Modification timestamps decide what happens on file change conflicts. If you tamper with them manually you enter potentially dangerous territory, because the "new" node can then break all the older ones if something went wrong or a file was not copied.
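
To illustrate why that is risky, here is a toy Python model of "newest modification time wins". It is only a sketch of the general idea, not the actual pmxcfs algorithm; the function name resolve and the data layout are made up for this example.

```python
def resolve(local, remote):
    """Merge two {path: (mtime, content)} maps.

    Toy rule (an assumption, not pmxcfs internals): for each path,
    the entry with the newer modification time wins.
    """
    merged = dict(local)
    for path, (mtime, content) in remote.items():
        if path not in merged or mtime > merged[path][0]:
            merged[path] = (mtime, content)
    return merged
```

Under this rule, a file that was copied by hand onto the new node gets a fresh timestamp, so a stale or incomplete copy could be treated as the newest version and replace the good one on the older nodes.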

So to be clear, the 15 "older" nodes all have quorum and are working OK?

You could start by comparing the file /etc/corosync/corosync.conf from a working node with the one on the problematic node.
Also check that all nodes are listed. If the files differ, copy the good one over to the problematic node. Ideally, stop the pve-cluster and corosync services before you do that, then start corosync first and pve-cluster second.
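
If you want to do the comparison with a script rather than by eye, something like the following Python sketch could help. It assumes the usual corosync.conf layout, where each node block has a "name:" line and the totem section has a "config_version:" line; the function names are made up for this example.

```python
import re


def node_names(conf_text):
    """Return the set of node names found in a corosync.conf text."""
    # Matches lines like "    name: node1"; "cluster_name:" does not match
    # because the key must start right after the leading whitespace.
    return set(re.findall(r"^\s*name:\s*(\S+)\s*$", conf_text, flags=re.MULTILINE))


def config_version(conf_text):
    """Return config_version as an int, or None if the key is absent."""
    m = re.search(r"^\s*config_version:\s*(\d+)\s*$", conf_text, flags=re.MULTILINE)
    return int(m.group(1)) if m else None


def compare(good_text, bad_text):
    """Summarize how the problematic node's config differs from a working one."""
    good, bad = node_names(good_text), node_names(bad_text)
    return {
        "missing_on_bad": sorted(good - bad),
        "extra_on_bad": sorted(bad - good),
        "version_good": config_version(good_text),
        "version_bad": config_version(bad_text),
        "identical": good_text.strip() == bad_text.strip(),
    }
```

To use it, copy the file from a working node to a temporary path on the problematic node, read both files with open(path).read(), and pass the two strings to compare(). A non-empty "missing_on_bad" list or differing versions would suggest the problematic node has an outdated config.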

Post any suspicious logs from the problematic node if it still fails.