Problems adding a new node to the cluster (Proxmox 6.1)

Miquel Gual Torner

I have a 4-node cluster running Proxmox 6.1. When I try to add one more node, the join does not complete.

root@pve-ajt-04:~# pvecm add 172.20.10.61
Please enter superuser (root) password for '172.20.10.61': ******
Establishing API connection with host '172.20.10.61'
The authenticity of host '172.20.10.61' can't be established.
X509 SHA256 key fingerprint is F2:36:DF:AA:6F:74:AB:B2:6E:D5:BD:CF:1E:32:55:4D:2B:05:5E:64:11:FE:D8:6E:F8:2E:89:BF:21:E1:74:B8.
Are you sure you want to continue connecting (yes/no)? yes
Login succeeded.
Request addition of this node
Join request OK, finishing setup locally
stopping pve-cluster service
backup old database to '/var/lib/pve-cluster/backup/config-1576236011.sql.gz'
waiting for quorum...

It hangs at "waiting for quorum..." and does not continue.

----
root@pve-ajt-11:~# pvecm delnode pve-ajt-04
root@pve-ajt-11:~# pvecm status
Cluster information
-------------------
Name: pve6
Config Version: 8
Transport: knet
Secure auth: on

Quorum information
------------------
Date: Fri Dec 20 08:22:33 2019
Quorum provider: corosync_votequorum
Nodes: 4
Node ID: 0x00000001
Ring ID: 1.396b4
Quorate: Yes

Votequorum information
----------------------
Expected votes: 4
Highest expected: 4
Total votes: 4
Quorum: 3
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000001 1 172.20.10.61 (local)
0x00000002 1 172.20.10.57
0x00000003 1 172.20.10.56
0x00000004 1 172.20.10.55

root@pve-ajt-11:~# cat /etc/pve/.members
{
"nodename": "pve-ajt-11",
"version": 6,
"cluster": { "name": "pve6", "version": 8, "nodes": 4, "quorate": 1 },
"nodelist": {
"pve-ajt-07": { "id": 2, "online": 1, "ip": "172.20.10.57"},
"pve-ajt-06": { "id": 3, "online": 1, "ip": "172.20.10.56"},
"pve-ajt-05": { "id": 4, "online": 1, "ip": "172.20.10.55"},
"pve-ajt-11": { "id": 1, "online": 1, "ip": "172.20.10.61"}
}
}
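
As a quick sanity check on the output above: the "Quorum: 3" value matches the usual majority rule (floor(votes / 2) + 1), and `/etc/pve/.members` is plain JSON, so both can be checked with a few lines of Python. This is only a minimal sketch, assuming one vote per node as shown in the membership table. The function names `quorum_threshold` and `offline_nodes` are made up for illustration and are not part of any Proxmox tool.

```python
import json
from typing import List


def quorum_threshold(expected_votes: int) -> int:
    """Votes needed for a majority, assuming one vote per node."""
    return expected_votes // 2 + 1


def offline_nodes(members_json: str) -> List[str]:
    """Return the names of nodes that .members reports as not online."""
    data = json.loads(members_json)
    return sorted(
        name
        for name, info in data["nodelist"].items()
        if not info.get("online")
    )


# Sample copied from the `cat /etc/pve/.members` output above.
SAMPLE = """
{
"nodename": "pve-ajt-11",
"version": 6,
"cluster": { "name": "pve6", "version": 8, "nodes": 4, "quorate": 1 },
"nodelist": {
"pve-ajt-07": { "id": 2, "online": 1, "ip": "172.20.10.57"},
"pve-ajt-06": { "id": 3, "online": 1, "ip": "172.20.10.56"},
"pve-ajt-05": { "id": 4, "online": 1, "ip": "172.20.10.55"},
"pve-ajt-11": { "id": 1, "online": 1, "ip": "172.20.10.61"}
}
}
"""

if __name__ == "__main__":
    print("quorum needed with 4 votes:", quorum_threshold(4))
    print("offline nodes:", offline_nodes(SAMPLE))
```

With the four nodes listed here the threshold is 3, and it stays at 3 with a fifth node, so the existing members on their own have enough votes either way.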
 
* root@pve-ajt-04:~# systemctl status pvesr.service
pvesr.service - Proxmox VE replication runner
Loaded: loaded (/lib/systemd/system/pvesr.service; static; vendor preset: enabled)
Active: failed (Result: exit-code) since Thu 2019-12-12 09:24:01 CET; 15s ago
Process: 12204 ExecStart=/usr/bin/pvesr run --mail 1 (code=exited, status=2)
de des. 12 09:24:01 pve-ajt-04 pvesr[12204]: error with cfs lock file-replication_cfg: no quorum!

pve-ajt-04:
* /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1699.

pve-ajt-11 (cluster):
* Dec 12 08:58:15 pve-ajt-11 corosync[1489]: [TOTEM ] Token has not been received in 4515 ms
* Dec 12 08:58:15 pve-ajt-11 pmxcfs[1302]: [status] notice: cpg_send_message retry 60
 
Regarding the problem on the node you removed from the cluster (pve-ajt-04) - check out the reference documentation on removing a node:
https://pve.proxmox.com/pve-docs/chapter-pvecm.html#_remove_a_cluster_node
* Either reinstall it from scratch (since it should have no guests on it)
* Or follow the (not recommended) instructions to separate without reinstallation

Regarding the problem of joining pve-ajt-04 to an existing cluster:
* Check the journal and logs - especially for messages from corosync and pve-cluster/pmxcfs:
** `journalctl -r` (journal in reverse order)
** `journalctl -u corosync -u pve-cluster` (messages from corosync and pve-cluster/pmxcfs)

I hope this helps!
 
de gen. 22 17:59:17 pve-ajt-11 pmxcfs[1321]: [status] crit: cpg_send_message failed: 6
de gen. 22 17:59:17 pve-ajt-11 pmxcfs[1321]: [status] notice: cpg_send_message retried 100 times
de gen. 22 17:59:17 pve-ajt-11 pmxcfs[1321]: [status] notice: cpg_send_message retry 100
de gen. 22 17:59:16 pve-ajt-11 corosync[1573]: [TOTEM ] A new membership (1.3971c) was formed. Members
de gen. 22 17:59:16 pve-ajt-11 pmxcfs[1321]: [status] notice: cpg_send_message retry 90
de gen. 22 17:59:15 pve-ajt-11 pmxcfs[1321]: [status] notice: cpg_send_message retry 80
de gen. 22 17:59:14 pve-ajt-11 pmxcfs[1321]: [status] notice: cpg_send_message retry 70
de gen. 22 17:59:13 pve-ajt-11 corosync[1573]: [TOTEM ] Token has not been received in 4565 ms
de gen. 22 17:59:13 pve-ajt-11 pmxcfs[1321]: [status] notice: cpg_send_message retry 60
de gen. 22 17:59:12 pve-ajt-11 pmxcfs[1321]: [status] notice: cpg_send_message retry 50
de gen. 22 17:59:11 pve-ajt-11 corosync[1573]: [TOTEM ] Token has not been received in 2214 ms
de gen. 22 17:59:11 pve-ajt-11 pmxcfs[1321]: [status] notice: cpg_send_message retry 40
de gen. 22 17:59:10 pve-ajt-11 pmxcfs[1321]: [status] notice: cpg_send_message retry 30
de gen. 22 17:59:09 pve-ajt-11 corosync[1573]: [TOTEM ] A new membership (1.39708) was formed. Members
de gen. 22 17:59:09 pve-ajt-11 pmxcfs[1321]: [status] notice: cpg_send_message retry 20
de gen. 22 17:59:08 pve-ajt-11 pmxcfs[1321]: [status] notice: cpg_send_message retry 10
de gen. 22 17:59:06 pve-ajt-11 corosync[1573]: [TOTEM ] Token has not been received in 4565 ms
de gen. 22 17:59:04 pve-ajt-11 corosync[1573]: [TOTEM ] Token has not been received in 2214 ms
de gen. 22 17:59:02 pve-ajt-11 corosync[1573]: [TOTEM ] A new membership (1.396f4) was formed. Members
de gen. 22 17:59:00 pve-ajt-11 systemd[1]: Starting Proxmox VE replication runner...
de gen. 22 17:58:59 pve-ajt-11 corosync[1573]: [TOTEM ] Token has not been received in 5366 ms
de gen. 22 17:58:57 pve-ajt-11 corosync[1573]: [TOTEM ] Token has not been received in 3016 ms
de gen. 22 17:58:55 pve-ajt-11 corosync[1573]: [KNET ] pmtud: PMTUD link change for host: 5 link: 0 from 469 to 1397
de gen. 22 17:58:55 pve-ajt-11 corosync[1573]: [KNET ] host: host: 5 (passive) best link: 0 (pri: 1)
de gen. 22 17:58:55 pve-ajt-11 corosync[1573]: [KNET ] rx: host: 5 link: 0 is up
de gen. 22 17:58:52 pve-ajt-11 pmxcfs[1321]: [status] notice: received log
de gen. 22 17:58:51 pve-ajt-11 pmxcfs[1321]: [status] notice: update cluster info (cluster name pve6, version = 9)
de gen. 22 17:58:51 pve-ajt-11 corosync[1573]: [KNET ] host: host: 5 has no active links
de gen. 22 17:58:51 pve-ajt-11 corosync[1573]: [KNET ] host: host: 5 (passive) best link: 0 (pri: 1)
de gen. 22 17:58:51 pve-ajt-11 corosync[1573]: [KNET ] host: host: 5 has no active links
de gen. 22 17:58:51 pve-ajt-11 corosync[1573]: [KNET ] host: host: 5 (passive) best link: 0 (pri: 1)
de gen. 22 17:58:51 pve-ajt-11 corosync[1573]: [KNET ] host: host: 5 has no active links
de gen. 22 17:58:51 pve-ajt-11 corosync[1573]: [KNET ] host: host: 5 (passive) best link: 0 (pri: 1)
de gen. 22 17:58:51 pve-ajt-11 corosync[1573]: [TOTEM ] To reconfigure an interface it must be deleted and recreated. A working interface needs to be available to corosync at all times
de gen. 22 17:58:51 pve-ajt-11 corosync[1573]: [TOTEM ] new config has different address for link 0 (addr changed from 172.20.10.61 to 172.20.10.57). Internal value was NOT changed.
de gen. 22 17:58:51 pve-ajt-11 corosync[1573]: [TOTEM ] new config has different address for link 0 (addr changed from 172.20.10.57 to 172.20.10.56). Internal value was NOT changed.
de gen. 22 17:58:51 pve-ajt-11 corosync[1573]: [TOTEM ] new config has different address for link 0 (addr changed from 172.20.10.56 to 172.20.10.55). Internal value was NOT changed.
de gen. 22 17:58:51 pve-ajt-11 corosync[1573]: [TOTEM ] new config has different address for link 0 (addr changed from 172.20.10.55 to 172.20.10.53). Internal value was NOT changed.
de gen. 22 17:58:51 pve-ajt-11 corosync[1573]: [TOTEM ] Configured link number 0: local addr: 172.20.10.61, port=5405
de gen. 22 17:58:51 pve-ajt-11 corosync[1573]: [TOTEM ] Configuring link 0
 
de gen. 22 18:01:27 pve-ajt-11 pmxcfs[1321]: [status] crit: cpg_send_message failed: 6
de gen. 22 18:01:28 pve-ajt-11 corosync[1573]: [TOTEM ] Token has not been received in 4565 ms
de gen. 22 18:01:28 pve-ajt-11 pmxcfs[1321]: [status] notice: cpg_send_message retry 10
de gen. 22 18:01:29 pve-ajt-11 pmxcfs[1321]: [status] notice: cpg_send_message retry 20
de gen. 22 18:01:30 pve-ajt-11 pmxcfs[1321]: [status] notice: cpg_send_message retry 30
de gen. 22 18:01:31 pve-ajt-11 corosync[1573]: [TOTEM ] A new membership (1.39898) was formed. Members
de gen. 22 18:01:31 pve-ajt-11 pmxcfs[1321]: [status] notice: cpg_send_message retry 40
de gen. 22 18:01:32 pve-ajt-11 pmxcfs[1321]: [status] notice: cpg_send_message retry 50
root@pve-ajt-11:~# journalctl -u corosync -u pve-cluster
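
Long journals like the ones above are easier to compare between nodes when the recurring messages are counted. The following is a rough sketch, not an official tool: the pattern names and the `summarize` function are hypothetical, and the regular expressions only match the message texts that appear in the excerpts in this thread.

```python
import re
from collections import Counter

# Message texts taken from the journal excerpts in this thread.
PATTERNS = {
    "token_timeout": re.compile(r"Token has not been received in \d+ ms"),
    "cpg_failed": re.compile(r"cpg_send_message failed"),
    "new_membership": re.compile(r"A new membership \([^)]+\) was formed"),
    "addr_changed": re.compile(r"new config has different address"),
}


def summarize(lines):
    """Count how many lines match each known message pattern."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# A few lines copied from the output above.
SAMPLE_LINES = [
    "de gen. 22 18:01:27 pve-ajt-11 pmxcfs[1321]: [status] crit: cpg_send_message failed: 6",
    "de gen. 22 18:01:28 pve-ajt-11 corosync[1573]: [TOTEM ] Token has not been received in 4565 ms",
    "de gen. 22 18:01:28 pve-ajt-11 pmxcfs[1321]: [status] notice: cpg_send_message retry 10",
    "de gen. 22 18:01:31 pve-ajt-11 corosync[1573]: [TOTEM ] A new membership (1.39898) was formed. Members",
    "de gen. 22 17:58:51 pve-ajt-11 corosync[1573]: [TOTEM ] new config has different address for link 0 "
    "(addr changed from 172.20.10.61 to 172.20.10.57). Internal value was NOT changed.",
]

if __name__ == "__main__":
    for name, count in sorted(summarize(SAMPLE_LINES).items()):
        print(name, count)
```

To run it against a real journal, the output of `journalctl -u corosync -u pve-cluster` could be saved to a file and its lines passed to `summarize`.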
 
Hello,

I think I found out why:
Before adding this node to the cluster, I had added another node with a different configuration. After that, the cluster would not let me add any more nodes.

I think the cause was this storage, which node pve-ajt-05 has and the other nodes do not (lvmthin: local-1T):

root@pve-ajt-05:~# cat /etc/pve/storage.cfg
dir: local
path /var/lib/vz
content iso,vztmpl,backup
maxfiles 2
shared 0

lvmthin: local-lvm
thinpool data
vgname pve
content images,rootdir

lvmthin: local-1T
thinpool local-1T
vgname local-1T
content images,rootdir
nodes pve-ajt-05
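
To see at a glance which storages in `storage.cfg` are limited to particular nodes, the file can be parsed with a short script. This is a minimal sketch under assumptions: it treats every `type: id` line as the start of a storage entry and every following non-empty line as a `key value` pair, which fits the file shown above but may not cover every option. The function names are hypothetical. It only lists the `nodes` restrictions and does not confirm whether this storage was the cause of the join problem.

```python
import re

# A section starts with "<type>: <storage-id>", e.g. "lvmthin: local-1T".
HEADER = re.compile(r"^(\w+):\s*(\S+)\s*$")


def parse_storage_cfg(text):
    """Parse storage.cfg text into {storage_id: {option: value}}."""
    storages = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = HEADER.match(line)
        if match:
            current = {"type": match.group(1)}
            storages[match.group(2)] = current
        elif current is not None:
            key, _, value = line.partition(" ")
            current[key] = value.strip()
    return storages


def node_restricted(storages):
    """Return {storage_id: [nodes]} for entries that have a 'nodes' option."""
    return {
        sid: props["nodes"].split(",")
        for sid, props in storages.items()
        if "nodes" in props
    }


# Copied from the storage.cfg shown above.
SAMPLE = """\
dir: local
path /var/lib/vz
content iso,vztmpl,backup
maxfiles 2
shared 0

lvmthin: local-lvm
thinpool data
vgname pve
content images,rootdir

lvmthin: local-1T
thinpool local-1T
vgname local-1T
content images,rootdir
nodes pve-ajt-05
"""

if __name__ == "__main__":
    print(node_restricted(parse_storage_cfg(SAMPLE)))
```

For the file above, only `local-1T` carries a `nodes` line, restricting it to pve-ajt-05.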