Problems adding a new node to the cluster (Proxmox 6.1)

Miquel Gual Torner

I have a 4-node cluster running Proxmox VE 6.1. Adding one more node does not work.

root@pve-ajt-04:~# pvecm add 172.20.10.61
Please enter superuser (root) password for '172.20.10.61': ******
Establishing API connection with host '172.20.10.61'
The authenticity of host '172.20.10.61' can't be established.
X509 SHA256 key fingerprint is F2:36:DF:AA:6F:74:AB:B2:6E:D5:BD:CF:1E:32:55:4D:2B:05:5E:64:11:FE:D8:6E:F8:2E:89:BF:21:E1:74:B8.
Are you sure you want to continue connecting (yes/no)? yes
Login succeeded.
Request addition of this node
Join request OK, finishing setup locally
stopping pve-cluster service
backup old database to '/var/lib/pve-cluster/backup/config-1576236011.sql.gz'
waiting for quorum...

The join hangs at this point and does not continue.

----
root@pve-ajt-11:~# pvecm delnode pve-ajt-04
root@pve-ajt-11:~# pvecm status
Cluster information
-------------------
Name: pve6
Config Version: 8
Transport: knet
Secure auth: on

Quorum information
------------------
Date: Fri Dec 20 08:22:33 2019
Quorum provider: corosync_votequorum
Nodes: 4
Node ID: 0x00000001
Ring ID: 1.396b4
Quorate: Yes

Votequorum information
----------------------
Expected votes: 4
Highest expected: 4
Total votes: 4
Quorum: 3
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000001 1 172.20.10.61 (local)
0x00000002 1 172.20.10.57
0x00000003 1 172.20.10.56
0x00000004 1 172.20.10.55
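
For reference, the `Quorum: 3` value above follows from the majority rule that corosync's votequorum applies by default: more than half of the expected votes must be present. The sketch below is only an illustration, assuming one vote per node and no special votequorum options (such as `two_node`); the function names are made up for this example.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Votes needed for a strict majority of the expected votes."""
    if expected_votes < 1:
        raise ValueError("expected_votes must be at least 1")
    return expected_votes // 2 + 1


def is_quorate(total_votes: int, expected_votes: int) -> bool:
    """True when the votes currently present reach the majority threshold."""
    return total_votes >= quorum_threshold(expected_votes)


# Values from the status output above: 4 expected votes, 4 present.
print(quorum_threshold(4), is_quorate(4, 4))

# A fifth node that has joined but cannot reach its peers over corosync
# holds only its own vote (assumption: one vote per node).
print(quorum_threshold(5), is_quorate(1, 5))
```

Under these assumptions, a newly joined node stays at "waiting for quorum..." until corosync on that node can actually talk to at least two of the existing members.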

root@pve-ajt-11:~# cat /etc/pve/.members
{
"nodename": "pve-ajt-11",
"version": 6,
"cluster": { "name": "pve6", "version": 8, "nodes": 4, "quorate": 1 },
"nodelist": {
"pve-ajt-07": { "id": 2, "online": 1, "ip": "172.20.10.57"},
"pve-ajt-06": { "id": 3, "online": 1, "ip": "172.20.10.56"},
"pve-ajt-05": { "id": 4, "online": 1, "ip": "172.20.10.55"},
"pve-ajt-11": { "id": 1, "online": 1, "ip": "172.20.10.61"}
}
}
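
A quick way to read this file is to check that every listed node is online and that the removed node is no longer present. The following is a minimal sketch that parses the JSON exactly as posted above; the field names come from that output, while `summarize_members` and `is_member` are hypothetical helper names.

```python
import json

# Content of /etc/pve/.members as posted above.
MEMBERS_JSON = """
{
"nodename": "pve-ajt-11",
"version": 6,
"cluster": { "name": "pve6", "version": 8, "nodes": 4, "quorate": 1 },
"nodelist": {
"pve-ajt-07": { "id": 2, "online": 1, "ip": "172.20.10.57"},
"pve-ajt-06": { "id": 3, "online": 1, "ip": "172.20.10.56"},
"pve-ajt-05": { "id": 4, "online": 1, "ip": "172.20.10.55"},
"pve-ajt-11": { "id": 1, "online": 1, "ip": "172.20.10.61"}
}
}
"""


def summarize_members(text: str) -> dict:
    """Return the quorate flag, the declared node count and any offline nodes."""
    data = json.loads(text)
    nodelist = data.get("nodelist", {})
    return {
        "quorate": bool(data["cluster"]["quorate"]),
        "declared_nodes": data["cluster"]["nodes"],
        "listed": sorted(nodelist),
        "offline": sorted(
            name for name, info in nodelist.items() if not info.get("online")
        ),
    }


def is_member(text: str, node: str) -> bool:
    """True if the given node name appears in the nodelist."""
    return node in json.loads(text).get("nodelist", {})


summary = summarize_members(MEMBERS_JSON)
print(summary)
print("pve-ajt-04 still listed:", is_member(MEMBERS_JSON, "pve-ajt-04"))
```

On a live node the same functions could be fed the text of `/etc/pve/.members` instead of the embedded string.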
 
* root@pve-ajt-04:~# systemctl status pvesr.service
pvesr.service - Proxmox VE replication runner
Loaded: loaded (/lib/systemd/system/pvesr.service; static; vendor preset: enabled)
Active: failed (Result: exit-code) since Thu 2019-12-12 09:24:01 CET; 15s ago
Process: 12204 ExecStart=/usr/bin/pvesr run --mail 1 (code=exited, status=2)
de des. 12 09:24:01 pve-ajt-04 pvesr[12204]: error with cfs lock file-replication_cfg: no quorum!

pve-ajt-04:
* /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1699.

pve-ajt-11 (cluster):
* Dec 12 08:58:15 pve-ajt-11 corosync[1489]: [TOTEM ] Token has not been received in 4515 ms
* Dec 12 08:58:15 pve-ajt-11 pmxcfs[1302]: [status] notice: cpg_send_message retry 60
 
Regarding the problem on the node you removed from the cluster (pve-ajt-04) - check out the reference documentation on removing a node:
https://pve.proxmox.com/pve-docs/chapter-pvecm.html#_remove_a_cluster_node
* Either reinstall it from scratch (since it should have no guests on it)
* Or follow the (not recommended) instructions to separate without reinstallation

Regarding the problem of joining pve-ajt-04 to an existing cluster:
* Check the journal and logs - especially for messages from corosync and pve-cluster/pmxcfs:
** `journalctl -r` (journal in reverse order)
** `journalctl -u corosync -u pve-cluster` (messages from corosync and pve-cluster/pmxcfs)
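
When the journal is long, it can help to count the recurring corosync and pmxcfs symptoms instead of reading every line. The sketch below is one possible way to do that; the patterns are taken from the log lines quoted in this thread, and the category names and `tally_symptoms` are my own labels, not anything defined by corosync or Proxmox.

```python
import re
from collections import Counter

# Patterns based on messages quoted in this thread. \s* tolerates the
# padding inside the subsystem tag, e.g. "[TOTEM ]" or "[KNET  ]".
PATTERNS = {
    "token_timeout": re.compile(r"\[TOTEM\s*\] Token has not been received in \d+ ms"),
    "new_membership": re.compile(r"\[TOTEM\s*\] A new membership \(\S+\) was formed"),
    "no_active_links": re.compile(r"\[KNET\s*\] host: host: \d+ has no active links"),
    "cpg_retry": re.compile(r"cpg_send_message retry \d+"),
    "cpg_failed": re.compile(r"cpg_send_message failed: \d+"),
}

# The two lines quoted earlier in the thread for pve-ajt-11.
SAMPLE = """\
Dec 12 08:58:15 pve-ajt-11 corosync[1489]: [TOTEM ] Token has not been received in 4515 ms
Dec 12 08:58:15 pve-ajt-11 pmxcfs[1302]: [status] notice: cpg_send_message retry 60
"""


def tally_symptoms(lines) -> Counter:
    """Count how many lines match each known symptom pattern."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


print(tally_symptoms(SAMPLE.splitlines()))
```

To use it on a real node, save the output of `journalctl -u corosync -u pve-cluster --no-pager` to a file and pass its lines to `tally_symptoms`. Many token timeouts combined with repeated new memberships usually point at the cluster network rather than at pmxcfs itself.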

I hope this helps!
 
de gen. 22 17:59:17 pve-ajt-11 pmxcfs[1321]: [status] crit: cpg_send_message failed: 6
de gen. 22 17:59:17 pve-ajt-11 pmxcfs[1321]: [status] notice: cpg_send_message retried 100 times
de gen. 22 17:59:17 pve-ajt-11 pmxcfs[1321]: [status] notice: cpg_send_message retry 100
de gen. 22 17:59:16 pve-ajt-11 corosync[1573]: [TOTEM ] A new membership (1.3971c) was formed. Members
de gen. 22 17:59:16 pve-ajt-11 pmxcfs[1321]: [status] notice: cpg_send_message retry 90
de gen. 22 17:59:15 pve-ajt-11 pmxcfs[1321]: [status] notice: cpg_send_message retry 80
de gen. 22 17:59:14 pve-ajt-11 pmxcfs[1321]: [status] notice: cpg_send_message retry 70
de gen. 22 17:59:13 pve-ajt-11 corosync[1573]: [TOTEM ] Token has not been received in 4565 ms
de gen. 22 17:59:13 pve-ajt-11 pmxcfs[1321]: [status] notice: cpg_send_message retry 60
de gen. 22 17:59:12 pve-ajt-11 pmxcfs[1321]: [status] notice: cpg_send_message retry 50
de gen. 22 17:59:11 pve-ajt-11 corosync[1573]: [TOTEM ] Token has not been received in 2214 ms
de gen. 22 17:59:11 pve-ajt-11 pmxcfs[1321]: [status] notice: cpg_send_message retry 40
de gen. 22 17:59:10 pve-ajt-11 pmxcfs[1321]: [status] notice: cpg_send_message retry 30
de gen. 22 17:59:09 pve-ajt-11 corosync[1573]: [TOTEM ] A new membership (1.39708) was formed. Members
de gen. 22 17:59:09 pve-ajt-11 pmxcfs[1321]: [status] notice: cpg_send_message retry 20
de gen. 22 17:59:08 pve-ajt-11 pmxcfs[1321]: [status] notice: cpg_send_message retry 10
de gen. 22 17:59:06 pve-ajt-11 corosync[1573]: [TOTEM ] Token has not been received in 4565 ms
de gen. 22 17:59:04 pve-ajt-11 corosync[1573]: [TOTEM ] Token has not been received in 2214 ms
de gen. 22 17:59:02 pve-ajt-11 corosync[1573]: [TOTEM ] A new membership (1.396f4) was formed. Members
de gen. 22 17:59:00 pve-ajt-11 systemd[1]: Starting Proxmox VE replication runner...
de gen. 22 17:58:59 pve-ajt-11 corosync[1573]: [TOTEM ] Token has not been received in 5366 ms
de gen. 22 17:58:57 pve-ajt-11 corosync[1573]: [TOTEM ] Token has not been received in 3016 ms
de gen. 22 17:58:55 pve-ajt-11 corosync[1573]: [KNET ] pmtud: PMTUD link change for host: 5 link: 0 from 469 to 1397
de gen. 22 17:58:55 pve-ajt-11 corosync[1573]: [KNET ] host: host: 5 (passive) best link: 0 (pri: 1)
de gen. 22 17:58:55 pve-ajt-11 corosync[1573]: [KNET ] rx: host: 5 link: 0 is up
de gen. 22 17:58:52 pve-ajt-11 pmxcfs[1321]: [status] notice: received log
de gen. 22 17:58:51 pve-ajt-11 pmxcfs[1321]: [status] notice: update cluster info (cluster name pve6, version = 9)
de gen. 22 17:58:51 pve-ajt-11 corosync[1573]: [KNET ] host: host: 5 has no active links
de gen. 22 17:58:51 pve-ajt-11 corosync[1573]: [KNET ] host: host: 5 (passive) best link: 0 (pri: 1)
de gen. 22 17:58:51 pve-ajt-11 corosync[1573]: [KNET ] host: host: 5 has no active links
de gen. 22 17:58:51 pve-ajt-11 corosync[1573]: [KNET ] host: host: 5 (passive) best link: 0 (pri: 1)
de gen. 22 17:58:51 pve-ajt-11 corosync[1573]: [KNET ] host: host: 5 has no active links
de gen. 22 17:58:51 pve-ajt-11 corosync[1573]: [KNET ] host: host: 5 (passive) best link: 0 (pri: 1)
de gen. 22 17:58:51 pve-ajt-11 corosync[1573]: [TOTEM ] To reconfigure an interface it must be deleted and recreated. A working interface needs to be available to corosync at all times
de gen. 22 17:58:51 pve-ajt-11 corosync[1573]: [TOTEM ] new config has different address for link 0 (addr changed from 172.20.10.61 to 172.20.10.57). Internal value was NOT changed.
de gen. 22 17:58:51 pve-ajt-11 corosync[1573]: [TOTEM ] new config has different address for link 0 (addr changed from 172.20.10.57 to 172.20.10.56). Internal value was NOT changed.
de gen. 22 17:58:51 pve-ajt-11 corosync[1573]: [TOTEM ] new config has different address for link 0 (addr changed from 172.20.10.56 to 172.20.10.55). Internal value was NOT changed.
de gen. 22 17:58:51 pve-ajt-11 corosync[1573]: [TOTEM ] new config has different address for link 0 (addr changed from 172.20.10.55 to 172.20.10.53). Internal value was NOT changed.
de gen. 22 17:58:51 pve-ajt-11 corosync[1573]: [TOTEM ] Configured link number 0: local addr: 172.20.10.61, port=5405
de gen. 22 17:58:51 pve-ajt-11 corosync[1573]: [TOTEM ] Configuring link 0
 
de gen. 22 18:01:27 pve-ajt-11 pmxcfs[1321]: [status] crit: cpg_send_message failed: 6
de gen. 22 18:01:28 pve-ajt-11 corosync[1573]: [TOTEM ] Token has not been received in 4565 ms
de gen. 22 18:01:28 pve-ajt-11 pmxcfs[1321]: [status] notice: cpg_send_message retry 10
de gen. 22 18:01:29 pve-ajt-11 pmxcfs[1321]: [status] notice: cpg_send_message retry 20
de gen. 22 18:01:30 pve-ajt-11 pmxcfs[1321]: [status] notice: cpg_send_message retry 30
de gen. 22 18:01:31 pve-ajt-11 corosync[1573]: [TOTEM ] A new membership (1.39898) was formed. Members
de gen. 22 18:01:31 pve-ajt-11 pmxcfs[1321]: [status] notice: cpg_send_message retry 40
de gen. 22 18:01:32 pve-ajt-11 pmxcfs[1321]: [status] notice: cpg_send_message retry 50
root@pve-ajt-11:~# journalctl -u corosync -u pve-cluster
 
Hello,

I think I found out why:
Before adding this node to the cluster, I had added another node with a different configuration. After that, the cluster would not accept any more nodes.

I think the cause was this storage, which node pve-ajt-05 had and the other nodes did not (lvmthin: local-1T):

root@pve-ajt-05:~# cat /etc/pve/storage.cfg
dir: local
        path /var/lib/vz
        content iso,vztmpl,backup
        maxfiles 2
        shared 0

lvmthin: local-lvm
        thinpool data
        vgname pve
        content images,rootdir

lvmthin: local-1T
        thinpool local-1T
        vgname local-1T
        content images,rootdir
        nodes pve-ajt-05
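
To see at a glance which storages are limited to specific nodes, the file can be parsed into sections. This is only a rough sketch of the section format shown above (a `type: id` header followed by `key value` lines); it is not the parser Proxmox itself uses, the function names are hypothetical, and it does not show whether this storage really caused the join problem.

```python
import re

# storage.cfg content as posted above.
STORAGE_CFG = """\
dir: local
        path /var/lib/vz
        content iso,vztmpl,backup
        maxfiles 2
        shared 0

lvmthin: local-lvm
        thinpool data
        vgname pve
        content images,rootdir

lvmthin: local-1T
        thinpool local-1T
        vgname local-1T
        content images,rootdir
        nodes pve-ajt-05
"""

# A section header looks like "<type>: <storage id>".
HEADER = re.compile(r"^(\w+):\s+(\S+)\s*$")


def parse_storage_cfg(text: str) -> dict:
    """Parse the text into {storage_id: {"type": ..., key: value, ...}}."""
    storages = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = HEADER.match(line)
        if match:
            current = {"type": match.group(1)}
            storages[match.group(2)] = current
        elif current is not None:
            parts = line.split(None, 1)
            current[parts[0]] = parts[1] if len(parts) > 1 else ""
    return storages


def node_restricted(storages: dict) -> dict:
    """Return {storage_id: [nodes]} for storages that have a 'nodes' property."""
    return {
        name: props["nodes"].split(",")
        for name, props in storages.items()
        if "nodes" in props
    }


print(node_restricted(parse_storage_cfg(STORAGE_CFG)))
```

Here only `local-1T` carries a `nodes` line, which matches the observation that this storage exists on pve-ajt-05 alone.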
 
