[SOLVED] I can't add a 3rd node

Ivan Gersi

Guys, am I an idiot? I have 7 years of experience with Proxmox (since v2), but I'm still failing at a simple task.
I was upgrading a 2-node cluster (v4).
I built a 3rd node with a fresh v6 install. Next I upgraded the two v4 nodes to v5.
Node 2 was then upgraded to v6 and joined to node 3.
Node 1 was upgraded to v6, but I can't join it to the cluster.
I've checked hostnames and fingerprints and tried to connect via the web UI and the CLI, etc., with no result. I'm a little frustrated now.
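For context, these are roughly the checks I mean (a sketch only; run on each node, with my hostnames as the example):

hostname --fqdn                         # should match the node name used in the cluster
getent hosts pve1 pve2 pve3             # each name should resolve to its cluster IP (10.0.0.x here)
systemctl status pve-cluster corosync   # both services should be active on a healthy node
pvecm status                            # quorum/membership view from this node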
Some info.
root@pve2:/etc# pveversion
pve-manager/6.0-9/508dcee0 (running kernel: 5.0.21-3-pve)
root@pve3:/etc# pveversion
pve-manager/6.0-4/2a719255 (running kernel: 5.0.15-1-pve)
root@pve1:/etc/pve# pveversion
pve-manager/6.0-9/508dcee0 (running kernel: 5.0.21-3-pve)
root@pve1:/etc/pve# ping pve1
PING pve1.poradna.net (10.0.0.21) 56(84) bytes of data.
64 bytes from pve1.poradna.net (10.0.0.21): icmp_seq=1 ttl=64 time=0.068 ms
^C
--- pve1.poradna.net ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.068/0.068/0.068/0.000 ms
root@pve1:/etc/pve# ping pve2
PING pve2.poradna.net (10.0.0.71) 56(84) bytes of data.
64 bytes from pve2.poradna.net (10.0.0.71): icmp_seq=1 ttl=64 time=0.230 ms
^C
--- pve2.poradna.net ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.230/0.230/0.230/0.000 ms
root@pve1:/etc/pve# ping pve3
PING pve3.poradna.net (10.0.0.69) 56(84) bytes of data.
64 bytes from pve3.poradna.net (10.0.0.69): icmp_seq=1 ttl=64 time=0.172 ms
^C
--- pve3.poradna.net ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.172/0.172/0.172/0.000 ms
root@pve3:/etc# pvecm status
Quorum information
------------------
Date: Sun Oct 27 12:52:51 2019
Quorum provider: corosync_votequorum
Nodes: 2
Node ID: 0x00000001
Ring ID: 1/161364
Quorate: Yes

Votequorum information
----------------------
Expected votes: 3
Highest expected: 3
Total votes: 2
Quorum: 2
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000001 1 10.0.0.69 (local)
0x00000002 1 10.0.0.71
And finally, the problematic node:
root@pve1:/etc/pve# pvecm status
Cannot initialize CMAP service
root@pve1:/etc/pve# systemctl status pvecluster
Unit pvecluster.service could not be found.
root@pve1:/etc/pve# systemctl status pve-cluster
● pve-cluster.service - The Proxmox VE cluster filesystem
Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabled)
Active: active (running) since Sun 2019-10-27 12:40:55 CET; 12min ago
Process: 3039 ExecStart=/usr/bin/pmxcfs (code=exited, status=0/SUCCESS)
Process: 3049 ExecStartPost=/usr/bin/pvecm updatecerts --silent (code=exited, status=0/SUCCESS)
Main PID: 3041 (pmxcfs)
Tasks: 5 (limit: 4915)
Memory: 17.3M
CGroup: /system.slice/pve-cluster.service
└─3041 /usr/bin/pmxcfs

Oct 27 12:53:36 pve1 pmxcfs[3041]: [dcdb] crit: cpg_initialize failed: 2
Oct 27 12:53:36 pve1 pmxcfs[3041]: [status] crit: cpg_initialize failed: 2
Oct 27 12:53:42 pve1 pmxcfs[3041]: [quorum] crit: quorum_initialize failed: 2
Oct 27 12:53:42 pve1 pmxcfs[3041]: [confdb] crit: cmap_initialize failed: 2
Oct 27 12:53:42 pve1 pmxcfs[3041]: [dcdb] crit: cpg_initialize failed: 2
Oct 27 12:53:42 pve1 pmxcfs[3041]: [status] crit: cpg_initialize failed: 2
Oct 27 12:53:48 pve1 pmxcfs[3041]: [quorum] crit: quorum_initialize failed: 2
Oct 27 12:53:48 pve1 pmxcfs[3041]: [confdb] crit: cmap_initialize failed: 2
Oct 27 12:53:48 pve1 pmxcfs[3041]: [dcdb] crit: cpg_initialize failed: 2
Oct 27 12:53:48 pve1 pmxcfs[3041]: [status] crit: cpg_initialize failed: 2

root@pve1:/etc/pve# systemctl status corosync
● corosync.service - Corosync Cluster Engine
Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Sun 2019-10-27 12:40:54 CET; 14min ago
Docs: man:corosync
man:corosync.conf
man:corosync_overview
Process: 3035 ExecStart=/usr/sbin/corosync -f $COROSYNC_OPTIONS (code=exited, status=8)
Main PID: 3035 (code=exited, status=8)

Oct 27 12:40:54 pve1 systemd[1]: Starting Corosync Cluster Engine...
Oct 27 12:40:54 pve1 corosync[3035]: [MAIN ] Corosync Cluster Engine 3.0.2 starting up
Oct 27 12:40:54 pve1 corosync[3035]: [MAIN ] Corosync built-in features: dbus monitoring watchdog systemd xmlconf snmp pie relro bindnow
Oct 27 12:40:54 pve1 corosync[3035]: [MAIN ] parse error in config: No multicast port specified
Oct 27 12:40:54 pve1 corosync[3035]: [MAIN ] Corosync Cluster Engine exiting with status 8 at main.c:1386.
Oct 27 12:40:54 pve1 systemd[1]: corosync.service: Main process exited, code=exited, status=8/n/a
Oct 27 12:40:54 pve1 systemd[1]: corosync.service: Failed with result 'exit-code'.
Oct 27 12:40:54 pve1 systemd[1]: Failed to start Corosync Cluster Engine.
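Since corosync exits on a config parse error, the error can be reproduced outside systemd by running it in the foreground, the same way the unit file starts it (just a diagnostic sketch):

systemctl stop corosync
/usr/sbin/corosync -f              # same command as the ExecStart line above; prints the parse error straight to the console
less /etc/corosync/corosync.conf   # then inspect the config it is choking on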
I've tried removing and re-adding the node several times... with the same result.
Corosync config:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: pve1
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 10.0.0.21
  }
  node {
    name: pve2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.0.0.71
    ring1_addr: *********
  }
  node {
    name: pve3
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.0.0.69
    ring1_addr: *********
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: Poradna
  config_version: 11
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  secauth: on
  version: 2
}
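In hindsight, one thing stands out in that nodelist: pve2 and pve3 each define a ring1_addr, but pve1 does not, while the totem section only declares link 0. A quick way to spot that asymmetry (just a grep, nothing Proxmox-specific):

grep -nE 'ring[01]_addr|linknumber' /etc/pve/corosync.conf   # pve1 is the only node without a second ring

Whether or not that is the exact trigger of the "No multicast port specified" error, removing the second rings is what finally made the join work for me (see the second attempt below).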

Reconnecting to the cluster:
root@pve1:/var/log# systemctl stop pve-cluster
root@pve1:/var/log# systemctl stop corosync
root@pve1:/var/log# pmxcfs -l
[main] notice: forcing local mode (although corosync.conf exists)
root@pve1:/var/log# rm /etc/pve/corosync.conf
root@pve1:/var/log# rm /etc/corosync/*
root@pve1:/var/log# killall pmxcfs
root@pve1:/var/log# systemctl start pve-cluster
root@pve1:/var/log# pvecm add 10.0.0.69
Please enter superuser (root) password for '10.0.0.69':
Password for root@10.0.0.69: ********
Establishing API connection with host '10.0.0.69'
The authenticity of host '10.0.0.69' can't be established.
X509 SHA256 key fingerprint is AB:93:F0:C8:FF:99:55:A5:AB:F2:75:48:30:2C:3D:52:79:11:4D:33:60:63:70:46:E4:20:F1:3A:FD:87:F1:5C.
Are you sure you want to continue connecting (yes/no)? yes
Login succeeded.
Request addition of this node
Join request OK, finishing setup locally
stopping pve-cluster service
backup old database to '/var/lib/pve-cluster/backup/config-1572178548.sql.gz'
Job for corosync.service failed because the control process exited with error code.
starting pve-cluster failed: See "systemctl status corosync.service" and "journalctl -xe" for details.
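Following the hint in that error message, the next step on pve1 is to look at corosync itself and at the config the join pushed onto the node (a sketch; paths are the standard ones):

systemctl status corosync.service
journalctl -u corosync -b --no-pager | tail -n 50   # the same parse error shows up here
cat /etc/corosync/corosync.conf                     # the copy synced from the cluster during the join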
 
OK, I removed pve1 from the cluster, deleted the 2nd rings (the ring1_addr entries) from corosync.conf, and then tried to add the node again.
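For clarity, this is roughly what the cluster's corosync.conf nodelist and totem sections look like after that edit, before re-adding pve1 (a sketch, not the literal file; logging and quorum stay unchanged, and the config_version value is only an example, it just has to be higher than the previous 11):

nodelist {
  node {
    name: pve2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.0.0.71
  }
  node {
    name: pve3
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.0.0.69
  }
}

totem {
  cluster_name: Poradna
  config_version: 12   # example value; must simply be higher than the previous 11
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  secauth: on
  version: 2
}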
root@pve1:/etc/pve# systemctl stop pve-cluster
root@pve1:/etc/pve# systemctl stop corosync
root@pve1:/etc/pve# pmxcfs -l
[main] notice: forcing local mode (although corosync.conf exists)
root@pve1:/etc/pve# rm /etc/pve/corosync.conf
root@pve1:/etc/pve# rm /etc/corosync/*
root@pve1:/etc/pve# killall pmxcfs
root@pve1:/etc/pve# systemctl start pve-cluster
root@pve1:/etc/pve# pvecm add 10.0.0.69
Please enter superuser (root) password for '10.0.0.69':
Password for root@10.0.0.69: ********
Establishing API connection with host '10.0.0.69'
The authenticity of host '10.0.0.69' can't be established.
X509 SHA256 key fingerprint is AB:93:F0:C8:FF:99:55:A5:AB:F2:75:48:30:2C:3D:52:79:11:4D:33:60:63:70:46:E4:20:F1:3A:FD:87:F1:5C.
Are you sure you want to continue connecting (yes/no)? yes
Login succeeded.
Request addition of this node
Join request OK, finishing setup locally
stopping pve-cluster service
backup old database to '/var/lib/pve-cluster/backup/config-1572213508.sql.gz'
shell-init: error retrieving current directory: getcwd: cannot access parent directories: Transport endpoint is not connected
waiting for quorum...OK
(re)generate node files
generate new node certificate
merge authorized SSH keys and known hosts
generated new node certificate, restart pveproxy and pvedaemon services
successfully added node 'pve1' to cluster.
root@pve1:/etc/pve# pvecm status
Quorum information
------------------
Date: Sun Oct 27 22:59:24 2019
Quorum provider: corosync_votequorum
Nodes: 3
Node ID: 0x00000003
Ring ID: 1.27684
Quorate: Yes

Votequorum information
----------------------
Expected votes: 3
Highest expected: 3
Total votes: 3
Quorum: 2
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000001 1 10.0.0.69
0x00000002 1 10.0.0.71
0x00000003 1 10.0.0.21 (local)
root@pve1:/etc/pve#
 
