After an unsuccessful attempt to add a new node (the "bad node") to the cluster, and after restarting the pve-cluster service on the nodes, pve-cluster no longer starts. As expected, pvecm and the other PVE commands do not work either.
On the bad node:
Code:
systemctl status pve-cluster.service
● pve-cluster.service - The Proxmox VE cluster filesystem
Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabled)
Active: activating (start) since Tue 2019-10-01 10:02:53 +06; 36s ago
Cntrl PID: 6022 (pmxcfs)
Tasks: 3 (limit: 4915)
Memory: 2.6M
CGroup: /system.slice/pve-cluster.service
├─6022 /usr/bin/pmxcfs
└─6024 /usr/bin/pmxcfs
Oct 01 10:02:53 hp1 systemd[1]: Starting The Proxmox VE cluster filesystem...
Oct 01 10:02:53 hp1 pmxcfs[6024]: [status] notice: update cluster info (cluster name cluster, version = 4)
Oct 01 10:02:54 hp1 pmxcfs[6024]: [dcdb] notice: cpg_join retry 10
Oct 01 10:02:55 hp1 pmxcfs[6024]: [dcdb] notice: cpg_join retry 20
Oct 01 10:02:56 hp1 pmxcfs[6024]: [dcdb] notice: cpg_join retry 30
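The repeated cpg_join retries show that pmxcfs is stuck waiting to join the corosync process group, so corosync membership itself looks suspect. A quick way to check corosync directly on each node (a diagnostic sketch; corosync-cfgtool and corosync-quorumtool ship with the corosync package):
Code:
# state of the knet links as seen from this node
corosync-cfgtool -s
# quorum state as corosync itself sees it
corosync-quorumtool -s
# follow corosync's own log for join/leave events
journalctl -u corosync -f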
Code:
root@hp1:~# systemctl start pve-cluster.service
Job for pve-cluster.service failed because a timeout was exceeded.
See "systemctl status pve-cluster.service" and "journalctl -xe" for details.
root@hp1:~#
root@hp1:~# journalctl -xe
-- Subject: Automatic restarting of a unit has been scheduled
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- Automatic restarting of the unit pve-cluster.service has been scheduled, as the result for
-- the configured Restart= setting for the unit.
Oct 01 10:04:24 hp1 systemd[1]: Stopped The Proxmox VE cluster filesystem.
-- Subject: A stop job for unit pve-cluster.service has finished
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A stop job for unit pve-cluster.service has finished.
--
-- The job identifier is 464212 and the job result is done.
Oct 01 10:04:24 hp1 systemd[1]: Starting The Proxmox VE cluster filesystem...
-- Subject: A start job for unit pve-cluster.service has begun execution
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A start job for unit pve-cluster.service has begun execution.
--
-- The job identifier is 464212.
Oct 01 10:04:24 hp1 systemd[4408]: etc-pve.mount: Succeeded.
-- Subject: Unit succeeded
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- The unit UNIT has successfully entered the 'dead' state.
Oct 01 10:04:24 hp1 systemd[1]: etc-pve.mount: Succeeded.
-- Subject: Unit succeeded
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- The unit etc-pve.mount has successfully entered the 'dead' state.
Oct 01 10:04:24 hp1 pmxcfs[6058]: [status] notice: update cluster info (cluster name cluster, version = 4)
Oct 01 10:04:25 hp1 pmxcfs[6058]: [dcdb] notice: cpg_join retry 10
Oct 01 10:04:25 hp1 pve-firewall[1174]: status update error: Connection refused
Oct 01 10:04:26 hp1 pmxcfs[6058]: [dcdb] notice: cpg_join retry 20
Oct 01 10:04:26 hp1 pvestatd[1176]: ipcc_send_rec[1] failed: Connection refused
Oct 01 10:04:26 hp1 pvestatd[1176]: ipcc_send_rec[2] failed: Connection refused
Oct 01 10:04:26 hp1 pvestatd[1176]: ipcc_send_rec[3] failed: Connection refused
Oct 01 10:04:26 hp1 pvestatd[1176]: ipcc_send_rec[4] failed: Connection refused
Oct 01 10:04:26 hp1 pvestatd[1176]: status update error: Connection refused
Oct 01 10:04:27 hp1 pmxcfs[6058]: [dcdb] notice: cpg_join retry 30
Oct 01 10:04:28 hp1 pmxcfs[6058]: [dcdb] notice: cpg_join retry 40
Oct 01 10:04:28 hp1 pve-ha-lrm[1210]: loop take too long (90 seconds)
Oct 01 10:04:28 hp1 pve-ha-crm[1202]: loop take too long (90 seconds)
Oct 01 10:04:28 hp1 pve-ha-lrm[1210]: updating service status from manager failed: Connection refused
Oct 01 10:04:28 hp1 corosync[1157]: [TOTEM ] A new membership (1:56716) was formed. Members
Oct 01 10:04:34 hp1 corosync[1157]: [TOTEM ] A new membership (1:56736) was formed. Members
Oct 01 10:04:35 hp1 pve-firewall[1174]: status update error: Connection refused
Oct 01 10:04:36 hp1 pvestatd[1176]: ipcc_send_rec[1] failed: Connection refused
Oct 01 10:04:36 hp1 pvestatd[1176]: ipcc_send_rec[2] failed: Connection refused
Oct 01 10:04:36 hp1 pvestatd[1176]: ipcc_send_rec[3] failed: Connection refused
Oct 01 10:04:36 hp1 pvestatd[1176]: ipcc_send_rec[4] failed: Connection refused
Oct 01 10:04:36 hp1 pvestatd[1176]: status update error: Connection refused
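The Connection refused errors from pvestatd, pve-firewall and the HA daemons are secondary: they only mean pmxcfs is not answering on its IPC socket. To get /etc/pve readable again while debugging, pmxcfs can be started in local mode (a sketch; local mode bypasses the cluster and must be undone before rejoining it):
Code:
systemctl stop pve-cluster corosync
# -l forces local mode: the /etc/pve FUSE mount comes up without quorum
pmxcfs -l
# ... inspect /etc/pve ...
killall pmxcfs
systemctl start corosync pve-cluster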
On the bad node:
Code:
root@dell2:/etc/pve# pvecm status
Quorum information
------------------
Date: Tue Oct 1 09:54:56 2019
Quorum provider: corosync_votequorum
Nodes: 1
Node ID: 0x00000004
Ring ID: 4/54576
Quorate: No
Votequorum information
----------------------
Expected votes: 4
Highest expected: 4
Total votes: 1
Quorum: 3 Activity blocked
Flags:
Membership information
----------------------
Nodeid Votes Name
0x00000004 1 192.168.1.19 (local)
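With only 1 of 4 votes the node is non-quorate, so pmxcfs blocks all writes ("Activity blocked"). If something on this node has to be changed before the membership problem is fixed, the expected vote count can be lowered temporarily (a standard single-node workaround; it should be reverted once the other nodes are reachable again):
Code:
# accept a single vote as quorum (temporary!)
pvecm expected 1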
corosync.conf:
Code:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: dell1
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 192.168.1.38
  }
  node {
    name: dell2
    nodeid: 4
    quorum_votes: 1
    ring0_addr: 192.168.1.19
  }
  node {
    name: hp1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.1.4
  }
  node {
    name: hp2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.1.10
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: cluster
  config_version: 4
  interface {
    bindnetaddr: 192.168.1.4
    ringnumber: 0
  }
  ip_version: ipv4
  secauth: on
  version: 2
}
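corosync reads /etc/corosync/corosync.conf, while the cluster-wide copy lives in /etc/pve/corosync.conf; after a failed join, a mismatch between the two files, or between the nodes, would explain the endless cpg_join retries. One way to compare them (a sketch, assuming root SSH between the nodes; the /etc/pve copy is only readable while pmxcfs runs, e.g. in local mode):
Code:
# the local copies should be identical
diff /etc/pve/corosync.conf /etc/corosync/corosync.conf
# same file on every node? (addresses taken from the nodelist above)
for h in 192.168.1.38 192.168.1.19 192.168.1.4 192.168.1.10; do
    echo -n "$h: "; ssh root@$h cksum /etc/corosync/corosync.conf
done
# config version corosync is actually running with
corosync-cmapctl -g totem.config_version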
The PVE version is the same on all nodes:
Code:
root@dell2:/etc/pve# pveversion -v
proxmox-ve: 6.0-2 (running kernel: 5.0.15-1-pve)
pve-manager: 6.0-4 (running version: 6.0-4/2a719255)
pve-kernel-5.0: 6.0-5
pve-kernel-helper: 6.0-5
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.2-pve2
criu: 3.11-3
glusterfs-client: 5.5-3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.10-pve1
libpve-access-control: 6.0-2
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-2
libpve-guest-common-perl: 3.0-1
libpve-http-server-perl: 3.0-2
libpve-storage-perl: 6.0-5
libqb0: 1.0.5-1
lvm2: 2.03.02-pve3
lxc-pve: 3.1.0-61
lxcfs: 3.0.3-pve60
novnc-pve: 1.0.0-60
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.0-5
pve-cluster: 6.0-4
pve-container: 3.0-3
pve-docs: 6.0-4
pve-edk2-firmware: 2.20190614-1
pve-firewall: 4.0-5
pve-firmware: 3.0-2
pve-ha-manager: 3.0-2
pve-i18n: 2.0-2
pve-qemu-kvm: 4.0.0-3
pve-xtermjs: 3.13.2-1
qemu-server: 6.0-5
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.1-pve1
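Since corosync 3 (PVE 6) the transport is kronosnet over UDP unicast, port 5405 by default, so a firewall, MTU or unicast connectivity problem between the ring0 addresses would also produce the repeated "A new membership was formed" TOTEM lines above. A basic connectivity check from each node (a sketch; substitute the other nodes' ring0_addr values):
Code:
# is corosync listening on the knet port?
ss -ulnp | grep corosync
# plain reachability of the other ring addresses
for h in 192.168.1.38 192.168.1.19 192.168.1.10; do ping -c1 -W1 $h; done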