Quorum Activity blocked after node maintenance

ricardoj

Hi,

I have a cluster with 3 nodes: PVE-01, PVE-02 and PVE-03.

Yesterday PVE-02 was shut down for maintenance, and when it was put back into operation it was no longer recognized in the cluster.

Here is the information I have gathered so far:

- root@pve-02:~# pveversion
pve-manager/6.0-8/b6b80da7 (running kernel: 5.0.21-2-pve)

- /etc/pve is in read-only mode

- I can ping all the nodes by IP and by name

- I can ssh into any node from any node

- root@pve-02:~# pvecm status

Quorum information
------------------
Date: Thu Oct 17 09:32:58 2019
Quorum provider: corosync_votequorum
Nodes: 1
Node ID: 0x00000002
Ring ID: 2/688
Quorate: No

Votequorum information
----------------------
Expected votes: 3
Highest expected: 3
Total votes: 1
Quorum: 2 Activity blocked
Flags:

Membership information
----------------------
Nodeid Votes Name
0x00000002 1 xx.xx.xx.242 (local)

- I see the lock file in /var/lib/pve-cluster/.pmxcfs.lockfile

- PVE-Cluster service has errors

root@pve-02:~# service pve-cluster status
● pve-cluster.service - The Proxmox VE cluster filesystem
Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor pres
Active: active (running) since Thu 2019-10-17 09:30:57 -03; 6min ago
Process: 4264 ExecStart=/usr/bin/pmxcfs (code=exited, status=0/SUCCESS)
Process: 4389 ExecStartPost=/usr/bin/pvecm updatecerts --silent (code=exited,
Main PID: 4267 (pmxcfs)
Tasks: 6 (limit: 4915)
Memory: 35.5M
CGroup: /system.slice/pve-cluster.service
└─4267 /usr/bin/pmxcfs

Oct 17 09:30:55 pve-02 pmxcfs[4267]: [dcdb] crit: cpg_initialize failed: 2
Oct 17 09:30:55 pve-02 pmxcfs[4267]: [dcdb] crit: can't initialize service
Oct 17 09:30:55 pve-02 pmxcfs[4267]: [status] crit: cpg_initialize failed: 2
Oct 17 09:30:55 pve-02 pmxcfs[4267]: [status] crit: can't initialize service
Oct 17 09:30:57 pve-02 systemd[1]: Started The Proxmox VE cluster filesystem.
Oct 17 09:31:01 pve-02 pmxcfs[4267]: [status] notice: update cluster info (clust
Oct 17 09:31:01 pve-02 pmxcfs[4267]: [dcdb] notice: members: 2/4267
Oct 17 09:31:01 pve-02 pmxcfs[4267]: [dcdb] notice: all data is up to date
Oct 17 09:31:01 pve-02 pmxcfs[4267]: [status] notice: members: 2/4267
Oct 17 09:31:01 pve-02 pmxcfs[4267]: [status] notice: all data is up to date

- file /etc/pve/corosync.conf is the same on all 3 nodes

What else can be done?

Thank you,

Ricardo Jorge
 
Do you still have that issue? What is the state of the corosync service? And what does journalctl -u corosync.service (or the syslog) show?
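
In case it helps, here is a rough sketch of how that information could be collected on pve-02. The helper names `quorum_needed` and `collect_corosync_info` are made up for illustration, and the sketch assumes the standard corosync tools (`corosync-cfgtool`, `corosync-quorumtool`) are installed on the node. The small arithmetic helper only restates the simple-majority rule, which matches the "Expected votes: 3 / Quorum: 2" shown in the pvecm output above: with a single vote present, the node stays below quorum and activity is blocked.

```shell
#!/bin/sh
# Votes needed for quorum under a simple majority: floor(expected / 2) + 1.
quorum_needed() {
    echo $(( $1 / 2 + 1 ))
}

# Hypothetical helper: gather the corosync state on the affected node.
# It is only defined here, not called; run it manually as root on pve-02.
collect_corosync_info() {
    # Is the corosync service running at all?
    systemctl status corosync.service --no-pager
    # Last 100 corosync log lines since the current boot
    journalctl -u corosync.service -b --no-pager | tail -n 100
    # Link/ring status as seen by the local node
    corosync-cfgtool -s
    # Quorum state as reported by corosync itself
    corosync-quorumtool -s
}

echo "votes needed with 3 expected: $(quorum_needed 3)"
```

After sourcing the snippet, running `collect_corosync_info` on pve-02 and posting the output here should show whether corosync is running and whether it can reach the other two nodes.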