pve6to7 : corosync.conf (5) and pmxcfs (6) don't agree about size of nodelist

PointPubMedia

Hi,

We are planning to upgrade from 6 to 7 but on 1 out of 5 nodes, we got this:

Code:
Analzying quorum settings and state..
FAIL: 1 nodes are offline!
INFO: configured votes - nodes: 5
INFO: configured votes - qdevice: 0
INFO: current expected votes: 5
INFO: current total votes: 5
FAIL: corosync.conf (5) and pmxcfs (6) don't agree about size of nodelist.
 
Hi,
is /etc/corosync/corosync.conf the same as /etc/pve/corosync.conf on that node? What is the output of cat /etc/pve/corosync.conf and pvecm status?
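For reference, comparing the two copies is a one-liner with diff; a minimal sketch on throwaway files (on a real node the paths are /etc/corosync/corosync.conf and /etc/pve/corosync.conf):

```shell
# Sketch: the same comparison you would run on the node, demonstrated
# on throwaway copies (real paths: /etc/corosync/corosync.conf and
# /etc/pve/corosync.conf).
printf 'totem {\n  config_version: 9\n}\n' > /tmp/corosync.local.conf
printf 'totem {\n  config_version: 9\n}\n' > /tmp/corosync.pve.conf

# diff exits 0 (and prints nothing) when the files are identical
diff /tmp/corosync.local.conf /tmp/corosync.pve.conf && echo "configs match"
```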
 
/etc/corosync/corosync.conf and /etc/pve/corosync.conf are the same!

Code:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: pve11
    nodeid: 2
    quorum_votes: 1
    ring0_addr: x.x.x.242
    ring1_addr: x.x.y.242
  }
  node {
    name: pve12
    nodeid: 4
    quorum_votes: 1
    ring0_addr: x.x.x.243
    ring1_addr: x.x.y.243
  }
  node {
    name: pve13
    nodeid: 5
    quorum_votes: 1
    ring0_addr: x.x.x.244
    ring1_addr: x.x.y.244
  }
  node {
    name: pve14
    nodeid: 6
    quorum_votes: 1
    ring0_addr: x.x.x.245
    ring1_addr: x.x.y.245
  }
  node {
    name: pve15
    nodeid: 3
    quorum_votes: 1
    ring0_addr: x.x.x.246
    ring1_addr: x.x.y.246
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: QSE-PVE
  config_version: 9
  interface {
    linknumber: 0
  }
  interface {
    linknumber: 1
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}

Code:
Cluster information
-------------------
Name:             QSE-PVE
Config Version:   9
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Thu Aug  4 06:47:43 2022
Quorum provider:  corosync_votequorum
Nodes:            5
Node ID:          0x00000004
Ring ID:          2.e4e
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   5
Highest expected: 5
Total votes:      5
Quorum:           3
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000002          1 x.x.x.242
0x00000003          1 x.x.x.246
0x00000004          1 x.x.x.243 (local)
0x00000005          1 x.x.x.244
0x00000006          1 x.x.x.245
 
What is the output of cat /etc/pve/.members and journalctl -b -u pve-cluster.service?

Does systemctl reload-or-restart pve-cluster.service on the problematic node help?

Was there a sixth node in the past? How did you remove it?
 
Code:
{
  "nodename": "pve12",
  "version": 7,
  "cluster": { "name": "QSE-PVE", "version": 8, "nodes": 6, "quorate": 1 },
  "nodelist": {
    "pve15": { "id": 3, "online": 1, "ip": "x.x.x.246" },
    "pve01": { "id": 1, "online": 0 },
    "pve11": { "id": 2, "online": 1, "ip": "x.x.x.242" },
    "pve12": { "id": 4, "online": 1, "ip": "x.x.x.243" },
    "pve13": { "id": 5, "online": 1, "ip": "x.x.x.244" },
    "pve14": { "id": 6, "online": 1, "ip": "x.x.x.245" }
  }
}

Restart of pve-cluster didn't help!
Code:
Aug  4 07:44:29 pve12 systemd[1]: Stopped The Proxmox VE cluster filesystem.
Aug  4 07:44:29 pve12 systemd[1]: Starting The Proxmox VE cluster filesystem...
Aug  4 07:44:29 pve12 pmxcfs[17008]: [status] notice: update cluster info (cluster name  QSE-PVE, version = 8)
Aug  4 07:44:29 pve12 pmxcfs[17008]: [status] notice: node has quorum
Aug  4 07:44:29 pve12 pmxcfs[17008]: [dcdb] notice: members: 2/9970, 3/25652, 4/17008, 5/17056, 6/12386
Aug  4 07:44:29 pve12 pmxcfs[17008]: [dcdb] notice: starting data syncronisation
Aug  4 07:44:29 pve12 pmxcfs[17008]: [dcdb] notice: received sync request (epoch 2/9970/00000005)
Aug  4 07:44:29 pve12 pmxcfs[17008]: [status] notice: members: 2/9970, 3/25652, 4/17008, 5/17056, 6/12386
Aug  4 07:44:29 pve12 pmxcfs[17008]: [status] notice: starting data syncronisation
Aug  4 07:44:29 pve12 pmxcfs[17008]: [status] notice: received sync request (epoch 2/9970/00000005)
Aug  4 07:44:29 pve12 pmxcfs[17008]: [dcdb] notice: received all states
Aug  4 07:44:29 pve12 pmxcfs[17008]: [dcdb] notice: leader is 2/9970
Aug  4 07:44:29 pve12 pmxcfs[17008]: [dcdb] notice: synced members: 2/9970, 3/25652, 4/17008, 5/17056, 6/12386
Aug  4 07:44:29 pve12 pmxcfs[17008]: [dcdb] notice: all data is up to date
Aug  4 07:44:29 pve12 pmxcfs[17008]: [status] notice: received all states
Aug  4 07:44:29 pve12 pmxcfs[17008]: [status] notice: all data is up to date
Aug  4 07:44:30 pve12 systemd[1]: Started The Proxmox VE cluster filesystem.
We removed pve01 using the how-to from the Proxmox website, as we have done many times in the past. If I remember correctly, we also removed pve19 the same way, and it works fine on all the other nodes!

Of pve11, 12, 13, 14 and 15, the only one with the issue is pve12, which still "sees" pve01.

In journalctl we got nothing except a lot of "received log" messages.
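The outputs above show exactly the mismatch pve6to7 complains about: corosync.conf lists 5 node entries, while pmxcfs (/etc/pve/.members) still counts 6. A minimal sketch of that comparison, using the sample values from this thread (on a real node the inputs are /etc/pve/corosync.conf and /etc/pve/.members):

```shell
# Sketch: reproduce the pve6to7 comparison on sample data from this
# thread. Real inputs: /etc/pve/corosync.conf (count of node{} blocks)
# and /etc/pve/.members ("nodes" field).
cat > /tmp/members.json <<'EOF'
{ "cluster": { "name": "QSE-PVE", "version": 8, "nodes": 6, "quorate": 1 } }
EOF

corosync_nodes=5    # number of node{} blocks in the corosync.conf above
pmxcfs_nodes=$(grep -o '"nodes": *[0-9]*' /tmp/members.json | grep -o '[0-9]*$')

if [ "$corosync_nodes" -ne "$pmxcfs_nodes" ]; then
    echo "FAIL: corosync.conf ($corosync_nodes) and pmxcfs ($pmxcfs_nodes) don't agree"
fi
```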
 
@fiona

We just upgraded and everything works well... but in the web interface we see this:

(attached screenshot: 1659626805928.png)

Is there a way to completely remove pve01 and pve19?
 
Do you have any HA services configured currently?

Please provide the output of the following:
Code:
ha-manager status -v
cat /etc/pve/ha/manager_status
cat /etc/pve/nodes/pve19/lrm_status
cat /etc/pve/nodes/pve01/lrm_status
 
quorum OK
master pve15 (active, Fri Aug  5 06:48:17 2022)
lrm pve01 (maintenance mode, Sat Feb 26 09:37:13 2022)
lrm pve11 (idle, Fri Aug  5 06:48:17 2022)
lrm pve12 (idle, Fri Aug  5 06:48:19 2022)
lrm pve13 (idle, Fri Aug  5 06:48:20 2022)
lrm pve14 (idle, Fri Aug  5 06:48:17 2022)
lrm pve15 (idle, Fri Aug  5 06:48:17 2022)
lrm pve19 (maintenance mode, Sun May 31 16:21:28 2020)
full cluster state:
{
  "lrm_status" : {
    "pve01" : { "mode" : "maintenance", "results" : {}, "state" : "wait_for_agent_lock", "timestamp" : 1645886233 },
    "pve11" : { "mode" : "active", "results" : {}, "state" : "wait_for_agent_lock", "timestamp" : 1659696497 },
    "pve12" : { "mode" : "active", "results" : {}, "state" : "wait_for_agent_lock", "timestamp" : 1659696499 },
    "pve13" : { "mode" : "active", "results" : {}, "state" : "wait_for_agent_lock", "timestamp" : 1659696500 },
    "pve14" : { "mode" : "active", "results" : {}, "state" : "wait_for_agent_lock", "timestamp" : 1659696497 },
    "pve15" : { "mode" : "active", "results" : {}, "state" : "wait_for_agent_lock", "timestamp" : 1659696497 },
    "pve19" : { "mode" : "maintenance", "results" : {}, "state" : "wait_for_agent_lock", "timestamp" : 1590956488 }
  },
  "manager_status" : {
    "master_node" : "pve15",
    "node_status" : {
      "pve01" : "maintenance",
      "pve11" : "online",
      "pve12" : "online",
      "pve13" : "online",
      "pve14" : "online",
      "pve15" : "online",
      "pve19" : "maintenance"
    },
    "service_status" : {},
    "timestamp" : 1659696497
  },
  "quorum" : { "node" : "pve15", "quorate" : "1" }
}

Code:
cat /etc/pve/ha/manager_status
{"timestamp":1659696527,"master_node":"pve15","service_status":{},"node_status":{"pve11":"online","pve15":"online","pve14":"online","pve12":"online","pve13":"online","pve19":"maintenance","pve01":"maintenance"}}

cat /etc/pve/nodes/pve19/lrm_status
{"mode":"maintenance","timestamp":1590956488,"state":"wait_for_agent_lock","results":{}}

cat /etc/pve/nodes/pve01/lrm_status
{"mode":"maintenance","timestamp":1645886233,"results":{},"state":"wait_for_agent_lock"}
 
I guess the HA manager thinks that the LRM for these two nodes still exists because of these left-over files. After removing them, the manager should switch the LRM status to unknown and the nodes should disappear after a while (IIRC an hour).

You might even want to remove the whole directories for the gone nodes after doing a safety check if anything in there is still needed.
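A sketch of that cleanup, demonstrated on a throwaway copy of the directory layout (on the real cluster the base is /etc/pve/nodes; since /etc/pve is the replicated pmxcfs mount, removing the files on any one node propagates to all):

```shell
# Sketch of the cleanup on a throwaway tree (real base: /etc/pve/nodes).
base=/tmp/pve-demo/nodes
mkdir -p "$base/pve01" "$base/pve19"
echo '{"mode":"maintenance","state":"wait_for_agent_lock"}' > "$base/pve01/lrm_status"
echo '{"mode":"maintenance","state":"wait_for_agent_lock"}' > "$base/pve19/lrm_status"

# 1) remove the stale lrm_status files; the HA manager should then mark
#    those LRMs unknown and drop the nodes after a while
rm "$base/pve01/lrm_status" "$base/pve19/lrm_status"

# 2) after checking that nothing in them is still needed, remove the
#    whole left-over node directories
rm -r "$base/pve01" "$base/pve19"
```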
 
Yeah, I already checked and there's nothing we still need in pve19 and pve01, so I just need to remove /etc/pve/nodes/{pve01,pve19} on each node?
 
