Update documentation

freebee · Apr 16, 2021

Hi.
In the Cluster Manager documentation (https://pve.proxmox.com/wiki/Cluster_Manager), the section "Remove a Cluster Node" gives the command:
pvecm delnode hp4
The tutorial says to turn off the server before removing it from the cluster; however, the command then gives an error:

pvecm delnode MIRROR-USA-VIR-01-PV01
Killing node 4
Could not kill node (error = CS_ERR_NOT_EXIST)
error during cfs-locked 'file-corosync_conf' operation: command 'corosync-cfgtool -k 4' failed: exit code 1

Following other comments, I looked at /etc/pve/.members and corosync.conf and found nothing wrong there, so the problem was a status stuck in the GUI.
The solution is: systemctl stop pve-ha-crm.service && rm -f /etc/pve/ha/manager_status && systemctl start pve-ha-crm.service

I think it is important to add 'workarounds for remove node errors' to the docs:
If pvecm delnode gives an error, and running the same command again returns "error during cfs-locked 'file-corosync_conf' operation: Node/IP: hp4 is not a known host of the cluster":
1. Check whether /etc/pve/.members and /etc/pve/corosync.conf still contain the removed node's name (hp4 in this case).
2. If the name is not found in these files, run ha-manager status.
3. If it returns "unable to read file '/etc/pve/nodes/hp4/lrm_status'", run on each server/node:
systemctl stop pve-ha-crm.service && rm -f /etc/pve/ha/manager_status && systemctl start pve-ha-crm.service
This will fix the GUI.
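The check on /etc/pve/.members and /etc/pve/corosync.conf described above can be sketched in bash roughly like this. It is a minimal sketch, not an official tool: the function name node_still_referenced is hypothetical, and it only does a plain fixed-string search for the node name, so a name that is a substring of another node's name would give a false positive.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Proxmox VE): succeeds (exit 0) if the
# removed node's name still appears in any of the given files, e.g.
# /etc/pve/.members and /etc/pve/corosync.conf. Plain text search, no parsing.
node_still_referenced() {
    local node="$1"
    shift
    local f
    for f in "$@"; do
        # -F: match the name literally; -q: no output, exit status only.
        # Unreadable or missing files are skipped.
        if [ -r "$f" ] && grep -qF -- "$node" "$f"; then
            return 0
        fi
    done
    return 1
}
```

For example, if node_still_referenced hp4 /etc/pve/.members /etc/pve/corosync.conf returns non-zero, the name is gone from both files and ha-manager status is the next thing to check.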

Moayad · Apr 16, 2021

Hi,

Which version of PVE are you running?

t.lamprecht · Apr 16, 2021

freebee said:
pvecm delnode MIRROR-USA-VIR-01-PV01
Killing node 4
Could not kill node (error = CS_ERR_NOT_EXIST)
error during cfs-locked 'file-corosync_conf' operation: command 'corosync-cfgtool -k 4' failed: exit code 1

Doesn't that normally mean that the node you removed was already dropped from corosync?

freebee said:
The solution is: systemctl stop pve-ha-crm.service && rm -f /etc/pve/ha/manager_status && systemctl start pve-ha-crm.service

HA has nothing directly to do with corosync, so what was the original problem that this solution solves?

timdonovan · Apr 20, 2021

I have a similar issue. I have a dead node, it shows in the GUI, but I cannot delete it because pvecm nodes doesn't even list it, and pvecm delnode proxmox1 throws:

Code:

Killing node 1
Could not kill node (error = CS_ERR_NOT_EXIST)
error during cfs-locked 'file-corosync_conf' operation: command 'corosync-cfgtool -k 1' failed: exit code 1

Maybe there is a bug in the documentation or in Proxmox here? I can't believe removing a failed node isn't covered by either.

Edit: oh it is a bug in both the documentation and pve: https://forum.proxmox.com/threads/feedback-on-admin-guide-removing-node-from-cluster.87112/

t.lamprecht · Apr 21, 2021

timdonovan said:
Edit: oh it is a bug in both the documentation and pve: https://forum.proxmox.com/threads/feedback-on-admin-guide-removing-node-from-cluster.87112/

Do you have replication jobs? And it's only an issue in the docs; I do not see how this is a bug in Proxmox VE.

timdonovan said:
I have a similar issue. I have a dead node, it shows in the GUI,

In general, the removed node still shows up if there are left-over guest configurations in /etc/pve/nodes/NAME/, as seen from the remaining nodes. The web interface then gets those VMIDs and has to map them somewhere, and the most fitting place is the node they were on.

In that case you can delete the VMID configurations in there, after making sure you no longer need them.
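To see which left-over guest configurations keep a removed node visible, a bash sketch like the following could list them before anything is deleted. It assumes the usual layout where VM configs live in /etc/pve/nodes/NAME/qemu-server/VMID.conf and container configs in /etc/pve/nodes/NAME/lxc/VMID.conf; the function name list_leftover_vmids and its optional base-directory argument are hypothetical, the latter only there so it can be tried on a copy.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Proxmox VE): print the VMIDs of guest
# configs left under a node's directory, assuming the layout
#   <base>/nodes/<node>/qemu-server/<vmid>.conf  (VMs)
#   <base>/nodes/<node>/lxc/<vmid>.conf          (containers)
# <base> defaults to /etc/pve. It only lists; it never deletes anything.
list_leftover_vmids() {
    local node="$1"
    local base="${2:-/etc/pve}"
    local kind conf
    for kind in qemu-server lxc; do
        for conf in "$base/nodes/$node/$kind"/*.conf; do
            # Without nullglob an unmatched glob stays literal; skip it.
            [ -e "$conf" ] || continue
            printf '%s %s\n' "$kind" "$(basename "$conf" .conf)"
        done
    done
}
```

Running list_leftover_vmids proxmox1 prints one line per config in the form "qemu-server VMID" or "lxc VMID"; removing them stays a manual step once you are sure they are no longer needed.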
