Old deleted node showing on HA lrm

carles89
Renowned Member · May 27, 2015
Hi all,

At one of our customers' sites we used to have a 2-node cluster. The nodes were named "mdc0" and "mdc1". One of the servers (mdc0) failed, and we removed it using pvecm delnode mdc0.

I've noticed that this old node is still showing in the HA GUI menu:

[screenshot: upload_2018-4-27_14-12-48.png]

Here is the output of /etc/pve/.members (the old node is not listed):
Code:
root@mdc1:~# cat /etc/pve/.members

{
"nodename": "mdc1",
"version": 3,
"cluster": { "name": "cluster", "version": 3, "nodes": 1, "quorate": 1 },
"nodelist": {
  "mdc1": { "id": 2, "online": 1, "ip": "192.168.0.199"}
  }
}

Nor does it appear in /etc/pve/corosync.conf.
The only odd thing I see here is that bindnetaddr holds the old node's (mdc0) address; the current node (mdc1) has 192.168.0.199.

Code:
root@mdc1:~# cat /etc/pve/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: mdc1
    nodeid: 2
    quorum_votes: 1
    ring0_addr: mdc1
  }

}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: cluster
  config_version: 3
  ip_version: ipv4
  secauth: on
  version: 2
  interface {
    bindnetaddr: 192.168.0.200
    ringnumber: 0
  }

}

But it does show up if I run ha-manager status:

Code:
root@mdc1:~# ha-manager status
quorum OK
master mdc0 (idle, Fri Sep  2 15:31:16 2016)
lrm mdc0 (unable to read lrm status)
lrm mdc1 (idle, Fri Apr 27 14:14:14 2018)

Any ideas?

Thank you!
 

Attachments

  • upload_2018-4-27_14-13-44.png (13.3 KB)
Hi Dietmar

I have the same issue and I have tried the suggestion below, but it makes no difference:

# rm /etc/pve/ha/manager_status

root@pve02:~# ha-manager status
quorum OK
master pve02 (active, Thu Jun 20 10:37:33 2019)
lrm pve01 (active, Thu Jun 20 10:37:35 2019)
lrm pve02 (active, Thu Jun 20 10:37:34 2019)
lrm pve03 (active, Thu Jun 20 10:37:34 2019)
lrm pve04 (active, Thu Jun 20 10:37:41 2019)
lrm pve05 (old timestamp - dead?, Thu Jun 20 09:37:43 2019)
 
I guess you can fix that by simply removing the ha manager status file:

# rm /etc/pve/ha/manager_status
I tried this as well, but unfortunately it didn't fix it. I had already done the proper removal outlined in the cluster manager docs (pvecm delnode <node>), then removed the node directories from /etc/pve/nodes/, restarted the HA cluster services, etc., but still no fix: I'm still seeing the three servers that I removed. I also tried editing the manager_status file, but it seems to be regenerated from somewhere else.

 
Okay, I managed to fix it by running
Code:
systemctl restart pve-ha-crm.service
on every remaining node.
 
Well, I believe I've tried everything listed here, yet my old nodes are still in the ha-manager status output:

Code:
root@prox02:~# ha-manager status
unable to read file '/etc/pve/nodes/prox01/lrm_status'
unable to read file '/etc/pve/nodes/prox03/lrm_status'
unable to read file '/etc/pve/nodes/prox06/lrm_status'
quorum OK
master proxstore12 (active, Mon Jan  9 21:12:28 2023)
lrm prox01 (unable to read lrm status)
lrm prox02 (idle, Mon Jan  9 21:12:28 2023)
lrm prox03 (unable to read lrm status)
lrm prox04 (active, Mon Jan  9 21:12:29 2023)
lrm prox06 (unable to read lrm status)
lrm prox11 (idle, Mon Jan  9 21:12:27 2023)
lrm prox13 (idle, Mon Jan  9 21:12:27 2023)
lrm prox14 (active, Mon Jan  9 21:12:28 2023)
lrm proxstore11 (idle, Mon Jan  9 21:12:27 2023)
lrm proxstore12 (idle, Mon Jan  9 21:12:28 2023)
lrm proxstore13 (idle, Mon Jan  9 21:12:27 2023)
service vm:258 (prox04, started)
[... further services omitted ...]

At first, after I cleanly shut them down and ran pvecm delnode, they were listed as "in maintenance".

This did not change when restarting the pve-ha-crm services nor when removing /etc/pve/ha/manager_status.
The nodes still existed in /etc/pve/nodes, and now after I'd deleted those directories I'm stuck with the output above.
To make sure, I deleted the status file once more and restarted the CRM service on all nodes again, but that makes no difference.

So, how do I finally get rid of those entries, and why isn't this covered in the documentation for decommissioning nodes? That documentation basically boils down to "just run pvecm delnode" and makes no mention at all of the HA components (LRM/CRM). Nor does the HA documentation say anything about removing a node...


I don't particularly care about the "misleading" output, listing irrelevant nodes, but does it have any negative consequences for the HA quorum? How do the 3 dead nodes play into quorum calculation? Am I going to kill my HA cluster by replacing even more nodes?


EDIT: well, apparently you need to do any or all of the above, then wait for some amount of time. The node list is now correct, without any further changes, config modifications, service or node reboots.

EDIT2: I've replaced the remaining nodes and for those it was sufficient to delete the /etc/pve/nodes/<NODE> folder after the pvecm delnode command, ha-manager status would switch to "unable to read lrm status" for a short while and then the node was properly gone from the ha-manager list.
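For reference, the cleanup steps collected in this thread can be sketched as a small script. The node name "prox01" is only an example, and the DRYRUN guard is my addition so the commands merely print by default instead of touching a cluster:

```shell
#!/bin/sh
# Sketch of the decommissioning steps that worked in this thread.
# NODE is the node being removed (example name). With DRYRUN=echo
# (the default here) this is a dry run that only prints the commands.
NODE=prox01
DRYRUN=${DRYRUN-echo}

$DRYRUN pvecm delnode "$NODE"                 # remove the node from the cluster
$DRYRUN rm -r "/etc/pve/nodes/$NODE"          # drop its leftover config directory
$DRYRUN rm /etc/pve/ha/manager_status         # let the CRM rebuild its status file
$DRYRUN systemctl restart pve-ha-crm.service  # repeat this on every remaining node
```

Set DRYRUN to empty on a real cluster member to actually execute the commands; the service restart has to be repeated on each remaining node, and per the posts above it can still take a while before the stale entries vanish from ha-manager status.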
 
Hey guys,

I edited /etc/pve/ha/manager_status, removing the missing node.

Original:
{"timestamp":1675632795,"node_status":{"DELL":"gone","horizon":"online","fenix":"online"},"service_status":{},"master_node":"fenix"}

Modified:
{"timestamp":1675632795,"node_status":{"horizon":"online","fenix":"online"},"service_status":{},"master_node":"fenix"}
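Hand-editing that JSON is easy to get wrong (a single missing quote breaks the file). As a sketch, the same edit can be scripted; the "DELL" entry and the file contents are taken from the post above, and this works on a local copy, whereas on a live cluster the file is /etc/pve/ha/manager_status:

```shell
# Write a local copy of the manager status file (contents from the post above).
cat > manager_status <<'EOF'
{"timestamp":1675632795,"node_status":{"DELL":"gone","horizon":"online","fenix":"online"},"service_status":{},"master_node":"fenix"}
EOF
cp manager_status manager_status.bak   # keep a backup before editing

# Drop the dead node's entry with a proper JSON parser instead of a text editor.
python3 - <<'EOF'
import json
with open('manager_status') as f:
    status = json.load(f)
status['node_status'].pop('DELL', None)   # remove the stale node, if present
with open('manager_status', 'w') as f:
    json.dump(status, f)
EOF

cat manager_status
```

Using a JSON parser guarantees the result is still valid JSON, which matters since pve-ha-crm has to be able to read the file back.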
 
