Old deleted node showing on HA lrm

carles89
Renowned Member · May 27, 2015
Hi all,

At one of our customers' sites we used to have a 2-node cluster. The nodes were named "mdc0" and "mdc1". One of the servers (mdc0) failed, and we removed it using pvecm delnode mdc0.

I've noticed that this old node is still showing in the HA GUI menu:

[screenshot: upload_2018-4-27_14-12-48.png]

Here is the output of /etc/pve/.members (the old node is not listed):
Code:
root@mdc1:~# cat /etc/pve/.members

{
"nodename": "mdc1",
"version": 3,
"cluster": { "name": "cluster", "version": 3, "nodes": 1, "quorate": 1 },
"nodelist": {
  "mdc1": { "id": 2, "online": 1, "ip": "192.168.0.199"}
  }
}

Nor does it appear in /etc/pve/corosync.conf.
The only odd thing I see here is that bindnetaddr holds the old node's (mdc0) address; the current node (mdc1) has 192.168.0.199.

Code:
root@mdc1:~# cat /etc/pve/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: mdc1
    nodeid: 2
    quorum_votes: 1
    ring0_addr: mdc1
  }

}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: cluster
  config_version: 3
  ip_version: ipv4
  secauth: on
  version: 2
  interface {
    bindnetaddr: 192.168.0.200
    ringnumber: 0
  }

}

But it does show up if I run ha-manager status:

Code:
root@mdc1:~# ha-manager status
quorum OK
master mdc0 (idle, Fri Sep  2 15:31:16 2016)
lrm mdc0 (unable to read lrm status)
lrm mdc1 (idle, Fri Apr 27 14:14:14 2018)

Any ideas?

Thank you!
 

Attachments

  • upload_2018-4-27_14-13-44.png (13.3 KB)
Hi Dietmar

I have the same issue and I have tried the suggestion below, but it makes no difference:

# rm /etc/pve/ha/manager_status

root@pve02:~# ha-manager status
quorum OK
master pve02 (active, Thu Jun 20 10:37:33 2019)
lrm pve01 (active, Thu Jun 20 10:37:35 2019)
lrm pve02 (active, Thu Jun 20 10:37:34 2019)
lrm pve03 (active, Thu Jun 20 10:37:34 2019)
lrm pve04 (active, Thu Jun 20 10:37:41 2019)
lrm pve05 (old timestamp - dead?, Thu Jun 20 09:37:43 2019)
 
I guess you can fix that by simply removing the ha manager status file:

# rm /etc/pve/ha/manager_status
I tried this as well, but unfortunately it didn't fix it. I had already done the proper removal outlined in the cluster manager docs (pvecm delnode <node>), then removed the node directories from /etc/pve/nodes/, restarted the HA cluster services, etc., but still no fix: I'm still seeing the three servers that I removed. I also tried editing the manager_status file, but it seems to be regenerated from somewhere else.

 
Okay, I managed to fix it by running
Code:
systemctl restart pve-ha-crm.service
on every remaining node.
 
Well, I believe I've tried everything listed here, yet my old nodes are still in the ha-manager status output:

Code:
root@prox02:~# ha-manager status
unable to read file '/etc/pve/nodes/prox01/lrm_status'
unable to read file '/etc/pve/nodes/prox03/lrm_status'
unable to read file '/etc/pve/nodes/prox06/lrm_status'
quorum OK
master proxstore12 (active, Mon Jan  9 21:12:28 2023)
lrm prox01 (unable to read lrm status)
lrm prox02 (idle, Mon Jan  9 21:12:28 2023)
lrm prox03 (unable to read lrm status)
lrm prox04 (active, Mon Jan  9 21:12:29 2023)
lrm prox06 (unable to read lrm status)
lrm prox11 (idle, Mon Jan  9 21:12:27 2023)
lrm prox13 (idle, Mon Jan  9 21:12:27 2023)
lrm prox14 (active, Mon Jan  9 21:12:28 2023)
lrm proxstore11 (idle, Mon Jan  9 21:12:27 2023)
lrm proxstore12 (idle, Mon Jan  9 21:12:28 2023)
lrm proxstore13 (idle, Mon Jan  9 21:12:27 2023)
service vm:258 (prox04, started)
[... further services omitted ...]

At first, after I cleanly shut them down and ran pvecm delnode, they were listed as "in maintenance".

This did not change when restarting the pve-ha-crm services nor when removing /etc/pve/ha/manager_status.
The nodes still existed in /etc/pve/nodes, and now after I'd deleted those directories I'm stuck with the output above.
To make sure, I deleted the status file once more and restarted the CRM service on all nodes again, but that makes no difference.

So, how do I finally get rid of those entries, and why isn't this covered in the documentation for decommissioning nodes? That documentation basically boils down to "just run pvecm delnode" and makes no mention at all of the HA components (LRM/CRM). Nor does the HA documentation say anything about removing a node...


I don't particularly care about the "misleading" output, listing irrelevant nodes, but does it have any negative consequences for the HA quorum? How do the 3 dead nodes play into quorum calculation? Am I going to kill my HA cluster by replacing even more nodes?


EDIT: well, apparently you need to do any or all of the above, then wait for some amount of time. The node list is now correct, without any further changes, config modifications, service or node reboots.

EDIT2: I've replaced the remaining nodes and for those it was sufficient to delete the /etc/pve/nodes/<NODE> folder after the pvecm delnode command, ha-manager status would switch to "unable to read lrm status" for a short while and then the node was properly gone from the ha-manager list.
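For reference, the cleanup steps collected in this thread can be sketched as a small script. The node name "prox01" is only an example, and the DRYRUN guard is my addition so the commands merely print by default instead of touching a cluster:

```shell
#!/bin/sh
# Sketch of the decommissioning steps that worked in this thread.
# NODE is the node being removed (example name). With DRYRUN=echo
# (the default here) this is a dry run that only prints the commands.
NODE=prox01
DRYRUN=${DRYRUN-echo}

$DRYRUN pvecm delnode "$NODE"                 # remove the node from the cluster
$DRYRUN rm -r "/etc/pve/nodes/$NODE"          # drop its leftover config directory
$DRYRUN rm /etc/pve/ha/manager_status         # let the CRM rebuild its status file
$DRYRUN systemctl restart pve-ha-crm.service  # repeat this on every remaining node
```

Set DRYRUN to empty on a real cluster member to actually execute the commands; the service restart has to be repeated on each remaining node, and per the posts above it can still take a while before the stale entries vanish from ha-manager status.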
 
Hey guys,

I edited /etc/pve/ha/manager_status, removing the missing node.

Original:
{"timestamp":1675632795,"node_status":{"DELL":"gone","horizon":"online","fenix":"online"},"service_status":{},"master_node":"fenix"}

Modified:
{"timestamp":1675632795,"node_status":{"horizon":"online","fenix":"online"},"service_status":{},"master_node":"fenix"}
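Hand-editing that JSON is easy to get wrong (a single missing quote breaks the file). As a sketch, the same edit can be scripted; the "DELL" entry and the file contents are taken from the post above, and this works on a local copy, whereas on a live cluster the file is /etc/pve/ha/manager_status:

```shell
# Write a local copy of the manager status file (contents from the post above).
cat > manager_status <<'EOF'
{"timestamp":1675632795,"node_status":{"DELL":"gone","horizon":"online","fenix":"online"},"service_status":{},"master_node":"fenix"}
EOF
cp manager_status manager_status.bak   # keep a backup before editing

# Drop the dead node's entry with a proper JSON parser instead of a text editor.
python3 - <<'EOF'
import json
with open('manager_status') as f:
    status = json.load(f)
status['node_status'].pop('DELL', None)   # remove the stale node, if present
with open('manager_status', 'w') as f:
    json.dump(status, f)
EOF

cat manager_status
```

Using a JSON parser guarantees the result is still valid JSON, which matters since pve-ha-crm has to be able to read the file back.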
 
