[SOLVED] Problem with cluster after reinstalling node with failing Disk

sysadminfromhell · Sep 1, 2023

Hello everyone,

A few days ago the power failed in my home office and took down my small homelab.
One node failed completely, to the point where its SSD was basically dead. It would sometimes boot, but went offline again after a few minutes due to I/O errors.
I tried to clone the SSD to a new one, but that didn't work because the disk was far beyond repair or cloning.
So all the cluster information on that node was gone.

I decided to remove the old node and reinstall it with the same name and IP. I guess that was my mistake, because now I get a lot of errors about the LRM status, which is annoying and fills up the syslog.
Is there a way to clean up my mess, or do I have to remove the node again and give it a new name and IP to get rid of this error?
I guess the problem is that the name isn't exactly the same: the hostname is now all lowercase, whereas before it was all uppercase.
The folder that pve-ha-crm tries to find is already gone, and the LRM status from the dead SSD is gone as well.

What can I do?

Code:

Sep 01 10:57:51 NucMox03 pve-ha-crm[32257]: unable to read file '/etc/pve/nodes/dead/lrm_status'
Sep 01 10:58:01 NucMox03 pve-ha-crm[32257]: unable to read file '/etc/pve/nodes/dead/lrm_status'
Sep 01 10:58:11 NucMox03 pve-ha-crm[32257]: unable to read file '/etc/pve/nodes/dead/lrm_status'
Sep 01 10:58:21 NucMox03 pve-ha-crm[32257]: unable to read file '/etc/pve/nodes/dead/lrm_status'
Sep 01 10:58:31 NucMox03 pve-ha-crm[32257]: unable to read file '/etc/pve/nodes/dead/lrm_status'
Sep 01 10:58:41 NucMox03 pve-ha-crm[32257]: unable to read file '/etc/pve/nodes/dead/lrm_status'
Sep 01 10:58:51 NucMox03 pve-ha-crm[32257]: unable to read file '/etc/pve/nodes/dead/lrm_status'
Sep 01 10:59:01 NucMox03 pve-ha-crm[32257]: unable to read file '/etc/pve/nodes/dead/lrm_status'
Sep 01 10:59:11 NucMox03 pve-ha-crm[32257]: unable to read file '/etc/pve/nodes/dead/lrm_status'
Sep 01 10:59:21 NucMox03 pve-ha-crm[32257]: unable to read file '/etc/pve/nodes/dead/lrm_status'
Sep 01 10:59:31 NucMox03 pve-ha-crm[32257]: unable to read file '/etc/pve/nodes/dead/lrm_status'

sysadminfromhell · Sep 1, 2023

Ah yes, before the reinstall I ran pvecm delnode "dead", and then after the fresh install (about 1 or 2 hours later) a pvecm add.

sysadminfromhell · Sep 1, 2023

Never mind, I found a fix. Run

Code:

systemctl stop pve-ha-crm.service && rm -f /etc/pve/ha/manager_status && systemctl start pve-ha-crm.service

on all nodes at the same time.
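
To check whether the fix took, it can help to see which node names pve-ha-crm is still complaining about. This is only a rough sketch, not an official Proxmox tool: the function name stale_lrm_nodes is made up for illustration, and it assumes the log lines look exactly like the ones quoted above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Proxmox): reads log lines on stdin and
# prints "<node> <count>" for every node whose lrm_status could not be read.
stale_lrm_nodes() {
    # Pull the node directory name out of each matching log line,
    # then count how often each name appears.
    sed -n "s|.*unable to read file '/etc/pve/nodes/\([^/]*\)/lrm_status'.*|\1|p" \
        | sort | uniq -c | awk '{print $2, $1}'
}

# Example usage on a cluster node (assumes the journal holds the CRM messages):
#   journalctl -u pve-ha-crm --since "10 minutes ago" | stale_lrm_nodes
```

If it prints nothing a few minutes after the restart, the CRM is no longer looking for the removed node's lrm_status.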
