[SOLVED] Problem with cluster after reinstalling node with failing Disk

Feb 3, 2022
Hello everyone,

A few days ago the power failed in my home office, and my small homelab went down with it.
One node was failing completely, to the point where its SSD was basically dead. It could sometimes boot, but after a few minutes it went offline again due to I/O errors.
I came up with the idea of cloning the SSD to a new one, but this didn't work out because the disk was far beyond repair or cloning.
So all of the cluster information on that node was gone.

I decided to remove the old node and reinstall it with the same name and IP. I guess this was my mistake, because now I get a lot of errors about the LRM status, which is annoying and fills up the syslog.
Is there a way to clean up my mess, or do I have to remove the node again and give it a new name and IP to get rid of this error?
I suspect the problem is that the name isn't exactly the same: the hostname is now all lowercase, whereas before it was all uppercase.
The folder the CRM tries to find is already gone, and the LRM status from the dead SSD is gone as well.
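To check the upper/lowercase suspicion, something like the following sketch could list node directories whose name differs from the current hostname only by case. The function name `find_case_mismatch` is made up for illustration, and I am assuming the node directories live under /etc/pve/nodes, as the path in the log suggests.

```shell
# Sketch: print node directory names that match the given hostname
# case-insensitively but not exactly.
# $1 = hostname to compare, $2 = nodes directory (assumed: /etc/pve/nodes)
find_case_mismatch() {
  lower_host=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  for d in "$2"/*/; do
    name=$(basename "$d")
    lower_name=$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]')
    # Same name when case is ignored, but not byte-for-byte identical
    if [ "$lower_name" = "$lower_host" ] && [ "$name" != "$1" ]; then
      echo "$name"
    fi
  done
}
```

It could be called as `find_case_mismatch "$(hostname)" /etc/pve/nodes`; no output would mean no case-only mismatch was found.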

What can I do?

Code:
Sep 01 10:57:51 NucMox03 pve-ha-crm[32257]: unable to read file '/etc/pve/nodes/dead/lrm_status'
Sep 01 10:58:01 NucMox03 pve-ha-crm[32257]: unable to read file '/etc/pve/nodes/dead/lrm_status'
Sep 01 10:58:11 NucMox03 pve-ha-crm[32257]: unable to read file '/etc/pve/nodes/dead/lrm_status'
Sep 01 10:58:21 NucMox03 pve-ha-crm[32257]: unable to read file '/etc/pve/nodes/dead/lrm_status'
Sep 01 10:58:31 NucMox03 pve-ha-crm[32257]: unable to read file '/etc/pve/nodes/dead/lrm_status'
Sep 01 10:58:41 NucMox03 pve-ha-crm[32257]: unable to read file '/etc/pve/nodes/dead/lrm_status'
Sep 01 10:58:51 NucMox03 pve-ha-crm[32257]: unable to read file '/etc/pve/nodes/dead/lrm_status'
Sep 01 10:59:01 NucMox03 pve-ha-crm[32257]: unable to read file '/etc/pve/nodes/dead/lrm_status'
Sep 01 10:59:11 NucMox03 pve-ha-crm[32257]: unable to read file '/etc/pve/nodes/dead/lrm_status'
Sep 01 10:59:21 NucMox03 pve-ha-crm[32257]: unable to read file '/etc/pve/nodes/dead/lrm_status'
Sep 01 10:59:31 NucMox03 pve-ha-crm[32257]: unable to read file '/etc/pve/nodes/dead/lrm_status'
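Since the same line repeats every ten seconds, it can help to reduce the log to the distinct files the CRM is complaining about. A minimal sketch, assuming the messages have exactly the format shown above; the function name `missing_lrm_paths` is hypothetical.

```shell
# Sketch: read log lines on stdin and print each distinct path from
# "unable to read file '<path>'" messages once.
missing_lrm_paths() {
  grep -o "unable to read file '[^']*'" \
    | sed "s/^unable to read file '//; s/'\$//" \
    | sort -u
}
```

For example: `journalctl -u pve-ha-crm --no-pager | missing_lrm_paths`.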
 
Last edited:
Never mind, I found a fix. Run
Code:
systemctl stop pve-ha-crm.service && rm -f /etc/pve/ha/manager_status && systemctl start pve-ha-crm.service
on all nodes at the same time.
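If logging in to every node by hand is awkward, the same command could be launched on all of them in parallel over SSH. This is only a sketch under assumptions: the node names in `NODES` are placeholders, root SSH access between nodes is assumed, and the function name `run_fix_everywhere` is made up. It defaults to a dry run that only prints what it would do.

```shell
# Placeholder node list; replace with your own hostnames.
NODES="${NODES:-node1 node2 node3}"
# The fix from this post, unchanged.
FIX="systemctl stop pve-ha-crm.service && rm -f /etc/pve/ha/manager_status && systemctl start pve-ha-crm.service"

run_fix_everywhere() {
  for node in $NODES; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      # Dry run (default): only show the command that would be executed
      echo "ssh root@$node \"$FIX\""
    else
      # Start each node's command in the background so they run concurrently
      ssh "root@$node" "$FIX" &
    fi
  done
  # Wait for all background SSH sessions to finish
  wait
}
```

Calling `run_fix_everywhere` prints the commands; `DRY_RUN=0 run_fix_everywhere` would actually execute them.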