ha-crm unable to read file

Jun 5, 2019
We've been upgrading our Proxmox/Ceph cluster to new machines and 10G for Ceph. Another system engineer was in the process of removing 2 nodes from the cluster. He followed a tutorial that suggested manually removing the old nodes from /etc/pve/nodes. This was obviously bad advice; the tutorial's source was a forum thread here from 2014.

Now the ha-manager and the HA tab in the GUI won't work, throwing the error "got unexpected error - unable to read file '/etc/pve/nodes/stor1/lrm_status'". I've been browsing the forums and it seems that removing the /etc/pve/ha/manager_status file might help, but I'm quite hesitant since the last thing I tried fenced the whole cluster.

Any guidance on what to do?
 
He followed a tutorial that suggested manually removing the old nodes from /etc/pve/nodes.
(...)
Now the ha-manager and the HA tab in the GUI won't work, throwing the error "got unexpected error - unable to read file '/etc/pve/nodes/stor1/lrm_status'".
I could not reproduce a problem with HA. The lrm_status files get restored automatically on all nodes. The cluster is broken nonetheless.

Please execute
Code:
pvecm status
Code:
ha-manager config
Code:
tree /etc/pve/nodes
and post the result formatted as code.
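
If it is easier, the three commands above can be run in one go and pasted as a single block. This is only a rough sketch: the function name collect_diag is made up for illustration, and a command that fails or is not installed (for example tree) is noted in the output instead of stopping the script.

```shell
# Hypothetical helper: run the three diagnostic commands requested above
# and print each one under its own header.
collect_diag() {
  for cmd in "pvecm status" "ha-manager config" "tree /etc/pve/nodes"; do
    echo "### $cmd"
    # $cmd is left unquoted on purpose so it splits into command + arguments.
    $cmd 2>&1 || echo "(command failed or is unavailable)"
  done
}
```

Calling collect_diag > diag.txt on one node would give a single file to paste here.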
 
It looks like HA fixed itself. We added another VM to HA using the HA option on the VM itself, not the HA tab under Datacenter. We checked with
Code:
journalctl -u pve-ha-crm -u pve-ha-lrm -u corosync
(some details omitted)
Code:
pve-ha-crm[7197]: got unexpected error - unable to read file '/etc/pve/nodes/stor1/lrm_status'
pve-ha-crm[7197]: got unexpected error - unable to read file '/etc/pve/nodes/stor1/lrm_status'
pve-ha-crm[7197]: deleting gone node 'stor1', not a cluster member anymore.
pve-ha-crm[7197]: deleting gone node 'stor1', not a cluster member anymore.
pve-ha-crm[7197]: adding new service 'vm:xxx' on node 'xxx'
I could not reproduce a problem with HA. The lrm_status files get restored automatically on all nodes. The cluster is broken nonetheless.
He did remove the nodes first with pvecm delnode, so the cluster was fine.
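
For anyone hitting the same error, here is a small sketch for pulling the affected node names out of the journal. It assumes the log lines look like the ones posted above; the function name affected_nodes is made up for illustration.

```shell
# Hypothetical helper: read log lines on stdin and print each node name
# that appears in an "unable to read file .../lrm_status" error, once.
affected_nodes() {
  sed -n "s#.*unable to read file '/etc/pve/nodes/\([^/]*\)/lrm_status'.*#\1#p" | sort -u
}
```

Usage would be something like journalctl -u pve-ha-crm | affected_nodes. If the names it prints are nodes that were already removed from the cluster, that matches what happened here.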
 
