ha-crm unable to read file

Wout Van den Ende

Jun 5, 2019
We've been working on upgrading our Proxmox/Ceph cluster to new machines, with 10G networking for Ceph. Another system engineer was in the process of removing two nodes from the cluster. He followed a tutorial that suggested manually removing the old nodes from /etc/pve/nodes. That turned out to be bad advice; the tutorial's source was a forum thread here from 2014.

Now the ha-manager and the HA tab in the GUI won't work, throwing the error "got unexpected error - unable to read file '/etc/pve/nodes/stor1/lrm_status'". I've been browsing the forums, and it seems that removing the /etc/pve/ha/manager_status file might help, but I'm quite hesitant since the last thing I tried fenced the whole cluster.

Any guidance on what to do?
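For context, the recovery floated in those older threads is usually a stop-delete-restart cycle around the HA state file. A hedged sketch of that procedure (not verified on this cluster; stopping the services first and keeping a backup are precautions, not part of the original suggestion):

```shell
# Sketch only -- the manager_status removal suggested in older forum threads.
# Stop the HA stack first so the CRM does not rewrite state mid-edit.
systemctl stop pve-ha-crm pve-ha-lrm

# Keep a backup before deleting the shared HA manager state.
cp /etc/pve/ha/manager_status /root/manager_status.bak
rm /etc/pve/ha/manager_status

# On restart, a newly elected CRM master regenerates manager_status.
systemctl start pve-ha-lrm pve-ha-crm
```

This would need to be weighed against the fencing risk the poster mentions; as the replies below show, it turned out not to be necessary here.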
 
He followed a tutorial that suggested manually removing the old nodes from /etc/pve/nodes.
(...)
Now the ha-manager and the HA tab in the GUI won't work, throwing the error "got unexpected error - unable to read file '/etc/pve/nodes/stor1/lrm_status'".
I could not reproduce a problem with HA. The lrm_status files get restored automatically on all nodes. The cluster is broken nonetheless.

Please execute
Code:
pvecm status
ha-manager config
tree /etc/pve/nodes
and post the results formatted as code.
 
It looks like the HA fixed itself. We added another VM to HA, using the HA option on the VM itself rather than the HA tab in Datacenter. Checked with
Code:
journalctl -u pve-ha-crm -u pve-ha-lrm -u corosync
(some details omitted)
Code:
pve-ha-crm[7197]: got unexpected error - unable to read file '/etc/pve/nodes/stor1/lrm_status'
pve-ha-crm[7197]: got unexpected error - unable to read file '/etc/pve/nodes/stor1/lrm_status'
pve-ha-crm[7197]: deleting gone node 'stor1', not a cluster member anymore.
pve-ha-crm[7197]: deleting gone node 'stor1', not a cluster member anymore.
pve-ha-crm[7197]: adding new service 'vm:xxx' on node 'xxx'
I could not reproduce a problem with HA. The lrm_status files get restored automatically on all nodes. The cluster is broken nonetheless.
He did remove the nodes first with pvecm delnode, so the cluster was fine.
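For anyone landing on this thread later: the supported removal path goes through pvecm, not manual edits under /etc/pve/nodes. A minimal sketch of that sequence (node name 'stor1' taken from the error above as an example):

```shell
# Run on a node that remains in the cluster, AFTER the departing node
# has been shut down and will not rejoin with the same identity.
pvecm delnode stor1

# Verify the remaining membership and quorum.
pvecm status
```

Once the node is removed this way, the HA stack cleans up its state on its own, as the "deleting gone node" log lines above show.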