Add a node to the cluster without rebooting all nodes

freebee

Hi,
I have 4 nodes and today I will add one more.
The last time I did this, all nodes rebooted for no apparent reason.
So, how can I add a node to the cluster without all the other nodes rebooting?
Is there a safe procedure that avoids rebooting all servers?
Best regards.
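
For context, a node is normally joined from the new machine itself with pvecm; the IP below is just a placeholder for one of the existing cluster members:

pvecm add 192.168.1.10    # run on the NEW node, pointing at an existing cluster member
pvecm status              # afterwards, check quorum and membership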
 
I added the node now, and again every node in the cluster went down...
This way, Proxmox is unstable.
 
This is surely not regular behaviour. I have also set up two PVE clusters and added nodes later on, and none of the other nodes rebooted itself. If your other cluster nodes rebooted, there should be logs which explain what happened and why.

E.g., I'd check the corosync logs…
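
A minimal way to do that, assuming the nodes run the stock systemd units, is to query the journal for corosync and the cluster filesystem around the time of the incident:

journalctl -u corosync -u pve-cluster --since "2020-05-24 12:00" --until "2020-05-24 13:30"   # adjust the window to when the reboots happened
journalctl -u pve-ha-lrm -u pve-ha-crm -b -1    # HA services from the previous boot, where fencing would show up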
 
I'm checking the logs. This is the output:

ha-manager status --verbose
unable to read file '/etc/pve/nodes//lrm_status'
quorum OK
master PV-02-XXX (active, Sun May 24 13:01:57 2020)
lrm (unable to read lrm status)
lrm PV-01-XXX (active, Sun May 24 13:02:00 2020)
lrm PV-02-XXX (active, Sun May 24 13:01:57 2020)
lrm PV-03-XXX (active, Sun May 24 13:02:00 2020)
lrm PV-04-XXX (active, Sun May 24 13:02:02 2020)
lrm PV-05-XXX (active, Sun May 24 13:02:02 2020)

.....

full cluster state:
unable to read file '/etc/pve/nodes//lrm_status'
{
   "lrm_status" : {
      "" : {
         "mode" : "unknown"
      }

What is lrm?
I checked on all nodes; every node can see the contents of the others:
cat /etc/pve/nodes/PV-XXX-XXX/lrm_status
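
lrm here is the local resource manager of the HA stack: the pve-ha-lrm service that runs on every node, alongside the cluster-wide pve-ha-crm (the "master" role in the status output). A quick, read-only way to check that both services are alive on a node:

systemctl status pve-ha-lrm    # per-node HA local resource manager
systemctl status pve-ha-crm    # cluster resource manager ("master" in ha-manager status)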
 
The VMs live-migrate normally. Everything is fine until I add a new node; then all host servers reboot at the same time.
 
unable to read file '/etc/pve/nodes//lrm_status'

Well, /etc/pve hosts PVE's cluster file system (pmxcfs), which is a FUSE filesystem, and that has somehow gotten into a bad state. I'd check the network connections thoroughly - if Proxmox can't access this file system, it'll likely fence itself and reboot.
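
A few read-only checks with the standard PVE tooling would show whether the cluster filesystem and quorum are healthy on each node:

pvecm status                    # quorum state and corosync membership
systemctl status pve-cluster    # the pmxcfs daemon that provides /etc/pve
findmnt /etc/pve                # should show a FUSE mount
corosync-cfgtool -s             # status of the corosync link(s)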
 
The problem is a ghost node.
cat /etc/pve/ha/manager_status :
{"node_status":{"":"fence","PV-...

https://forum.proxmox.com/threads/lrm-unable-to-read-lrm-status.65415
https://forum.proxmox.com/threads/cannot-delete-ghost-node.13752

Suggestion:
When generating /etc/pve/ha/manager_status, check for empty node names (or build the file from the cluster configuration so that an empty string can never appear), since such an entry pushes the cluster into instability. Another suggestion is a diagnostic script for the cluster that analyzes the configuration files, compares them with the generated status files, and scans for errors; knowing the structure of the status files would make this easier to implement. A rough sketch of such a check follows.
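
This is only a sketch of that diagnostic idea, assuming jq is installed on the node; it compares the node names recorded in /etc/pve/ha/manager_status against the directories under /etc/pve/nodes/ and flags empty or unknown entries:

#!/bin/bash
# Hypothetical HA status sanity check (not part of PVE) - flags ghost/empty node entries.
set -euo pipefail

STATUS=/etc/pve/ha/manager_status
NODES_DIR=/etc/pve/nodes

jq -r '.node_status | keys[]' "$STATUS" | while IFS= read -r name; do
    if [ -z "$name" ]; then
        echo "WARNING: empty node name in $STATUS"
    elif [ ! -d "$NODES_DIR/$name" ]; then
        echo "WARNING: '$name' appears in $STATUS but has no directory under $NODES_DIR"
    fi
done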
 
