Add a node to the cluster without rebooting all nodes

freebee

Hi,
I have 4 nodes and today I will add one more.
The last time I did this, all nodes rebooted for no apparent reason.
So, how can I add a node to the cluster without all the other nodes rebooting?
Is there a safe procedure that avoids rebooting all servers?
Best regards.
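
For context, a node is normally joined from the new machine itself with pvecm; the IP below is just a placeholder for one of the existing cluster members:

pvecm add 192.168.1.10    # run on the NEW node, pointing at an existing cluster member
pvecm status              # afterwards, check quorum and membership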
 
I added the node now, and again every node in the cluster went down...
This way, Proxmox is unstable.
 
This is surely not regular behaviour. I have also set up two PVE clusters and added nodes later on, and none of the other nodes rebooted itself. If your other cluster nodes rebooted, there should be logs which explain what happened and why.

E.g., I'd check the corosync logs…
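
A minimal way to do that, assuming the nodes run the stock systemd units, is to query the journal for corosync and the cluster filesystem around the time of the incident:

journalctl -u corosync -u pve-cluster --since "2020-05-24 12:00" --until "2020-05-24 13:30"   # adjust the window to when the reboots happened
journalctl -u pve-ha-lrm -u pve-ha-crm -b -1    # HA services from the previous boot, where fencing would show up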
 
I'm checking the logs. This is the output:

ha-manager status --verbose
unable to read file '/etc/pve/nodes//lrm_status'
quorum OK
master PV-02-XXX (active, Sun May 24 13:01:57 2020)
lrm (unable to read lrm status)
lrm PV-01-XXX (active, Sun May 24 13:02:00 2020)
lrm PV-02-XXX (active, Sun May 24 13:01:57 2020)
lrm PV-03-XXX (active, Sun May 24 13:02:00 2020)
lrm PV-04-XXX (active, Sun May 24 13:02:02 2020)
lrm PV-05-XXX (active, Sun May 24 13:02:02 2020)

.....

full cluster state:
unable to read file '/etc/pve/nodes//lrm_status'
{
   "lrm_status" : {
      "" : {
         "mode" : "unknown"
      }

What is lrm?
I checked on all nodes; every node can see the contents of the others:
cat /etc/pve/nodes/PV-XXX-XXX/lrm_status
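
lrm here is the local resource manager of the HA stack: the pve-ha-lrm service that runs on every node, alongside the cluster-wide pve-ha-crm (the "master" role in the status output). A quick, read-only way to check that both services are alive on a node:

systemctl status pve-ha-lrm    # per-node HA local resource manager
systemctl status pve-ha-crm    # cluster resource manager ("master" in ha-manager status)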
 
The VMs live-migrate normally. Everything is fine until I add a new node; then all host servers reboot at the same time.
 
unable to read file '/etc/pve/nodes//lrm_status'

Well, /etc/pve hosts PVE's cluster file system (pmxcfs), which is a FUSE filesystem, and that has somehow gotten into a bad state. I'd check the network connections thoroughly - if Proxmox can't access this file system, it'll likely fence itself and reboot.
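
A few read-only checks with the standard PVE tooling would show whether the cluster filesystem and quorum are healthy on each node:

pvecm status                    # quorum state and corosync membership
systemctl status pve-cluster    # the pmxcfs daemon that provides /etc/pve
findmnt /etc/pve                # should show a FUSE mount
corosync-cfgtool -s             # status of the corosync link(s)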
 
The problem is a ghost node.
cat /etc/pve/ha/manager_status :
{"node_status":{"":"fence","PV-...

https://forum.proxmox.com/threads/lrm-unable-to-read-lrm-status.65415
https://forum.proxmox.com/threads/cannot-delete-ghost-node.13752

Suggestion:
When generating /etc/pve/ha/manager_status, check for empty node names (or build the file from the cluster configuration so that an empty string can never appear), since such an entry pushes the cluster into instability. Another suggestion is a diagnostic script for the cluster that analyzes the configuration files, compares them with the generated status files, and scans for errors; knowing the structure of the status files would make this easier to implement. A rough sketch of such a check follows.
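
This is only a sketch of that diagnostic idea, assuming jq is installed on the node; it compares the node names recorded in /etc/pve/ha/manager_status against the directories under /etc/pve/nodes/ and flags empty or unknown entries:

#!/bin/bash
# Hypothetical HA status sanity check (not part of PVE) - flags ghost/empty node entries.
set -euo pipefail

STATUS=/etc/pve/ha/manager_status
NODES_DIR=/etc/pve/nodes

jq -r '.node_status | keys[]' "$STATUS" | while IFS= read -r name; do
    if [ -z "$name" ]; then
        echo "WARNING: empty node name in $STATUS"
    elif [ ! -d "$NODES_DIR/$name" ]; then
        echo "WARNING: '$name' appears in $STATUS but has no directory under $NODES_DIR"
    fi
done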
 
