Restarting PVE on all nodes in cluster

gfoster

Guest
Hi Folks,

I've run into a few problems with CMAN/Corosync lately where the cluster will lose quorum and/or go offline. All nodes except the master show red in the GUI, although sometimes many nodes are registered via corosync (and show as such in /var/log/cluster/corosync.log). We've switched from multicast to unicast, and are still seeing issues sometimes when changes are made (such as adding a node), where the cluster will go offline.

Currently I have 26 nodes in the cluster, and I understand that 16 is the recommended limit. Are there any recommended tuning parameters for a cluster of this size?

In the event where things become confused (and yes, I agree they shouldn't), what is the recommended process for bringing all PVE related processes down, or restarting them on nodes which are currently having problems?

Side note: Sometimes when executing commands via SSH from the master, I'm still prompted for a password. Is this because the keys are stored in /etc/pve/priv, which may not be available when pve-cluster is offline?

Thanks in advance,
Greg.
 
I've run into a few problems with CMAN/Corosync lately where the cluster will lose quorum and/or go offline. All nodes except the master show red in the GUI, although sometimes many nodes are registered via corosync (and show as such in /var/log/cluster/corosync.log). We've switched from multicast to unicast, and are still seeing issues sometimes when changes are made (such as adding a node), where the cluster will go offline.

Currently I have 26 nodes in the cluster...

Wow - you are using unicast with 26 Nodes! The recommended limit is 4 Nodes for unicast.
 
gfoster,

I don't know if this would work for you, but given that I just posted an "exploration" thread about Proxmox and Salt, using Salt may be an option for restarting services from one Salt master node. Preferably, you'd run your Salt master outside of your Proxmox cluster. The Salt minion would need to run on each of your Proxmox nodes.

You could even write a Salt module to monitor certain services and restart them automatically if they fail.
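
To give a rough idea of the check-and-restart logic such a module would wrap, here is a minimal sketch in plain Python. It is not a Salt module and does not use the Salt API. The service names in DEFAULT_SERVICES are only my guess at what you'd want to watch, the helper names are made up for illustration, and it assumes that `service <name> status` returns a non-zero exit code when the service is not running, so check that on your nodes first.

```python
import subprocess
from typing import Callable, Iterable, List

# Assumption: these are the services worth watching on a node.
# Adjust the list to whatever is actually installed there.
DEFAULT_SERVICES = ("pve-cluster", "pvedaemon", "pvestatd")


def run_service_command(name: str, action: str) -> int:
    """Run `service <name> <action>` and return its exit code."""
    return subprocess.call(["service", name, action])


def restart_failed(
    services: Iterable[str],
    run: Callable[[str, str], int] = run_service_command,
) -> List[str]:
    """Restart every service whose status check fails.

    Assumes a non-zero exit code from the status check means the
    service is not running. Returns the names that were restarted.
    """
    restarted = []
    for name in services:
        if run(name, "status") != 0:
            run(name, "restart")
            restarted.append(name)
    return restarted
```

Calling restart_failed(DEFAULT_SERVICES) as root on a node would check each service in order and restart the ones that report a failure. The run parameter is only there so you can try the logic with a fake runner before pointing it at real services.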

Check it out and see if it would meet your needs and possibly solve your problem.
 
