all servers reboot by themselves - URGENT

svacaroaia

Member
Oct 4, 2012
Hi,
I have a Proxmox cluster with 4 servers

To solve a packet loss issue I decided to upgrade to the latest version, so I ran
aptitude update
aptitude full-upgrade
on one of the servers and then rebooted it.

Unfortunately this created a "chain reaction": all the other servers started to reboot by themselves.

The only "culprit" I can think of is fencing, but everything seems all right.
Any help or suggestions will be truly appreciated.
...and how can I stop the chain rebooting? I barely have time to type 3 or 4 commands before a server reboots again.

Here is my cluster.conf

<cluster config_version="30" name="bl02-cluster01">
  <cman expected_votes="1" keyfile="/var/lib/pve-cluster/corosync.authkey"/>
  <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
  <clusternodes>
    <clusternode name="blh02-14" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device action="reboot" name="bla02-14"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="blh02-13" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device action="reboot" name="bla02-13"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="blh02-10" nodeid="3" votes="1">
      <fence>
        <method name="1">
          <device action="reboot" name="bla02-10"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="blh02-11" nodeid="4" votes="1">
      <fence>
        <method name="1">
          <device action="reboot" name="bla02-11"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_ilo" ipaddr="10.x.x.45" login="iloadmin" name="bla02-10" passwd="ffff"/>
    <fencedevice agent="fence_ilo" ipaddr="10.x.x.46" login="iloadmin" name="bla02-11" passwd="fff"/>
    <fencedevice agent="fence_ilo" ipaddr="10.x.x.48" login="iloadmin" name="bla02-13" passwd="fff"/>
    <fencedevice agent="fence_ilo" ipaddr="10.x.x.49" login="iloadmin" name="bla02-14" passwd="fff"/>
  </fencedevices>
  <rm>
    <pvevm autostart="1" vmid="104"/>
    <pvevm autostart="1" vmid="302"/>
    <pvevm autostart="1" vmid="304"/>
    <pvevm autostart="1" vmid="306"/>
    <pvevm autostart="1" vmid="307"/>
    <pvevm autostart="1" vmid="308"/>
    <pvevm autostart="1" vmid="309"/>
    <pvevm autostart="1" vmid="311"/>
    <pvevm autostart="1" vmid="312"/>
    <pvevm autostart="1" vmid="310"/>
    <pvevm autostart="1" vmid="315"/>
    <pvevm autostart="1" vmid="314"/>
  </rm>
</cluster>
 
You set expected_votes to 1, which is simply wrong and very dangerous.

Code:
<cman expected_votes="1" keyfile="/var/lib/pve-cluster/corosync.authkey"/>
 
Thanks for your prompt response.

I set it up like that when I tested the cluster with only 2 nodes.

Am I right to assume that its value should be n - 1, where n is the number of servers in the cluster?

Steven
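
For reference, here is a sketch of how this is usually reasoned about. It is based on the cman(5) man page, not on anything confirmed in this thread. cman treats the cluster as quorate when the votes of the current members exceed half of expected_votes, and when the attribute is omitted it defaults to the sum of the votes of all nodes listed in cluster.conf. With four nodes at one vote each that sum is 4, so quorum needs 3 votes. A value of n - 1 = 3 would let two nodes out of four count as quorate, and a value of 1 lets any single node do so. Assuming no two-node special case is intended, the cman line could simply drop the override. config_version has to be increased whenever the file changes (31 below is just the next number after 30).

```xml
<!-- Sketch only: expected_votes omitted so cman derives it from the node votes (4 x 1 = 4). -->
<cluster config_version="31" name="bl02-cluster01">
  <cman keyfile="/var/lib/pve-cluster/corosync.authkey"/>
  <!-- fence_daemon, clusternodes, fencedevices and rm sections unchanged -->
</cluster>
```

Whether this alone stops the reboots cannot be told from the posts above. It only removes the setting that was flagged as dangerous.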
 
