4 Node cluster fails to gain quorum on a node that reboots

telemagic

New Member
Jan 20, 2013
Hi guys. I have a 4-node HA cluster with an FC array as backend storage. This is running on 4 x HP BL460c G1 blade servers in an HP C7000 chassis. Each node has 2 x quad-core Xeons with 16 GB RAM. The nodes have 2 x 72 GB SAS drives which they boot from, and a 4 TB FC array for VM/CT storage. Each node has a pair of gigabit network adapters, each connected to a separate Cisco gigabit switch, and the two switches are linked together via a trunk. I have bond0 configured with eth0 & eth1 using balance-tlb.

Everything works fine. However, when I go to reboot a node for updates etc., I run /etc/init.d/rgmanager stop, which works correctly, and the node reboots. When it boots back up, cman loads at boot time, gets "waiting for quorum", sits there for a minute, and fails. This causes rgmanager to fail to start and cluster communication to not come up properly. I see this in dmesg:

dlm: no local IP address has been set
dlm: cannot start dlm lowcomms -107
dlm: no local IP address has been set
dlm: cannot start dlm lowcomms -107

If I run the following:

/etc/init.d/cman restart
/etc/init.d/pve-cluster restart
/etc/init.d/rgmanager start

everything is fine. This seems to be an issue of the network not coming up in time for cman to gain quorum at boot, but I'm not sure. Any ideas? If it is the network not coming up in time, can anyone tell me how to delay cman on boot?

Regards
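If the root cause is indeed the bond not being fully up before cman starts, one common workaround is to delay boot until the bond has settled. A minimal sketch, assuming a Debian-style /etc/network/interfaces as used by Proxmox VE 2.x (the addresses and the 10-second delay are example values, not taken from the original post):

```
# /etc/network/interfaces (sketch -- adapt addresses to your setup)
auto bond0
iface bond0 inet static
    address 192.0.2.10        # example address, replace with your node's IP
    netmask 255.255.255.0
    gateway 192.0.2.1
    slaves eth0 eth1
    bond_mode balance-tlb
    bond_miimon 100
    # pause after the bond comes up so the switch ports can settle
    # before cman tries to form quorum
    post-up sleep 10
```

Since ifupdown runs post-up before the init sequence continues, this effectively delays cman without touching its init script. Note also that balance-tlb can interact badly with the multicast traffic corosync relies on; if the delay alone doesn't help, testing with active-backup mode may be worth a try.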
 
