Switch failure on a 3-node cluster

megap

Oct 1, 2014
Hello all.

I have a 3-node HA cluster working with iDRAC/IPMI fencing. Failover works when an individual node has a problem.

For testing, all 3 nodes are currently connected to a single switch, and I'm checking what happens if everything is working and the switch is suddenly powered off.

This is the state of the cluster before removing power to the switch:

Code:
 clustat
Cluster Status for hcuscluster @ Wed May 27 07:29:24 2015
Member Status: Quorate


 Member Name                             ID   Status
 ------ ----                             ---- ------
 node1                                   1 Online, Local, rgmanager
 node2                                   2 Online, rgmanager
 node3                                   3 Online, rgmanager


 Service Name                   Owner (Last)                   State
 ------- ----                   ----- ------                   -----
 pvevm:100                      node2                      started

The VM (pvevm:100) has the address 192.168.1.144, and pinging it works:

Code:
C:\Users\>ping 192.168.1.144

Pinging 192.168.1.144 with 32 bytes of data:
Reply from 192.168.1.144: bytes=32 time<1ms TTL=64
Reply from 192.168.1.144: bytes=32 time<1ms TTL=64
Reply from 192.168.1.144: bytes=32 time<1ms TTL=64
Reply from 192.168.1.144: bytes=32 time<1ms TTL=64

At this point I cut power to the switch, waited a few minutes, and then powered the switch back on.

Afterwards, clustat shows this:
Code:
clustat
Timed out waiting for a response from Resource Group Manager
Cluster Status for hcuscluster @ Wed May 27 07:49:44 2015
Member Status: Quorate


 Member Name                             ID   Status
 ------ ----                             ---- ------
 node1                                   1 Online, Local
 node2                                   2 Online
 node3                                   3 Online
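Comparing the two clustat outputs above, the member lines differ in one visible way: before the outage every node lists rgmanager in its Status column, and afterwards none does. As a hedged sketch, the Python below parses saved clustat text and reports Online members that no longer show rgmanager. It assumes the member-table layout shown in this post (name, numeric ID, comma-separated status); the helper name nodes_without_rgmanager is hypothetical and is not part of any cman, rgmanager or Proxmox tool.

```python
import re

# Member lines in the clustat output above look like:
#   " node1    1 Online, Local, rgmanager"
# i.e. a name, a numeric ID, then a comma-separated status list.
# This layout is assumed from the post, not from any clustat specification.
MEMBER_LINE = re.compile(r"^\s*(\S+)\s+(\d+)\s+(.+?)\s*$")


def nodes_without_rgmanager(clustat_output: str) -> list[str]:
    """Return names of Online members whose status lacks 'rgmanager' (hypothetical helper)."""
    missing = []
    for line in clustat_output.splitlines():
        match = MEMBER_LINE.match(line)
        if match is None:
            continue  # header, separator, service or summary line
        name, _node_id, status = match.groups()
        flags = [flag.strip() for flag in status.split(",")]
        if "Online" in flags and "rgmanager" not in flags:
            missing.append(name)
    return missing


# Member table copied from the post-outage clustat output above.
AFTER_OUTAGE = """\
 Member Name                             ID   Status
 ------ ----                             ---- ------
 node1                                   1 Online, Local
 node2                                   2 Online
 node3                                   3 Online
"""

if __name__ == "__main__":
    print(nodes_without_rgmanager(AFTER_OUTAGE))  # ['node1', 'node2', 'node3']
```

This only reads text and does not query or change the cluster; it just makes the symptom (quorum is intact, but rgmanager is gone from every member) easy to spot in saved output.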

pvevm:100 does not start again, and pinging it fails because the host is not reachable. I can't start it manually either. Isn't it possible for the VM to come back online once the switch is powered on again?

At this point it is not possible to restart cman:

Code:
# /etc/init.d/cman restart
Stopping cluster:
   Leaving fence domain... found dlm lockspace /sys/kernel/dlm/rgmanager
fence_tool: cannot leave due to active systems
[FAILED]

Restarting rgmanager is not possible either; it hangs at:
Code:
~# /etc/init.d/rgmanager restart
Stopping Cluster Service Manager:

Is it necessary to restart all nodes to get back to the initial configuration with pvevm running?

Can someone please help me with how to restart all services without rebooting the nodes, and how to get pvevm to come back online automatically when the switch is turned on again?

Thank you all.
 
