Two-node configuration shut down both servers

Guido Veenstra

Feb 25, 2016
Hi,

Last week the power in our office failed, and after it came back we had some client computers with broken hardware. Our server hardware looked fine at that point. We started the nodes and everything worked as before.

Now, a week later, both servers shut down unexpectedly. My question is how this can happen.

Our setup is as follows:

2 SuperMicro servers with 64 GB ECC RAM and a fiber connection between them to sync the network raid arrays.

The servers are also connected to the client network over a UTP cable. This connection is also used to reach IPMI and read the system state.

Server 1: 192.168.0.201 (IPMI 192.168.0.200) over eth0 and eth1 (both on switch 1)
Server 2: 192.168.0.203 (IPMI 192.168.0.202) over eth0 and eth1 (both on switch 1)

Data: 10.0.0.1 and 10.0.0.2 (no switch, direct fiber connection between the 2 servers)

Logs:

Proxmox 1:

Feb 25 09:09:08 prox1 corosync[3828]: [TOTEM ] A processor failed, forming new configuration.
Feb 25 09:09:10 prox1 corosync[3828]: [CLM ] CLM CONFIGURATION CHANGE
Feb 25 09:09:10 prox1 corosync[3828]: [CLM ] New Configuration:
Feb 25 09:09:10 prox1 corosync[3828]: [CLM ] #011r(0) ip(192.168.0.201)
Feb 25 09:09:10 prox1 corosync[3828]: [CLM ] Members Left:
Feb 25 09:09:10 prox1 corosync[3828]: [CLM ] #011r(0) ip(192.168.0.203)
Feb 25 09:09:10 prox1 corosync[3828]: [CLM ] Members Joined:
Feb 25 09:09:10 prox1 corosync[3828]: [QUORUM] Members[1]: 1
Feb 25 09:09:10 prox1 corosync[3828]: [CLM ] CLM CONFIGURATION CHANGE
Feb 25 09:09:10 prox1 corosync[3828]: [CLM ] New Configuration:
Feb 25 09:09:10 prox1 corosync[3828]: [CLM ] #011r(0) ip(192.168.0.201)
Feb 25 09:09:10 prox1 corosync[3828]: [CLM ] Members Left:
Feb 25 09:09:10 prox1 corosync[3828]: [CLM ] Members Joined:
Feb 25 09:09:10 prox1 corosync[3828]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Feb 25 09:09:10 prox1 kernel: dlm: closing connection to node 2
Feb 25 09:09:10 prox1 rgmanager[4199]: State change: prox2 DOWN
Feb 25 09:09:10 prox1 corosync[3828]: [CPG ] chosen downlist: sender r(0) ip(192.168.0.201) ; members(old:2 left:1)
Feb 25 09:09:10 prox1 pmxcfs[3596]: [dcdb] notice: members: 1/3596
Feb 25 09:09:10 prox1 pmxcfs[3596]: [dcdb] notice: members: 1/3596
Feb 25 09:09:10 prox1 corosync[3828]: [MAIN ] Completed service synchronization, ready to provide service.
Feb 25 09:09:10 prox1 fenced[3906]: fencing node prox2
Feb 25 09:09:10 prox1 rgmanager[928142]: [pvevm] VM 120 is running
Feb 25 09:09:10 prox1 rgmanager[928162]: [pvevm] VM 110 is running
Feb 25 09:09:10 prox1 kernel: eth0: received packet with own address as source address
Feb 25 09:09:10 prox1 fence_ipmilan: Parse error: Ignoring unknown option 'nodename=prox2
Feb 25 09:09:14 prox1 shutdown[928210]: shutting down for system halt

Proxmox 2:

Feb 25 09:09:08 prox2 corosync[3804]: [TOTEM ] A processor failed, forming new configuration.
Feb 25 09:09:10 prox2 corosync[3804]: [CLM ] CLM CONFIGURATION CHANGE
Feb 25 09:09:10 prox2 corosync[3804]: [CLM ] New Configuration:
Feb 25 09:09:10 prox2 corosync[3804]: [CLM ] #011r(0) ip(192.168.0.203)
Feb 25 09:09:10 prox2 corosync[3804]: [CLM ] Members Left:
Feb 25 09:09:10 prox2 corosync[3804]: [CLM ] #011r(0) ip(192.168.0.201)
Feb 25 09:09:10 prox2 corosync[3804]: [CLM ] Members Joined:
Feb 25 09:09:10 prox2 corosync[3804]: [QUORUM] Members[1]: 2
Feb 25 09:09:10 prox2 corosync[3804]: [CLM ] CLM CONFIGURATION CHANGE
Feb 25 09:09:10 prox2 corosync[3804]: [CLM ] New Configuration:
Feb 25 09:09:10 prox2 corosync[3804]: [CLM ] #011r(0) ip(192.168.0.203)
Feb 25 09:09:10 prox2 corosync[3804]: [CLM ] Members Left:
Feb 25 09:09:10 prox2 corosync[3804]: [CLM ] Members Joined:
Feb 25 09:09:10 prox2 corosync[3804]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Feb 25 09:09:10 prox2 rgmanager[4156]: State change: prox1 DOWN
Feb 25 09:09:10 prox2 kernel: dlm: closing connection to node 1
Feb 25 09:09:10 prox2 corosync[3804]: [CPG ] chosen downlist: sender r(0) ip(192.168.0.203) ; members(old:2 left:1)
Feb 25 09:09:10 prox2 pmxcfs[3526]: [dcdb] notice: members: 2/3526
Feb 25 09:09:10 prox2 pmxcfs[3526]: [dcdb] notice: members: 2/3526
Feb 25 09:09:10 prox2 corosync[3804]: [MAIN ] Completed service synchronization, ready to provide service.
Feb 25 09:09:10 prox2 fenced[3862]: fencing node prox1
Feb 25 09:09:10 prox2 fence_ipmilan: Parse error: Ignoring unknown option 'nodename=prox1
Feb 25 09:09:10 prox2 kernel: eth0: received packet with own address as source address
Feb 25 09:09:10 prox2 corosync[3804]: [CLM ] CLM CONFIGURATION CHANGE
Feb 25 09:09:10 prox2 corosync[3804]: [CLM ] New Configuration:
Feb 25 09:09:10 prox2 corosync[3804]: [CLM ] #011r(0) ip(192.168.0.203)
Feb 25 09:09:10 prox2 corosync[3804]: [CLM ] Members Left:
Feb 25 09:09:10 prox2 corosync[3804]: [CLM ] Members Joined:
Feb 25 09:09:10 prox2 corosync[3804]: [CLM ] CLM CONFIGURATION CHANGE
Feb 25 09:09:10 prox2 corosync[3804]: [CLM ] New Configuration:
Feb 25 09:09:10 prox2 corosync[3804]: [CLM ] #011r(0) ip(192.168.0.203)
Feb 25 09:09:10 prox2 corosync[3804]: [CLM ] Members Left:
Feb 25 09:09:10 prox2 corosync[3804]: [CLM ] Members Joined:
Feb 25 09:09:10 prox2 corosync[3804]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Feb 25 09:09:10 prox2 corosync[3804]: [CPG ] chosen downlist: sender r(0) ip(192.168.0.203) ; members(old:1 left:0)
Feb 25 09:09:10 prox2 corosync[3804]: [MAIN ] Completed service synchronization, ready to provide service.
Feb 25 09:09:14 prox2 shutdown[1038915]: shutting down for system halt

Can anyone help us? Could this be a server hardware failure?
Thanks for any help!
 
Hi,
Why do your hosts fence each other? Could it be that the network has problems?
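To see the fencing sequence more clearly, it may help to reduce the two logs to the key steps only. Below is a minimal Python sketch that does this for the log format posted above. The names `LINE_RE`, `KEY_EVENTS` and `timeline` are made up for this example, and the match strings are copied from the posted log lines, so other versions may word them differently.

```python
import re

# Syslog prefix as it appears in the posted logs, e.g.
# "Feb 25 09:09:10 prox1 fenced[3906]: fencing node prox2"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"(?P<proc>[^\s\[:]+)(?:\[\d+\])?:\s+(?P<msg>.*)$"
)

# Substrings taken from the posted logs (assumption: they mark the key steps).
KEY_EVENTS = {
    "A processor failed": "peer lost",
    "fencing node": "fence started",
    "shutting down for system halt": "halt",
}


def timeline(lines):
    """Return (timestamp, host, label) for every line matching a key event."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a syslog line in the expected format
        for needle, label in KEY_EVENTS.items():
            if needle in m.group("msg"):
                events.append((m.group("ts"), m.group("host"), label))
                break
    return events


# Lines copied from the two logs in the first post.
sample = [
    "Feb 25 09:09:08 prox1 corosync[3828]: [TOTEM ] A processor failed, forming new configuration.",
    "Feb 25 09:09:10 prox1 fenced[3906]: fencing node prox2",
    "Feb 25 09:09:14 prox1 shutdown[928210]: shutting down for system halt",
    "Feb 25 09:09:08 prox2 corosync[3804]: [TOTEM ] A processor failed, forming new configuration.",
    "Feb 25 09:09:10 prox2 fenced[3862]: fencing node prox1",
    "Feb 25 09:09:14 prox2 shutdown[1038915]: shutting down for system halt",
]

events = timeline(sample)
# Plain string sort is enough here because all lines are from the same day.
for ts, host, label in sorted(events):
    print(ts, host, label)
```

On the posted lines this shows the same three steps on both nodes at the same seconds: each node loses its peer at 09:09:08, starts fencing the other at 09:09:10, and halts at 09:09:14. That pattern suggests both nodes lost sight of each other on the 192.168.0.x network at the same moment, rather than one server failing on its own, but the logs alone do not prove the cause.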
 
