Aijaijai! I updated to the new version on 28 Sep, and now both nodes report a failure when booting:
Sun Sep 28 10:42:11 2014: Starting cluster:
Sun Sep 28 10:42:11 2014: Checking if cluster has been disabled at boot... [ OK ]
Sun Sep 28 10:42:11 2014: Checking Network Manager... [ OK ]
Sun Sep 28 10:42:11 2014: Global setup... [ OK ]
Sun Sep 28 10:42:11 2014: Loading kernel modules... [ OK ]
Sun Sep 28 10:42:11 2014: Mounting configfs... [ OK ]
Sun Sep 28 10:42:11 2014: Starting cman... tempfile:13: element device: Relax-NG validity error : Invalid attribute nodename for element device
Sun Sep 28 10:42:16 2014: Relax-NG validity error : Extra element fence in interleave
Sun Sep 28 10:42:16 2014: tempfile:6: element clusternodes: Relax-NG validity error : Element clusternode failed to validate content
Sun Sep 28 10:42:16 2014: tempfile:7: element clusternode: Relax-NG validity error : Element clusternodes has extra content: clusternode
Sun Sep 28 10:42:16 2014: Configuration fails to validate
Sun Sep 28 10:42:16 2014: [ OK ]
Sun Sep 28 10:42:16 2014: Starting qdiskd... [ OK ]
Sun Sep 28 10:42:26 2014: Waiting for quorum... [ OK ]
Sun Sep 28 10:42:26 2014: Starting fenced... [ OK ]
Sun Sep 28 10:42:26 2014: Starting dlm_controld... [ OK ]
Sun Sep 28 10:42:27 2014: Tuning DLM kernel config... [ OK ]
Sun Sep 28 10:42:27 2014: Unfencing self... [ OK ]
Sun Sep 28 10:42:27 2014: Joining fence domain... [ OK ]
Sun Sep 28 10:42:28 2014: Starting PVE firewall logger: pvefw-logger.
Sun Sep 28 10:42:29 2014: Starting OpenVZ: ..done
Sun Sep 28 10:42:29 2014: Bringing up interface venet0: ..done
Sun Sep 28 10:42:29 2014: Starting Cluster Service Manager: [ OK ]
root@node1:~# pveversion --verbose
proxmox-ve-2.6.32: 3.2-136 (running kernel: 2.6.32-32-pve)
pve-manager: 3.3-1 (running version: 3.3-1/a06c9f73)
pve-kernel-2.6.32-32-pve: 2.6.32-136
pve-kernel-2.6.32-31-pve: 2.6.32-132
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-1
pve-cluster: 3.0-15
qemu-server: 3.1-34
pve-firmware: 1.1-3
libpve-common-perl: 3.0-19
libpve-access-control: 3.0-15
libpve-storage-perl: 3.0-23
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.1-5
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1
Before the update everything was fine, and I think everything is still running now, but what about the failure when starting the nodes?
root@node1:~# /etc/init.d/cman status
cluster is running.
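To see those validation errors again without rebooting, I guess I could check the config against the shipped Relax-NG schema directly, something like this (the cluster.rng path is only my assumption, it may be installed somewhere else on PVE):
# schema path is a guess; adjust if cluster.rng lives elsewhere
xmllint --noout --relaxng /usr/share/cluster/cluster.rng /etc/pve/cluster.conf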
My cluster.conf
root@node1:~# cat /etc/pve/cluster.conf
<?xml version="1.0"?>
<cluster config_version="67" name="BlDMZ">
  <cman expected_votes="3" keyfile="/var/lib/pve-cluster/corosync.authkey"/>
  <quorumd allow_kill="0" interval="1" label="proxmoxquorum" tko="10" votes="1"/>
  <totem token="54000"/>
  <clusternodes>
    <clusternode name="node2" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="fenceNode2"/>
        </method>
        <method name="2">
          <device name="human" nodename="node2"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node1" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="fenceNode1"/>
        </method>
        <method name="2">
          <device name="human" nodename="node1"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" auth="md5" ipaddr="192.168.1.11" login="clusterp" name="fenceNode1" passwd="-----" power_wait="5"/>
    <fencedevice agent="fence_ipmilan" auth="md5" ipaddr="192.168.1.12" login="clusterp" name="fenceNode2" passwd="-----" power_wait="5"/>
    <fencedevice agent="fence_manual" name="human"/>
  </fencedevices>
  <rm>
    <pvevm autostart="1" vmid="101"/>
  </rm>
</cluster>
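If I read the Relax-NG messages right, the validator only complains about the nodename attribute on the <device> entries for the manual fence device. So my guess (untested, just a sketch) is that the fence block for node2 would have to look like this instead, and the same for node1:
<fence>
  <method name="1">
    <device name="fenceNode2"/>
  </method>
  <method name="2">
    <!-- nodename dropped because the schema rejects it; not sure fence_manual still works like this -->
    <device name="human"/>
  </method>
</fence>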