Last update broke cman

geekitus

Guest
The last update broke my cluster. Any ideas?


root@prpweb14:~# pvecm status
cman_tool: Cannot open connection to cman, is it running ?

------------

root@titoo:~# /etc/init.d/cman restart
Stopping cluster:
Stopping dlm_controld... [ OK ]
Stopping fenced... [ OK ]
Stopping cman... [ OK ]
Unloading kernel modules... [ OK ]
Unmounting configfs... [ OK ]
Starting cluster:
Checking if cluster has been disabled at boot... [ OK ]
Checking Network Manager... [ OK ]
Global setup... [ OK ]
Loading kernel modules... [ OK ]
Mounting configfs... [ OK ]
Starting cman... Cannot find node name in cluster.conf
Unable to get the configuration
Cannot find node name in cluster.conf
cman_tool: corosync daemon didn't start Check cluster logs for details
[FAILED]

------------


root@toto:~# cat /etc/pve/cluster.conf
<?xml version="1.0"?>
<cluster name="ClusterProd" config_version="8">
<cman keyfile="/var/lib/pve-cluster/corosync.authkey">
</cman>
<clusternodes>
</clusternodes>
</cluster>


------------

root@toto:~# dpkg -l |grep pve
ii clvm 2.02.88-2pve1 Cluster LVM Daemon for lvm2
ii corosync-pve 1.4.1-1 Standards-based cluster framework (daemon and modules)
ii dmsetup 2:1.02.67-2pve1 Linux Kernel Device Mapper userspace library
ii fence-agents-pve 3.1.7-1 fence agents for redhat cluster suite
ii libcorosync4-pve 1.4.1-1 Standards-based cluster framework (libraries)
ii libdevmapper1.02.1 2:1.02.67-2pve1 Linux Kernel Device Mapper userspace library
ii libopenais3-pve 1.1.4-2 Standards-based cluster framework (libraries)
ii libpve-access-control 1.0-15 Proxmox VE access control library
ii libpve-common-perl 1.0-14 Proxmox VE base library
ii libpve-storage-perl 2.0-12 Proxmox VE storage management library
ii lvm2 2.02.88-2pve1 Linux Logical Volume Manager
ii openais-pve 1.1.4-2 Standards-based cluster framework (daemon and modules)
ii pve-cluster 1.0-23 Cluster Infrastructure for Proxmox Virtual Environment
ii pve-firmware 1.0-15 Binary firmware code for the pve-kernel
ii pve-headers-2.6.32-6-pve 2.6.32-55 The Proxmox PVE Kernel Headers
ii pve-kernel-2.6.32-6-pve 2.6.32-55 The Proxmox PVE Kernel Image
ii pve-kernel-2.6.32-7-pve 2.6.32-60 The Proxmox PVE Kernel Image
ii pve-manager 2.0-33 The Proxmox Virtual Environment
ii pve-qemu-kvm 1.0-3 Full virtualization on x86 hardware
ii redhat-cluster-pve 3.1.8-3 Red Hat cluster suite
ii resource-agents-pve 3.9.2-3 resource agents for redhat cluster suite
ii vzctl 3.0.30-2pve1 OpenVZ - server virtualization solution - control tools

---------------

tail /var/log/cluster/corosync.log

Feb 23 13:41:54 corosync [SERV ] Unloading all Corosync service engines.
Feb 23 13:41:54 corosync [SERV ] Service engine unloaded: corosync extended virtual synchrony service
Feb 23 13:41:54 corosync [SERV ] Service engine unloaded: corosync configuration service
Feb 23 13:41:54 corosync [SERV ] Service engine unloaded: corosync cluster closed process group service v1.01
Feb 23 13:41:54 corosync [SERV ] Service engine unloaded: corosync cluster config database access v1.01
Feb 23 13:41:54 corosync [SERV ] Service engine unloaded: corosync profile loading service
Feb 23 13:41:54 corosync [SERV ] Service engine unloaded: openais cluster membership service B.01.01
Feb 23 13:41:54 corosync [SERV ] Service engine unloaded: openais checkpoint service B.01.01
Feb 23 13:41:54 corosync [SERV ] Service engine unloaded: openais event service B.01.01
Feb 23 13:41:54 corosync [SERV ] Service engine unloaded: openais distributed locking service B.03.01
Feb 23 13:41:54 corosync [SERV ] Service engine unloaded: openais message service B.03.01
Feb 23 13:41:54 corosync [SERV ] Service engine unloaded: corosync CMAN membership service 2.90
Feb 23 13:41:54 corosync [SERV ] Service engine unloaded: corosync cluster quorum service v0.1
Feb 23 13:41:54 corosync [SERV ] Service engine unloaded: openais timer service A.01.01
Feb 23 13:41:54 corosync [MAIN ] Corosync Cluster Engine exiting with status 0 at main.c:1858.
 
Did you change the hostname?

Thanks for the reply.

I only changed the hostname in my previous message, not on my server :)


-> I tried rebooting another node of my cluster: it works fine after the reboot.
-> I then tried updating that same node and rebooting: it crashes with the same error.

So I think this error is caused by last night's update.

 
-> I tried rebooting another node of my cluster: it works fine after the reboot.
-> I then tried updating that same node and rebooting: it crashes with the same error.

Well, this is not a crash. You get a detailed error message: "Cannot find node name in cluster.conf"

The /etc/pve/cluster.conf you posted does not even contain a single node!
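
To make that concrete, here is a minimal sketch (not a Proxmox tool) that lists the node names defined in a cluster.conf and checks whether a given hostname is among them. It assumes the usual cman layout, where each node is a <clusternode name="..."> element under <clusternodes>. The function names and the node name "node1" are hypothetical; the embedded XML is the file posted above.

```python
import xml.etree.ElementTree as ET

# The cluster.conf exactly as posted in this thread.
POSTED_CONF = """<?xml version="1.0"?>
<cluster name="ClusterProd" config_version="8">
<cman keyfile="/var/lib/pve-cluster/corosync.authkey">
</cman>
<clusternodes>
</clusternodes>
</cluster>
"""


def node_names(conf_xml):
    """Return the name of every <clusternode> found under <clusternodes>.

    Assumption: nodes are declared as <clusternode name="..."> elements,
    as in a typical cman cluster.conf.
    """
    root = ET.fromstring(conf_xml)
    nodes = root.findall("./clusternodes/clusternode")
    return [n.get("name") for n in nodes if n.get("name")]


def has_node(conf_xml, hostname):
    """True if hostname is one of the node names declared in the config."""
    return hostname in node_names(conf_xml)


# The posted file declares no nodes at all, so no hostname can match.
print(node_names(POSTED_CONF))         # prints []
print(has_node(POSTED_CONF, "node1"))  # prints False ("node1" is a made-up name)
```

On a real node you would feed it the contents of /etc/pve/cluster.conf and the machine's own node name instead. An empty list here is consistent with the "Cannot find node name in cluster.conf" message from cman.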
 
