So cman does not stop during normal operation if quorum or the connection is lost (that is news to me).
The forum lost my last two posts... so forgive me if one of them shows up again and this turns out to be a duplicate.
Here's what it looked like yesterday:
Code:
root@bcvm2:~# clustat
Could not connect to CMAN: Connection refused
root@bcvm2:~# /etc/init.d/cman status
Found stale pid file
The first node that dropped out had this in its log:
Code:
# gunzip -c /var/log/cluster/corosync.log.1.gz | less
[...]
Aug 29 19:19:32 corosync [TOTEM ] FAILED TO RECEIVE
[...]
So I believe that some random packet got lost, the node's cluster communication failed, and then rather than retrying, cman either crashed or shut down intentionally. Based on your post, I guess it wasn't intentional.
And here is what happened when I tried to restart cman on another node (to regain quorum for the first node, which was still connected):
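In case it helps anyone else hitting this, here is how I'd scan the current and rotated corosync logs for that same TOTEM marker (a sketch; the log path matches my paste above, and the patterns are just the ones I'd look for):

```shell
# Sketch: scan current + rotated (gzipped) corosync logs for TOTEM
# trouble markers. LOGDIR and the patterns are my assumptions; adjust.
LOGDIR=${LOGDIR:-/var/log/cluster}
for f in "$LOGDIR"/corosync.log*; do
  [ -e "$f" ] || continue
  case "$f" in
    *.gz) zgrep -E -H 'FAILED TO RECEIVE|Retransmit' "$f" ;;
    *)    grep  -E -H 'FAILED TO RECEIVE|Retransmit' "$f" ;;
  esac
done
```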
Code:
root@bcvm2:~# /etc/init.d/cman start
Starting cluster:
Checking if cluster has been disabled at boot... [ OK ]
Checking Network Manager... [ OK ]
Global setup... [ OK ]
Loading kernel modules... [ OK ]
Mounting configfs... [ OK ]
Starting cman... [ OK ]
Starting qdiskd... [ OK ]
Waiting for quorum... Timed-out waiting for cluster
[FAILED]
root@bcvm2:~# clustat
Cluster Status for bcproxmox1 @ Thu Aug 30 10:27:40 2012
Member Status: Inquorate
 Member Name                        ID   Status
 ------ ----                        ---- ------
 bcvm2                                 1 Online, Local
 bcvm3                                 2 Offline
 bcvm1                                 3 Offline
 /dev/loop1                            0 Offline, Quorum Disk
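For completeness, here is one way I understand a single surviving node can be made quorate again (a sketch from my reading of the cman docs, not something I ran during this incident; the guard makes it a no-op on machines without cman):

```shell
# Sketch: on the one node still up, lower the expected vote count so it
# can become quorate alone. Vote numbers depend on your cluster.conf.
if command -v cman_tool >/dev/null 2>&1; then
  cman_tool expected -e 1   # tell cman to expect only 1 vote
  clustat                   # Member Status should now read "Quorate"
fi
```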
(And I'm guessing you'll have something to say about using a loop device over NFS as the qdisk, but it seems free of any side effects, and I can't use iSCSI without adding a new server. Besides, the first time I had this exact same problem, there was no qdisk or loop device at all.)
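For reference, a loop-backed qdisk like mine gets built roughly like this (a sketch; the file path and label below are placeholders, not my actual ones, and in real use the backing file sits on the NFS mount):

```shell
# Sketch: create a backing file, attach it as a loop device, and write
# the quorum-disk label. Path and label here are placeholders.
QDISK_IMG=${QDISK_IMG:-/tmp/qdisk.img}
dd if=/dev/zero of="$QDISK_IMG" bs=1M count=10 2>/dev/null  # 10 MB backing file
# then, as root on the node:
#   losetup /dev/loop1 "$QDISK_IMG"    # expose the file as a block device
#   mkqdisk -c /dev/loop1 -l bcqdisk   # write the quorum-disk label
```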
And I have lots more logs to share if you'd like to dig into this in another thread.