Quorum dissolved.

pant-dm

New Member
Apr 16, 2015
Hello.

I have a 4-node cluster with FC LVM shared storage. After rebooting one of the nodes, I lost the cluster.
Both /var/log/cluster/corosync.log and /var/log/cluster/rgmanager.log say 'Quorum dissolved'.
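
As far as I understand, the current quorum and membership state can be checked with the standard Proxmox 3.x / cman tools below (I can post the actual output of any of these if it helps):

pvecm status        # quorum and vote count as Proxmox sees it
cman_tool status    # the underlying cman view (expected votes, quorum state)
cman_tool nodes     # membership state of each node
clustat             # rgmanager's view of cluster members and services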

Following the recommendations in https://pve.proxmox.com/pipermail/pve-user/2015-April/008682.html I tried to restart the cman service, but on two of the four nodes this did not work:

root@vnode1:~#
root@vnode1:~# service cman restart
Stopping cluster:
Leaving fence domain... found dlm lockspace /sys/kernel/dlm/rgmanager
fence_tool: cannot leave due to active systems
[FAILED]
root@vnode1:~#

This looks like a serious problem that apparently requires rebooting the hypervisor server:
https://forum.proxmox.com/threads/problem-with-cman-and-rgmanager.16319/

All virtual machines are working fine, and I do not want to reboot the Proxmox servers and stop them.

Is there any way to solve this problem without having to reboot the servers? For example, something like http://www.rsinfominds.com/leaving-fence-domain-found-dlm-lockspace-syskerneldlmrgmanager/
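
If that article describes what I think it does, the idea would be to release the rgmanager dlm lockspace first and only then restart cman, roughly like this (just my understanding, not tested here yet, and I am not sure it is safe while the VMs keep running):

service rgmanager stop     # should release the /sys/kernel/dlm/rgmanager lockspace
service cman restart       # the restart that currently fails with 'cannot leave due to active systems'
service rgmanager start    # bring rgmanager back up afterwards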

Please somebody help!!!

Sincerely, pant-dm.

P.S.: Of course, I am ready to provide all the necessary additional information.
 
Post more info about your setup, e.g. the output of:

> pveversion -v
 
Hooray! Someone is interested!

So in detail:

IBM BladeCenter H
4 × HS23 blade servers (LACP to the Nortel-IBM switch)
IBM DS 3520 Fibre Channel storage (LVM shared storage)

pveversion -v output:

problem node (cman cannot be restarted):
root@vnode1:~#
root@vnode1:~# pveversion -v
proxmox-ve-2.6.32: 3.4-150 (running kernel: 2.6.32-37-pve)
pve-manager: 3.4-3 (running version: 3.4-3/2fc72fee)
pve-kernel-2.6.32-37-pve: 2.6.32-150
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-2
pve-cluster: 3.0-16
qemu-server: 3.4-3
pve-firmware: 1.1-4
libpve-common-perl: 3.0-24
libpve-access-control: 3.0-16
libpve-storage-perl: 3.0-32
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.2-8
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1
root@vnode1:~#

healthy node (cman restarts correctly):
root@vnode2:~#
root@vnode2:~# pveversion -v
proxmox-ve-2.6.32: 3.4-150 (running kernel: 2.6.32-37-pve)
pve-manager: 3.4-3 (running version: 3.4-3/2fc72fee)
pve-kernel-2.6.32-37-pve: 2.6.32-150
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-2
pve-cluster: 3.0-16
qemu-server: 3.4-3
pve-firmware: 1.1-4
libpve-common-perl: 3.0-24
libpve-access-control: 3.0-16
libpve-storage-perl: 3.0-32
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.2-8
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1
root@vnode2:~#

problem node (cman cannot be restarted):
root@vnode3:~#
root@vnode3:~# pveversion -v
proxmox-ve-2.6.32: 3.4-150 (running kernel: 2.6.32-37-pve)
pve-manager: 3.4-8 (running version: 3.4-8/5f8f4e78)
pve-kernel-2.6.32-37-pve: 2.6.32-150
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-3
pve-cluster: 3.0-18
qemu-server: 3.4-6
pve-firmware: 1.1-4
libpve-common-perl: 3.0-24
libpve-access-control: 3.0-16
libpve-storage-perl: 3.0-33
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.2-11
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1
root@vnode3:~#

healthy node (cman restarts correctly):
root@vnode4:~#
root@vnode4:~# pveversion -v
proxmox-ve-2.6.32: 3.4-150 (running kernel: 2.6.32-37-pve)
pve-manager: 3.4-3 (running version: 3.4-3/2fc72fee)
pve-kernel-2.6.32-37-pve: 2.6.32-150
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-2
pve-cluster: 3.0-16
qemu-server: 3.4-3
pve-firmware: 1.1-4
libpve-common-perl: 3.0-24
libpve-access-control: 3.0-16
libpve-storage-perl: 3.0-32
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.2-8
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1
root@vnode4:~#

Thank you for your attention to my problem.
 
Thanks for the advice. But after carefully reading the article https://pve.proxmox.com/wiki/Upgrade_from_3.x_to_4.0 on migrating from version 3 to version 4, I found that the prerequisites are:

1. healthy cluster
2. no VM or CT running (note: VM live migration from 3.4 to 4.x node or vice versa NOT possible)

Now, how can I restore the cluster and the quorum, and thereby regain the ability to back up and migrate to the new version?

Does anyone have ideas on how to get the quorum and the cluster back?
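
Would something like temporarily lowering the expected votes be appropriate here so that the remaining nodes regain quorum? This is just a sketch from the documentation, not tested on this cluster, and I realize it carries a split-brain risk with shared LVM storage:

pvecm expected 1           # Proxmox wrapper for lowering cman's expected votes
cman_tool expected -e 1    # the same thing via the underlying cman tool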
 
Updating the packages will also require restarting the node, so it seems I cannot avoid rebooting and stopping the virtual machines. How sad...

I have prepared a plan for stopping the virtual machines and started the approval procedure for it. I hope that in a week or two I will get the opportunity.

I really hoped that there would be a way to restore the cluster quorum without restarting the node.

P.S.: Maybe there is a way to see what is preventing the cman/rgmanager restart and fix it without stopping the virtual machines and restarting the node?
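
For example, I assume the standard redhat-cluster tools below would show what is still holding the fence domain and the rgmanager lockspace; I can post their output here:

fence_tool ls      # fence domain membership and state on this node
dlm_tool ls        # active dlm lockspaces (rgmanager, clvmd, ...)
group_tool ls      # fence/dlm group state
clustat            # rgmanager service and member state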

Thank you.
 
If you know how, it's possible. But as this looks like a production system, I suggest a deeper analysis of the real problem, e.g. from our enterprise support team (if you have a subscription for your hosts).