Quourum dissolved.

pant-dm · May 10, 2017

Hello.

I have 4 node cluster with FC LVM shared storage. After rebooting one of the nodes, I lost the cluster.
/var/log/cluster/corosync.log and /var/log/cluster/rgmanager.log says 'Quourum dissolved'.

In accordance with recommendations https://pve.proxmox.com/pipermail/pve-user/2015-April/008682.html I tried to restart the service cman. But on two of the four nodes this did not work.

root@vnode1:~#
root@vnode1:~# service cman restart
Stopping cluster:
Leaving fence domain... found dlm lockspace /sys/kernel/dlm/rgmanager
fence_tool: cannot leave due to active systems
[FAILED]
root@vnode1:~#

This looks like a big problem, requiring a server restart with a hypervisor.
https://forum.proxmox.com/threads/problem-with-cman-and-rgmanager.16319/

All virtual machines work fine and I do not want to reboot the proxmox servers, causing them to stop.

Is there any way to solve this problem without having to reboot the servers. For example such http://www.rsinfominds.com/leaving-fence-domain-found-dlm-lockspace-syskerneldlmrgmanager/

Please somebody help!!!

Sincerely, pant-dm.

P.S.: Of course, I am ready to provide all the necessary additional information.

pant-dm · May 11, 2017

Any ideas?

tom · May 11, 2017

Post more info about your setup, e.g. the output of:

> pveversion -v

pant-dm · May 11, 2017

Hooray! Someone became interested!

So in detail:

IBM BladeCenter H
4 - BladeServers HS23 (LACP to Nortel-IBM Switch)
Fibre Channel Storage IBM DS 3520 (LVM Shared Storage)

pveversion -v output :

problem node (cman can't restart)

root@vnode1:~#
root@vnode1:~# pveversion -v
proxmox-ve-2.6.32: 3.4-150 (running kernel: 2.6.32-37-pve)
pve-manager: 3.4-3 (running version: 3.4-3/2fc72fee)
pve-kernel-2.6.32-37-pve: 2.6.32-150
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-2
pve-cluster: 3.0-16
qemu-server: 3.4-3
pve-firmware: 1.1-4
libpve-common-perl: 3.0-24
libpve-access-control: 3.0-16
libpve-storage-perl: 3.0-32
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.2-8
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1
root@vnode1:~#

no problem node (cman restart correctly)

root@vnode2:~#
root@vnode2:~# pveversion -v
proxmox-ve-2.6.32: 3.4-150 (running kernel: 2.6.32-37-pve)
pve-manager: 3.4-3 (running version: 3.4-3/2fc72fee)
pve-kernel-2.6.32-37-pve: 2.6.32-150
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-2
pve-cluster: 3.0-16
qemu-server: 3.4-3
pve-firmware: 1.1-4
libpve-common-perl: 3.0-24
libpve-access-control: 3.0-16
libpve-storage-perl: 3.0-32
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.2-8
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1
root@vnode2:~#

problem node (cman can't restart)

root@vnode3:~#
root@vnode3:~# pveversion -v
proxmox-ve-2.6.32: 3.4-150 (running kernel: 2.6.32-37-pve)
pve-manager: 3.4-8 (running version: 3.4-8/5f8f4e78)
pve-kernel-2.6.32-37-pve: 2.6.32-150
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-3
pve-cluster: 3.0-18
qemu-server: 3.4-6
pve-firmware: 1.1-4
libpve-common-perl: 3.0-24
libpve-access-control: 3.0-16
libpve-storage-perl: 3.0-33
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.2-11
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1
root@vnode3:~#

no problem node (cman restart correctly)

root@vnode4:~#
root@vnode4:~# pveversion -v
proxmox-ve-2.6.32: 3.4-150 (running kernel: 2.6.32-37-pve)
pve-manager: 3.4-3 (running version: 3.4-3/2fc72fee)
pve-kernel-2.6.32-37-pve: 2.6.32-150
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-2
pve-cluster: 3.0-16
qemu-server: 3.4-3
pve-firmware: 1.1-4
libpve-common-perl: 3.0-24
libpve-access-control: 3.0-16
libpve-storage-perl: 3.0-32
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.2-8
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1
root@vnode4:~#

Thank you for your attention to my problem.

tom · May 11, 2017

Proxmox VE 3.4 is out of support since more than one year, also security updates are not available any more.

So you should also plan the upgrade path to 4.4 or even to 5.0 (not yet released, currently beta).

see also:
https://pve.proxmox.com/wiki/Category:Proxmox_VE_3.x

And:
https://pve.proxmox.com/wiki/Upgrade_from_3.x_to_4.0

pant-dm · May 11, 2017

Thanks for the advice. But after carefully reading the article https://pve.proxmox.com/wiki/Upgrade_from_3.x_to_4.0 on migration from version 3 to version 4, I found that the necessary conditions are:

1. healthy cluster
2. no VM or CT running (note: VM live migration from 3.4 to 4.x node or vice versa NOT possible)

Now, how can I restore the cluster and the quorum? And hence restore the ability to backup and migrate to new version.

May be have ideas how to return a quorum and a cluster?

tom · May 11, 2017

Of course, you should fix current issues first.

Your packages inside the 3.x family are also outdated, check:

https://pve.proxmox.com/wiki/Downlo...Proxmox_Virtual_Environment_3.x_to_latest_3.4

pant-dm · May 12, 2017

Updating the packages will also require a restart of the node. Therefore, without rebooting and stopping virtual machines, I can not do without. How sad...

Prepared a plan for stopping virtual machines and launched the procedure for its negotiation. I hope in a week or two I will be given such an opportunity.

I really hoped that there would be a way to restore the cluster quorum without restarting the node.

P.S .: maybe there is a way to see what's preventing the cman / rgmanager restart and fix it without stopping the virtual machines and restarting the node.

Thank you.

tom · May 12, 2017

If you know how, its possible. But as it looks this is productive system I suggest a deeper analysis of the real problem, e.g. from our enterprise support team (if you have subscription for your hosts).

Search

Search

Quourum dissolved.

pant-dm

New Member

pant-dm

New Member

tom

Proxmox Staff Member

pant-dm

New Member

tom

Proxmox Staff Member

pant-dm

New Member

tom

Proxmox Staff Member

pant-dm

New Member

tom

Proxmox Staff Member

We value your privacy