Quorum lost when adding new 7th node

Dragonn

Member
May 23, 2020
Hello there,

I would like to ask for some help and guidance in debugging an issue where our cluster loses corosync quorum (and reboots completely) when a new node is added. It has already happened twice on this 7-node cluster (when adding the 4th node and now the 7th).

Deployment context:
  • every server is fresh Debian Buster installation + custom packages installation & environment configuration
  • custom kernel (based on 5.4.48)
  • installed package proxmox-ve=6.2-1 from pve-no-subscription repository
  • creating cluster (and joining nodes) always via pvecm command
  • corosync with 2 links (VLANs)
Unrelated context (as I suppose):
  • Dell blade servers, uplink to 2 switches in active-backup bonding, VLANs on top
  • using OpenVSwitch for PVE node networking (eno1+eno2 bonding -> vmbr0 -> two internal VLAN interfaces, one for management and one for Ceph storage); a rough sketch of this layout follows after this list
  • all VMs registered in HA manager
  • VMs use external Ceph storage
  • pve-firewall not used; stateful iptables rules allow all traffic from both networks used by the cluster (nothing should be dropped, but I cannot be 100% sure since we don't monitor that)
  • nodes were not added to the cluster in alphabetical order
  • the second corosync link was added after the first cluster reboot (when adding the 4th node), following https://pve.proxmox.com/pve-docs/chapter-pvecm.html#_adding_redundant_links_to_an_existing_cluster
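For illustration, the per-node network layout described above looks roughly like the following in /etc/network/interfaces. This is only a sketch: the VLAN tags (20/40), the names of the internal ports and the addresses (taken from ovirt7) are assumptions, not our exact config.
Code:
# OVS bond of the two blade NICs, active-backup as described above
auto bond0
iface bond0 inet manual
    ovs_bridge vmbr0
    ovs_type OVSBond
    ovs_bonds eno1 eno2
    ovs_options bond_mode=active-backup

# main OVS bridge carrying the bond and the two internal VLAN ports
auto vmbr0
iface vmbr0 inet manual
    ovs_type OVSBridge
    ovs_ports bond0 vlan20 vlan40

# management VLAN / corosync link0
auto vlan20
iface vlan20 inet static
    address 10.30.20.19/24
    ovs_type OVSIntPort
    ovs_bridge vmbr0
    ovs_options tag=20

# Ceph storage VLAN / corosync link1
auto vlan40
iface vlan40 inet static
    address 10.30.40.57/24
    ovs_type OVSIntPort
    ovs_bridge vmbr0
    ovs_options tag=40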
What have I done:
  • added server ovirt7 into the cluster, joining via node ovirt9
  • command issued at 2020-09-01 08:22:21
Code:
-> pvecm add ovirt9 -link0 10.30.20.19 -link1 10.30.40.57
Please enter superuser (root) password for 'ovirt9': ********
Establishing API connection with host 'ovirt9'
The authenticity of host 'ovirt9' can't be established.
X509 SHA256 key fingerprint is 84:A8:E0:22:6E:01:8A:AF:4B:C8:A1:14:7A:40:02:C4:6A:72:0C:40:1E:5D:35:24:04:C0:86:85:BD:CF:0D:5C.
Are you sure you want to continue connecting (yes/no)? yes
Login succeeded.
check cluster join API version
Request addition of this node
Join request OK, finishing setup locally
stopping pve-cluster service
backup old database to '/var/lib/pve-cluster/backup/config-1598941349.sql.gz'
waiting for quorum... <<<EDIT: it was stuck here until the whole cluster rebooted>>> OK
(re)generate node files
generate new node certificate
merge authorized SSH keys and known hosts
generated new node certificate, restart pveproxy and pvedaemon services
successfully added node 'ovirt7' to cluster.

What happened:
  • the node was added into the cluster
  • the corosync configuration was updated, but quorum was never reached
  • my SSH connections to a few servers stayed alive, the servers were able to ping each other without any packet loss and with minimal latency (<1 ms), pvecm status was timing out (returned without any output)
  • every node except the new one rebooted due to the fencing watchdog
  • the networking logs from the switches show only Link down & Link up caused by the reboots
Logs and configs:
  • current version of /etc/corosync/corosync.conf -- see attachment
  • diff against previous version of corosync.conf (got from internal backups):
Code:
--- a/etc/corosync/corosync.conf
+++ b/etc/corosync/corosync.conf
@@ -11,6 +11,13 @@ nodelist {
     ring0_addr: 10.30.20.137
     ring1_addr: 10.30.40.60
   }
+  node {
+    name: ovirt7
+    nodeid: 7
+    quorum_votes: 1
+    ring0_addr: 10.30.20.19
+    ring1_addr: 10.30.40.57
+  }
   node {
     name: ovirt8
     nodeid: 6
@@ -54,7 +61,7 @@ quorum {

totem {
   cluster_name: ovirt
-  config_version: 14
+  config_version: 15
   interface {
     linknumber: 0
   }

  • syslog from server ovirt9 (which new node was connecting to) -- see attachment
  • syslog from server ovirt7 (new one) -- see attachment

Final words and thoughts:
  • I assume you will advise me to use physically separated corosync links, but I am unable to achieve this with the available hardware. I am using this setup because we may later split the two VLANs onto separate paths outside the chassis (the only shared segment would then be the server NIC and the switch in the blade chassis)
  • since this is a fresh event from today, I might be able to gather more details from the servers if needed
  • I would be really grateful for any comment, advice or guidance, since I don't understand what happened. As far as I can tell everything went the typical way, but for some reason corosync was unable to get traffic through.
  • corosync also complained about address changes when adding the 6th node, and everything went well then

Thank you everyone who read the whole post until here :)
 

Attachments

  • ovirt9_syslog.2020-09-01-cut.log
    7.8 KB · Views: 5
  • corosync.conf.txt
    1.1 KB · Views: 6
  • ovirt7_syslog.2020-09-01.cut.log
    33.1 KB · Views: 3
Since you run this in a blade center, where you don't have many options for separate physical links for corosync, you should try to configure QoS on the network in the blade center to prioritize the corosync packets / VLAN.

To avoid the fencing of the cluster while applying changes to the corosync setup you can stop the two services that are responsible for HA.

The steps are the same as during an upgrade. First stop the LRM on all nodes and then the CRM.
Code:
systemctl stop pve-ha-lrm
systemctl stop pve-ha-crm

Once corosync seems to run fine again, you can start these services in the same order: first the LRM on all nodes and then the CRM.
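In other words, once the cluster is quorate again:
Code:
# on all nodes, in this order
systemctl start pve-ha-lrm
systemctl start pve-ha-crm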


AFAIR there used to be a problem with corosync if a newly added node was not at the end of the list alphabetically, resulting in what can be observed in the log, where it mixes up the IP -> node mapping.

Which version of corosync do you run? pveversion -v.
 
Thank you for pointing out network QoS, I didn't think about that before and will definitely discuss it with my network colleagues. The current network setup is not final, we plan to upgrade to MLAG, but we are not there yet.

I am currently not worried about saturating the network, but I know it can happen really easily. We have fairly good network monitoring, so we notice when packets are dropped somewhere in the network.

Thank you also for the idea of disabling HA, I will definitely try it when adding the next node.

Latest installed node has following versions:
Code:
A ovirt7[root](15:05:37)-(~)
-> pveversion -v
proxmox-ve: 6.2-1 (running kernel: 5.4.48-ls-2)
pve-manager: 6.2-11 (running version: 6.2-11/22fb4983)
pve-kernel-5.4: 6.2-5
pve-kernel-helper: 6.2-5
pve-kernel-5.4.55-1-pve: 5.4.55-1
ceph-fuse: 14.2.10-pve1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve2
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.4
libpve-access-control: 6.1-2
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.2-1
libpve-guest-common-perl: 3.1-2
libpve-http-server-perl: 3.0-6
libpve-storage-perl: 6.2-6
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-3
lxc-pve: 4.0.3-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
openvswitch-switch: 2.12.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-10
pve-cluster: 6.1-8
pve-container: 3.1-13
pve-docs: 6.2-5
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-2
pve-firmware: 3.1-2
pve-ha-manager: 3.0-9
pve-i18n: 2.1-3
pve-qemu-kvm: 5.0.0-13
pve-xtermjs: 4.7.0-2
qemu-server: 6.2-14
smartmontools: 6.6-1
spiceterm: 3.1-1
vncterm: 1.6-2

I have noticed that the software versions currently differ slightly between nodes, but the corosync versions are consistent.

Sadly (as can be seen in corosync.conf) I have been adding nodes into the cluster almost perfectly in reverse alphabetical order, and this issue occurred only when adding nodes 4 and 7. That really confuses me, to be honest :) but I can imagine there is some race condition going on underneath.
 
Update: It happened again and it's still broken right now.

History:
  • added nodes ovirt6 -> ovirt5
    • disabled HA as a workaround, everything was okay
  • removed nodes ovirt97, ovirt98, ovirt99
    • nodes powered off, discs cleaned
    • removed the nodes from the cluster via pvecm delnode ovirt97
    • restarted corosync on 2 random nodes

And now I added a new node, ovirt1 (with HA disabled), into the working 6-node cluster, and corosync is unable to form quorum. The corosync configuration file is correctly synced over all nodes, and the old nodes complain in the same way as I mentioned in my first post.
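A quick way to double-check that sync, for the record (node names here are only placeholders):
Code:
# compare the corosync config checksum on every node
for n in ovirt1 ovirt4 ovirt5 ovirt6 ovirt8 ovirt9; do
    ssh "root@$n" md5sum /etc/corosync/corosync.conf
done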

The new node is still waiting for quorum to be established:
Code:
-> pvecm add ovirt5 -link0 10.30.20.142 -link1 10.30.40.51
Please enter superuser (root) password for 'ovirt5': ********
Establishing API connection with host 'ovirt5'
The authenticity of host 'ovirt5' can't be established.
X509 SHA256 key fingerprint is 58:78:E9:71:DD:D8:FD:FB:61:3A:79:91:9A:43:2D:D4:1E:09:A2:8F:A9:6E:8E:FD:52:1E:93:8E:FE:A2:BF:79.
Are you sure you want to continue connecting (yes/no)? yes
Login succeeded.
check cluster join API version
Request addition of this node
Join request OK, finishing setup locally
stopping pve-cluster service
backup old database to '/var/lib/pve-cluster/backup/config-1599631485.sql.gz'
waiting for quorum...

On the other nodes it looks like the new node initiates a quorum vote every 10 s, but it is unable to join the cluster. This has been going on for about 2 hours already.
Code:
Sep  9 09:49:29 ovirt10 corosync[1520]:   [TOTEM ] A new membership (4.d4b) was formed. Members
Sep  9 09:49:29 ovirt10 corosync[1520]:   [QUORUM] Members[6]: 4 5 6 7 8 9
Sep  9 09:49:29 ovirt10 corosync[1520]:   [MAIN  ] Completed service synchronization, ready to provide service.
Sep  9 09:49:39 ovirt10 corosync[1520]:   [TOTEM ] A new membership (4.d4f) was formed. Members
Sep  9 09:49:39 ovirt10 corosync[1520]:   [QUORUM] Members[6]: 4 5 6 7 8 9
Sep  9 09:49:39 ovirt10 corosync[1520]:   [MAIN  ] Completed service synchronization, ready to provide service.
Sep  9 09:49:48 ovirt10 corosync[1520]:   [TOTEM ] A new membership (4.d53) was formed. Members
Sep  9 09:49:48 ovirt10 corosync[1520]:   [QUORUM] Members[6]: 4 5 6 7 8 9
Sep  9 09:49:48 ovirt10 corosync[1520]:   [MAIN  ] Completed service synchronization, ready to provide service.
Sep  9 09:49:57 ovirt10 corosync[1520]:   [TOTEM ] A new membership (4.d57) was formed. Members
Sep  9 09:49:57 ovirt10 corosync[1520]:   [QUORUM] Members[6]: 4 5 6 7 8 9
Sep  9 09:49:57 ovirt10 corosync[1520]:   [MAIN  ] Completed service synchronization, ready to provide service.
Sep  9 09:50:07 ovirt10 corosync[1520]:   [TOTEM ] A new membership (4.d5b) was formed. Members
Sep  9 09:50:07 ovirt10 corosync[1520]:   [QUORUM] Members[6]: 4 5 6 7 8 9
Sep  9 09:50:07 ovirt10 corosync[1520]:   [MAIN  ] Completed service synchronization, ready to provide service.
Sep  9 09:50:16 ovirt10 corosync[1520]:   [TOTEM ] A new membership (4.d5f) was formed. Members
Sep  9 09:50:16 ovirt10 corosync[1520]:   [QUORUM] Members[6]: 4 5 6 7 8 9
Sep  9 09:50:16 ovirt10 corosync[1520]:   [MAIN  ] Completed service synchronization, ready to provide service.
Sep  9 09:50:26 ovirt10 corosync[1520]:   [TOTEM ] A new membership (4.d63) was formed. Members
Sep  9 09:50:26 ovirt10 corosync[1520]:   [QUORUM] Members[6]: 4 5 6 7 8 9
Sep  9 09:50:26 ovirt10 corosync[1520]:   [MAIN  ] Completed service synchronization, ready to provide service.

I suppose that restarting corosync on all nodes would probably fix the situation, but I am currently looking for a long-term solution to this issue.
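By a restart of all nodes I mean going node by node, roughly like this (node names are placeholders, the sleep is only there to let the membership settle before moving on):
Code:
# one node at a time, never in parallel
for n in ovirt1 ovirt5 ovirt6 ovirt7 ovirt8 ovirt9 ovirt10; do
    ssh "root@$n" systemctl restart corosync
    sleep 30
done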

Is there anything I can do to give you more debug information? I have already captured a tcpdump from the new node, but neither I nor my Wireshark version understand the corosync protocol.
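The capture was taken with something like this (assuming corosync/knet still listens on its default UDP port 5405):
Code:
tcpdump -i any -w corosync-join.pcap udp port 5405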


An idea to fix the situation: if corosync needs new nodes to be appended only at the end, why is the configuration file sorted by hostname? It could be sorted by node ID, reuse of IDs could be forbidden, and IDs would not need to be chosen manually. Is there any real use case where the nodeid should be chosen manually?


Thanks in advance :)
 
A sequential restart of the corosync daemons across the cluster fixed the quorum issue.

Thanks, this indeed works!

I'm using Proxmox VE 6.3-2 installed from ISO.

The drive holding the system partition on one of our cluster nodes failed. The node is not the one with the highest node ID.

I deleted the node from the cluster, reinstalled the system on a new SSD.

When joining the node to the cluster using pvecm add, I also got stuck on "waiting for quorum".
Restarting corosync on every cluster node sequentially (including the newly added one) finally let the join proceed and succeed.
 
