Corosync issue when restarting some hypervisors

stalio

New Member
Jul 27, 2021
I have been using Proxmox for more than 10 years now and have encountered a number of issues that I was always more or less able to solve. Today I want to share a problem for which I can't find a solution. I think it might be a real problem that is worth investigating.

We have a 24-node PVE 6 cluster, upgraded from PVE 5 some months ago. Because of heating issues in our data center we powered off 11 nodes, leaving 13 of them up. Quorum was never lost.
Yesterday I wanted to turn on some of the hypervisors that had been powered off, but they cannot join the corosync cluster anymore.

This is the message I find on the working corosync members:

Code:
Jul 27 09:34:48 hnode21 corosync[1993]:   [TOTEM ] Message received from 192.168.145.119 has bad magic number (probably sent by unencrypted Kronosnet).. Ignoring
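
To see which nodes are being rejected, the warnings can be counted per sender. This is only a rough sketch: the pattern is built from the single log line quoted above, and the function name `bad_magic_senders` is made up for illustration. The wording of the message may differ between corosync versions.

```python
import re

# Pattern derived from the one log line quoted in this thread (an assumption:
# other corosync versions may word the warning differently).
BAD_MAGIC = re.compile(
    r"\[TOTEM \] Message received from "
    r"(?P<ip>\d{1,3}(?:\.\d{1,3}){3}) has bad magic number"
)


def bad_magic_senders(log_lines):
    """Count 'bad magic number' warnings per sending IP address."""
    counts = {}
    for line in log_lines:
        match = BAD_MAGIC.search(line)
        if match:
            ip = match.group("ip")
            counts[ip] = counts.get(ip, 0) + 1
    return counts


sample = [
    "Jul 27 09:34:48 hnode21 corosync[1993]:   [TOTEM ] Message received "
    "from 192.168.145.119 has bad magic number (probably sent by "
    "unencrypted Kronosnet).. Ignoring",
]
print(bad_magic_senders(sample))  # {'192.168.145.119': 1}
```

Feeding it the syslog lines of one working member should list every node that is currently being ignored.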

This is the status of the "good" part of the cluster:

Code:
root@hnode21:~# pvecm status
Cluster information
-------------------
Name:             u-lite-v2
Config Version:   43
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Tue Jul 27 10:04:10 2021
Quorum provider:  corosync_votequorum
Nodes:            13
Node ID:          0x00000015
Ring ID:          1.a9e
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   24
Highest expected: 24
Total votes:      13
Quorum:           13 
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.168.145.102
0x00000002          1 192.168.145.103
0x00000003          1 192.168.145.101
0x00000004          1 192.168.145.100
0x00000006          1 192.168.145.106
0x00000008          1 192.168.145.120
0x00000009          1 192.168.145.118
0x0000000b          1 192.168.145.108
0x0000000d          1 192.168.145.110
0x0000000f          1 192.168.145.116
0x00000012          1 192.168.145.113
0x00000015          1 192.168.145.121 (local)
0x00000018          1 192.168.145.115


This is what a node I just powered on sees:

Code:
root@hnode19:~# pvecm status
Cluster information
-------------------
Name:             u-lite-v2
Config Version:   43
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Tue Jul 27 10:02:32 2021
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          0x0000000a
Ring ID:          a.b0e
Quorate:          No

Votequorum information
----------------------
Expected votes:   24
Highest expected: 24
Total votes:      1
Quorum:           13 Activity blocked
Flags:           

Membership information
----------------------
    Nodeid      Votes Name
0x0000000a          1 192.168.145.119 (local)

Note that the Ring ID is different. The corosync.conf, renamed to corosync.txt, is attached. It is the same on all nodes.
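
For reference, the "Quorum: 13" shown in both outputs is just a simple majority of the 24 expected votes. A minimal sketch of that arithmetic (the function names are mine, not part of any Proxmox or corosync tool):

```python
def quorum_threshold(expected_votes):
    """Simple majority: strictly more than half of the expected votes."""
    return expected_votes // 2 + 1


def is_quorate(total_votes, expected_votes):
    """True if the partition holds at least the majority threshold."""
    return total_votes >= quorum_threshold(expected_votes)


print(quorum_threshold(24))  # 13
print(is_quorate(13, 24))    # True  (the 13 nodes that stayed up)
print(is_quorate(1, 24))     # False (hnode19 on its own)
```

So the 13 running nodes sit exactly on the threshold, and a freshly booted node that cannot see them stays blocked with a single vote.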
 

Your config has

Code:
  cluster_name: u-lite-v2
  config_version: 43
  crypto_cipher: none
  crypto_hash: none
  interface {
    bindnetaddr: 192.168.145.101
    ringnumber: 0
  }
  ip_version: ipv4
  secauth: on
  version: 2

which is kind of conflicting (secauth vs. crypto_*). Are the corosync versions identical on both partitions of the cluster? My guess is that they are not, and that the powered-down partition interprets the config as "don't use crypto", while the powered-on one gives "secauth" higher priority and thus enables encryption, so the two partitions can't talk to each other.
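
A quick way to spot this kind of mismatch is to scan the totem settings for `secauth: on` combined with a `crypto_*` key set to `none`. The sketch below only flags the combination; it makes no claim about which setting a given corosync version actually honours. The helper names `parse_settings` and `crypto_conflicts` are hypothetical.

```python
def parse_settings(text):
    """Collect top-level 'key: value' pairs, skipping nested { ... } blocks."""
    settings = {}
    depth = 0
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.endswith("{"):
            depth += 1          # entering a nested block such as interface { }
            continue
        if line == "}":
            depth -= 1          # leaving the nested block
            continue
        if depth == 0 and ":" in line:
            key, value = line.split(":", 1)
            settings[key.strip()] = value.strip()
    return settings


def crypto_conflicts(settings):
    """Return the crypto_* keys set to 'none' while secauth is 'on'."""
    if settings.get("secauth") != "on":
        return []
    return [key for key in ("crypto_cipher", "crypto_hash")
            if settings.get(key) == "none"]


# The totem settings quoted above, copied as-is.
fragment = """
  cluster_name: u-lite-v2
  config_version: 43
  crypto_cipher: none
  crypto_hash: none
  interface {
    bindnetaddr: 192.168.145.101
    ringnumber: 0
  }
  ip_version: ipv4
  secauth: on
  version: 2
"""
print(crypto_conflicts(parse_settings(fragment)))
# ['crypto_cipher', 'crypto_hash']
```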
 
Thanks for the hint. I understand what happened and seem to be able to fix the issue.
It all goes back to when I did the PVE 5 to PVE 6 upgrade and needed to force

Code:
transport: udp

for totem

I later modified the corosync.conf file but never restarted the corosync processes, so totem on the running nodes is still using udp. I went back to the old configuration file and things seem OK now. At a lower priority I'll have to switch udp off.
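
To catch this kind of drift earlier, the totem section of the file the daemons were started with can be compared against the current one. A minimal sketch, assuming a copy of the old file is still around; the two sample configs below are shortened, made-up stand-ins rather than the real files from this cluster, and the function names are hypothetical.

```python
def totem_settings(text):
    """Collect 'key: value' pairs found directly inside the totem { } block."""
    settings = {}
    stack = []  # names of the blocks we are currently inside
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith("{"):
            stack.append(line[:-1].strip())
        elif line == "}":
            if stack:
                stack.pop()
        elif stack == ["totem"] and ":" in line:
            key, value = line.split(":", 1)
            settings[key.strip()] = value.strip()
    return settings


def changed_keys(started_with, on_disk):
    """Map each differing totem key to its (started_with, on_disk) values."""
    old, new = totem_settings(started_with), totem_settings(on_disk)
    return {key: (old.get(key), new.get(key))
            for key in sorted(set(old) | set(new))
            if old.get(key) != new.get(key)}


# Illustrative configs only (assumption): the real files contain more keys.
started_with = """
totem {
  cluster_name: u-lite-v2
  transport: udp
  interface {
    ringnumber: 0
  }
}
"""
on_disk = """
totem {
  cluster_name: u-lite-v2
  interface {
    ringnumber: 0
  }
}
"""
print(changed_keys(started_with, on_disk))
# {'transport': ('udp', None)}
```

Any key that shows up here only takes effect on a node once corosync has actually been restarted there, which is how the two halves of a cluster can end up speaking different transports.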

Stefano.
 
