Cluster Node lost connection

MisterY

I have a 2-node cluster with a QDevice. It worked fine for quite a long time, but suddenly the VMs on Node 2 show a "?" and the node is listed as "unknown".

BUT I can access the VMs/shell/disks/ZFS/EVERYTHING of Node 2 from the web GUI of Node 1.
I already rebooted the nodes.
I can ping node 2 from 1 and vice versa.

Node 1:
Code:
service corosync status
● corosync.service - Corosync Cluster Engine
     Loaded: loaded (/lib/systemd/system/corosync.service; enabled; preset: enabled)
     Active: active (running) since Tue 2024-06-25 15:04:47 CEST; 2min 24s ago
       Docs: man:corosync
             man:corosync.conf
             man:corosync_overview
   Main PID: 103887 (corosync)
      Tasks: 9 (limit: 309055)
     Memory: 138.2M
        CPU: 1.642s
     CGroup: /system.slice/corosync.service
             └─103887 /usr/sbin/corosync -f

Jun 25 15:07:03 supermicro corosync[103887]:   [KNET  ] rx: Packet rejected from 192.168.50.21:5>
Strange: it reports "192.168.50.21", but after the error appeared I changed the IP of Node 2 to 192.168.0.21 (on Node 1 as well, of course).

Again Node 1:
Code:
pvecm status
Cluster information
-------------------
Name:             GeoCluster
Config Version:   8
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Tue Jun 25 15:18:03 2024
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          0x00000001
Ring ID:          1.72c
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      2
Quorum:           2 
Flags:            Quorate Qdevice

Membership information
----------------------
    Nodeid      Votes    Qdevice Name
0x00000001          1    A,V,NMW 192.168.0.12 (local)
0x00000000          1            Qdevice

Again Node 1:

Code:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: sm2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.0.21
  }
  node {
    name: supermicro
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.0.12
  }
}

quorum {
  device {
    model: net
    net {
      algorithm: ffsplit
      host: 192.168.50.15
      tls: on
    }
    votes: 1
  }
  provider: corosync_votequorum
}

totem {
  cluster_name: GeoCluster
  config_version: 8
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}
Node 2 originally had 192.168.50.21 here (which was wrong, but the web GUI didn't let me choose another address). I changed it AFTER the error appeared.

And here Node 2:

Code:
pvecm status
Cluster information
-------------------
Name:             GeoCluster
Config Version:   7
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Tue Jun 25 15:17:56 2024
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          0x00000002
Ring ID:          2.731
Quorate:          No

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      1
Quorum:           2 Activity blocked
Flags:           

Membership information
----------------------
    Nodeid      Votes Name
0x00000002          1 192.168.50.21 (local)
Why is there still 192.168.50.21?
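The two outputs above also differ in "Config Version" (8 on Node 1, 7 on Node 2), which suggests Node 2 never received the edited configuration. A minimal sketch for checking this on each node, assuming the usual Proxmox VE locations /etc/pve/corosync.conf (cluster-wide copy) and /etc/corosync/corosync.conf (local copy read by corosync); the helper names get_config_version and compare_config_versions are made up for illustration:

```shell
#!/bin/sh
# Print the config_version value from a corosync.conf-style file.
# get_config_version is a hypothetical helper name, not a Proxmox tool.
get_config_version() {
    awk '$1 == "config_version:" { print $2; exit }' "$1"
}

# Compare the cluster-wide copy with the local copy.
# Both default paths are assumptions (the usual Proxmox VE locations).
compare_config_versions() {
    cluster_conf="${1:-/etc/pve/corosync.conf}"
    local_conf="${2:-/etc/corosync/corosync.conf}"
    v_cluster=$(get_config_version "$cluster_conf")
    v_local=$(get_config_version "$local_conf")
    if [ "$v_cluster" = "$v_local" ]; then
        echo "match: $v_local"
    else
        echo "mismatch: cluster=$v_cluster local=$v_local"
    fi
}
```

Running compare_config_versions without arguments on both nodes shows whether either node is still using an older file than the other.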

Any ideas?
 
Update: When I change every Node 2 entry on Node 1 back to "192.168.50.21", I get a partial connection,
but "service corosync status" reports:

Code:
host: 2 has no active links
But that is not true.

Node 2 now tells me "cannot initialize cmap service", and the corosync service fails with:

Code:
[quorum] crit: quorum initialize failed: 2

and it reports that Node 1 has no active links (again: not true).

Update 2: When I change Node 1 to 192.168.50.12, the cluster tries to connect, but corosync.service on Node 1 fails and restarts every second.
 
Okay, after a reboot I get error 595.

I noticed that here:

/etc/pve/.members

the IP is still wrong:

{
  "nodename": "supermicro",
  "version": 5,
  "cluster": { "name": "Cluster", "version": 10, "nodes": 2, "quorate": 1 },
  "nodelist": {
    "sm2": { "id": 2, "online": 0, "ip": "192.168.50.21"},
    "supermicro": { "id": 1, "online": 1, "ip": "192.168.50.12"}
  }
}

The IPs should be 192.168.40.21 and 192.168.40.12.

How can I change that?
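
One thing worth checking, as an assumption rather than a confirmed cause: the address shown in /etc/pve/.members may come from resolving the node name (typically via /etc/hosts) rather than from corosync.conf. A minimal sketch that prints which address a hosts-style file maps to a given node name; lookup_hosts_ip is a hypothetical helper, not part of Proxmox VE:

```shell
#!/bin/sh
# Print the first address that a hosts-style file maps to the given name.
# lookup_hosts_ip is a hypothetical helper name used for illustration only.
lookup_hosts_ip() {
    name="$1"
    hosts_file="${2:-/etc/hosts}"
    awk -v n="$name" '
        /^[[:space:]]*#/ { next }           # skip full-line comments
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break        # stop at a trailing comment
                if ($i == n) { print $1; exit }
            }
        }' "$hosts_file"
}
```

For example, lookup_hosts_ip supermicro on Node 1 and lookup_hosts_ip sm2 on Node 2 would show whether the old 192.168.50.x addresses are still listed there.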