Cluster Node lost connection

MisterY

I have a 2-node cluster with a QDevice. It worked fine for quite a long time, but suddenly the VMs on Node 2 show a "?" and the node itself is shown as "unknown".

BUT I can access the VMs/Shell/disks/zfs/EVERYTHING of Node 2 from the web GUI of Node 1.
I already rebooted the nodes.
I can ping node 2 from 1 and vice versa.

Node 1:
Code:
service corosync status
● corosync.service - Corosync Cluster Engine
     Loaded: loaded (/lib/systemd/system/corosync.service; enabled; preset: enabled)
     Active: active (running) since Tue 2024-06-25 15:04:47 CEST; 2min 24s ago
       Docs: man:corosync
             man:corosync.conf
             man:corosync_overview
   Main PID: 103887 (corosync)
      Tasks: 9 (limit: 309055)
     Memory: 138.2M
        CPU: 1.642s
     CGroup: /system.slice/corosync.service
             └─103887 /usr/sbin/corosync -f

Jun 25 15:07:03 supermicro corosync[103887]:   [KNET  ] rx: Packet rejected from 192.168.50.21:5>
Strange: it reports "192.168.50.21", but after the error appeared I changed the IP of node 2 to 192.168.0.21 (on node 1 as well, of course).
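
As far as I understand knet, a "Packet rejected" line means node 1 is receiving corosync traffic from a source address that matches none of its configured links, which would fit node 2 still sending from 192.168.50.21. A minimal sketch for listing every such source address in the journal follows; `rejected_sources` is a hypothetical helper name, and the pattern only covers IPv4 addresses.

```shell
#!/bin/sh
# rejected_sources: hypothetical helper, not part of corosync or Proxmox.
# Reads log text on stdin and prints each distinct IPv4 source address
# that appears in a knet "Packet rejected from" message.
rejected_sources() {
    grep -o 'Packet rejected from [0-9.]*' | awk '{ print $4 }' | sort -u
}

# Example use on a node:
#   journalctl -u corosync --since "1 hour ago" --no-pager | rejected_sources
```

If the only address printed is one that no longer appears in corosync.conf, the sending node is probably still running with an older configuration.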

Again Node 1:
Code:
pvecm status
Cluster information
-------------------
Name:             GeoCluster
Config Version:   8
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Tue Jun 25 15:18:03 2024
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          0x00000001
Ring ID:          1.72c
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      2
Quorum:           2 
Flags:            Quorate Qdevice

Membership information
----------------------
    Nodeid      Votes    Qdevice Name
0x00000001          1    A,V,NMW 192.168.0.12 (local)
0x00000000          1            Qdevice

Again Node 1:

Code:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: sm2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.0.21
  }
  node {
    name: supermicro
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.0.12
  }
}

quorum {
  device {
    model: net
    net {
      algorithm: ffsplit
      host: 192.168.50.15
      tls: on
    }
    votes: 1
  }
  provider: corosync_votequorum
}

totem {
  cluster_name: GeoCluster
  config_version: 8
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}
Node 2 originally had 192.168.50.21 here (which was wrong, but the web GUI didn't let me choose another address). I changed it AFTER the error.

And here Node 2:

Code:
pvecm status
Cluster information
-------------------
Name:             GeoCluster
Config Version:   7
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Tue Jun 25 15:17:56 2024
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          0x00000002
Ring ID:          2.731
Quorate:          No

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      1
Quorum:           2 Activity blocked
Flags:           

Membership information
----------------------
    Nodeid      Votes Name
0x00000002          1 192.168.50.21 (local)
Why is there still 192.168.50.21?

Any ideas?
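
One detail in the two pvecm outputs above: Node 1 reports Config Version 8 while Node 2 reports 7, so the nodes may not be running the same corosync.conf. The sketch below prints the node addresses and config_version from a single file so the copies can be compared on each node. `corosync_summary` is a hypothetical helper name, and it assumes `name:` comes before `ring0_addr:` in every node block, as in the config pasted above.

```shell
#!/bin/sh
# corosync_summary: hypothetical helper, not a Proxmox tool.
# Prints "<node name> <ring0_addr>" for each node block and the
# config_version of the corosync.conf file given as $1.
# Assumes "name:" precedes "ring0_addr:" inside each node block.
corosync_summary() {
    awk '
        $1 == "name:"           { name = $2 }
        $1 == "ring0_addr:"     { print name, $2 }
        $1 == "config_version:" { print "config_version", $2 }
    ' "$1"
}

# Example use on each node (the usual Proxmox VE locations of the
# cluster-wide copy and the local copy):
#   corosync_summary /etc/pve/corosync.conf
#   corosync_summary /etc/corosync/corosync.conf
```

If the local copy on Node 2 still shows the old address and a lower config_version, that would explain why it keeps announcing itself as 192.168.50.21.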
 
Update: When I change every Node 2 address on Node 1 back to "192.168.50.21", I get some sort of connection,
but the corosync service status shows:

Code:
host: 2 has no active links
But that is not true.

Node 2 now tells me "cannot initialize cmap service", and the corosync service fails because of:

Code:
[quorum] crit: quorum initialize failed: 2

and it reports that node 1 has no active links (again: not true).

Update 2: When I change Node 1 to 192.168.50.12, the cluster tries to connect, but corosync.service on node 1 keeps failing and restarts every second.
 
Okay, after a reboot I get error 595.

I noticed that here:

/etc/pve/.members

the IP is still wrong:

Code:
{
  "nodename": "supermicro",
  "version": 5,
  "cluster": { "name": "Cluster", "version": 10, "nodes": 2, "quorate": 1 },
  "nodelist": {
    "sm2": { "id": 2, "online": 0, "ip": "192.168.50.21"},
    "supermicro": { "id": 1, "online": 1, "ip": "192.168.50.12"}
  }
}

The IPs should be 192.168.40.21 and 192.168.40.12.

How can I change that?
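
For context, /etc/pve/.members is a read-only status file generated by pmxcfs, so it cannot be edited directly. My assumption (worth verifying) is that the IP shown there comes from resolving each node's hostname when pve-cluster starts, which would make /etc/hosts the place to check. A minimal sketch for comparing the two follows; `members_ips` and `hosts_ip_for` are hypothetical helper names, and the first one assumes one node entry per line, as in the paste above.

```shell
#!/bin/sh
# Both functions are hypothetical sketches, not Proxmox tools.

# members_ips FILE: print "<node> <ip>" for each node entry in a
# .members-style file. Assumes each node entry sits on its own line.
members_ips() {
    sed -n 's/^[[:space:]]*"\([^"]*\)":[[:space:]]*{.*"ip":[[:space:]]*"\([^"]*\)".*/\1 \2/p' "$1"
}

# hosts_ip_for NAME FILE: print the first address that a hosts-format
# file maps NAME to, skipping comment lines and trailing comments.
hosts_ip_for() {
    awk -v name="$1" '
        /^[[:space:]]*#/ { next }
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break
                if ($i == name) { print $1; exit }
            }
        }
    ' "$2"
}

# Example use on a node:
#   members_ips /etc/pve/.members
#   hosts_ip_for "$(hostname)" /etc/hosts
```

If both commands print the same stale address for a node, correcting that node's /etc/hosts entry and then restarting pve-cluster is the change to try first.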
 
