Corosync issue on a 3-node cluster

korigan

Jul 3, 2024
Hello,

I have an issue with corosync. I have 3 nodes: one is connected to my box (router) through WireGuard, and the other two are on the same local network.
The cluster works, but sometimes corosync stops working on node 2 (pve), which is on the local network.
The pve node is still reachable over SSH.

I have tried restarting corosync and pve-cluster several times, deleting the conf file, etc., but it still does not work.

pvecm status
Code:
Cluster information
-------------------
Name:             cluster
Config Version:   14
Transport:        knet
Secure auth:      on

Cannot initialize CMAP service

/etc/pve/corosync.conf

Code:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: piedacoulisse2
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.0.40
  }
  node {
    name: pve
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.0.97
  }
  node {
    name: pvejf
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 192.168.27.68
  }
}

quorum {
  provider: corosync_votequorum
}

/etc/hosts
Code:
/etc/hosts
127.0.0.1 localhost.localdomain localhost
192.168.0.97 proxmox.domain.com pve

# The following lines are desirable for IPv6 capable hosts

::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts


journalctl -u corosync.service -b
Code:
Jul 04 00:12:41 pve corosync[10761]:   [MAIN  ] Corosync Cluster Engine  starting up
Jul 04 00:12:41 pve corosync[10761]:   [MAIN  ] Corosync built-in features: dbus monitoring watchdog systemd xmlconf vqsim nozzle snmp pie relro bindnow
Jul 04 00:12:41 pve corosync[10761]:   [MAIN  ] Could not open /etc/corosync/authkey: No such file or directory
Jul 04 00:12:41 pve systemd[1]: corosync.service: Main process exited, code=exited, status=8/n/a
Jul 04 00:12:41 pve corosync[10761]:   [MAIN  ] Corosync Cluster Engine exiting with status 8 at main.c:1417.
Jul 04 00:12:41 pve systemd[1]: corosync.service: Failed with result 'exit-code'.
Jul 04 00:12:41 pve systemd[1]: Failed to start corosync.service - Corosync Cluster Engine.

systemctl status corosync
Code:
× corosync.service - Corosync Cluster Engine
     Loaded: loaded (/lib/systemd/system/corosync.service; enabled; preset: enabled)
     Active: failed (Result: exit-code) since Thu 2024-07-04 00:12:41 CEST; 1min 30s ago
       Docs: man:corosync
             man:corosync.conf
             man:corosync_overview
    Process: 10761 ExecStart=/usr/sbin/corosync -f $COROSYNC_OPTIONS (code=exited, status=8)
   Main PID: 10761 (code=exited, status=8)
        CPU: 13ms

Jul 04 00:12:41 pve systemd[1]: Starting corosync.service - Corosync Cluster Engine...
Jul 04 00:12:41 pve corosync[10761]:   [MAIN  ] Corosync Cluster Engine  starting up
Jul 04 00:12:41 pve corosync[10761]:   [MAIN  ] Corosync built-in features: dbus monitoring watchdog systemd xmlconf vqsim nozzle snmp pie relro bindnow
Jul 04 00:12:41 pve corosync[10761]:   [MAIN  ] Could not open /etc/corosync/authkey: No such file or directory
Jul 04 00:12:41 pve systemd[1]: corosync.service: Main process exited, code=exited, status=8/n/a
Jul 04 00:12:41 pve corosync[10761]:   [MAIN  ] Corosync Cluster Engine exiting with status 8 at main.c:1417.
Jul 04 00:12:41 pve systemd[1]: corosync.service: Failed with result 'exit-code'.
Jul 04 00:12:41 pve systemd[1]: Failed to start corosync.service - Corosync Cluster Engine.

Resolved: I copied the /etc/corosync/authkey file from node 1 to node 2.
If someone has an explanation...
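For anyone hitting the same "Could not open /etc/corosync/authkey" line in the log, here is a minimal sketch of how the check and the copy could look. The helper name authkey_state is hypothetical (it is not a Proxmox or corosync command), and the commented scp/ssh lines assume root SSH access from node 1 to node 2 at the ring0_addr shown above. Treat it as an illustration of what I did, not as an official procedure.

```shell
#!/bin/sh
# Sketch only: report whether the corosync auth key exists and is non-empty.
# authkey_state is a hypothetical helper name used for illustration.
authkey_state() {
    key="${1:-/etc/corosync/authkey}"   # default path taken from the log above
    if [ -s "$key" ]; then
        echo "present"
    else
        echo "missing"
    fi
}

# Run on the failing node (pve in this thread).
authkey_state

# If it prints "missing", the fix from this post would look roughly like
# this, run from a working node (node 1 here). Assumes root SSH access:
#   scp -p /etc/corosync/authkey root@192.168.0.97:/etc/corosync/authkey
#   ssh root@192.168.0.97 systemctl restart corosync
```

The copy is left commented out on purpose so the snippet only reads the node's state. After the restart, pvecm status should show whether the node rejoined the cluster.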
 