Problem with corosync and one of three nodes

xmftech

New Member
Jan 11, 2025
Good afternoon. I recently added a third host to my cluster. Until now I had one host plus another on standby; they do not share containers or VMs, and High Availability does not interest me because it is a home lab and I make the relevant backups. To make this work I modified the corosync.conf file and gave 2 quorum votes to the host that runs 24/7 and controls the home automation and other services. So far, no problem.

But after adding a third host, which will probably act as a firewall running virtualized OPNsense, the 24/7 host failed to come up after one of its restarts. I managed to solve it by giving this host 3 quorum votes, but I do not know whether that or the addition of the third host is the reason why, when I run

Code:
systemctl status corosync

It shows me the following:

Code:
● corosync.service - Corosync Cluster Engine
     Loaded: loaded (/lib/systemd/system/corosync.service; enabled; preset: enabled)
     Active: active (running) since Mon 2025-08-11 16:47:47 CEST; 14min ago
       Docs: man:corosync
             man:corosync.conf
             man:corosync_overview
   Main PID: 25113 (corosync)
      Tasks: 9 (limit: 13951)
     Memory: 132.8M
        CPU: 11.943s
     CGroup: /system.slice/corosync.service
             └─25113 /usr/sbin/corosync -f

d’ag. 11 16:47:47 host1 corosync[25113]:   [WD    ] Watchdog not enabled by configuration
d’ag. 11 16:47:47 host1 corosync[25113]:   [WD    ] resource load_15min missing a recovery key.
d’ag. 11 16:47:47 host1 corosync[25113]:   [WD    ] resource memory_used missing a recovery key.
d’ag. 11 16:47:47 host1 corosync[25113]:   [KNET  ] host: host: 1 has no active links
d’ag. 11 16:47:47 host1 corosync[25113]:   [KNET  ] host: host: 1 has no active links
d’ag. 11 16:47:47 host1 corosync[25113]:   [KNET  ] host: host: 1 has no active links
d’ag. 11 16:47:47 host1 corosync[25113]:   [KNET  ] host: host: 3 has no active links
d’ag. 11 16:47:47 host1 corosync[25113]:   [KNET  ] host: host: 3 has no active links
d’ag. 11 16:47:47 host1 corosync[25113]:   [KNET  ] host: host: 3 has no active links

Where can I start looking?

Thank you
 
and gave 2 quorum votes to the host that runs 24/7 and controls the home automation and other services. So far, no problem.
I am not convinced that this runs stably just by raising the number of votes of specific nodes. You are effectively lying to that important quorum manager.

I managed to solve it by giving this host 3 quorum votes
This confirms my suspicion.

My recommendation is to set all corosync-related tweaks back to their defaults. You have three nodes, so set them up as intended and tweak later, once everything is stable and works as intended.

d’ag. 11 16:47:47 host1 corosync[25113]: [KNET ] host: host: 1 has no active links
d’ag. 11 16:47:47 host1 corosync[25113]: [KNET ] host: host: 3 has no active links

That's of course the culprit.

Start by posting the full output (including the initiating command) of
  • pveversion of all three nodes
  • pvecm status of one intact node and of the problem node
  • ip a of all three nodes
  • cat /etc/network/interfaces of all three nodes
Run corosync-cfgtool (without parameters) to get a hint of the available options, e.g. run corosync-cfgtool -s on each node.

----

Besides corosync, PVE itself requires the same understanding of hostnames and IP addresses. If you do not use a central DNS server, the usual way is to fill /etc/hosts with identical information on all nodes.

If your nodes are named pveh, pvei and pvej, then for H in pveh pvei pvej ; do echo "---------- $H: "; ping -c 1 $H ; done must run flawlessly on all cluster members. To query the PVE version I can use for H in pveh pvei pvej ; do echo "---------- $H: "; ssh $H pveversion ; done - without being asked for any password.
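
Adapted to the node names used later in this thread (PRHOST, host0, host1), such a check could look like the following sketch, which assumes the passwordless root SSH that PVE sets up between cluster nodes:

Code:
for H in PRHOST host0 host1 ; do
    echo "---------- $H: "
    ping -c 1 "$H"          # name resolution and basic reachability
    ssh "$H" pveversion     # must work without a password prompt
done

If any of the names does not resolve to the expected cluster address, fix /etc/hosts first.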
 
To make this work I modified the corosync.conf file and gave 2 quorum votes to the host that runs 24/7 and controls the home automation and other services. So far, no problem.
Instead of changing the number of votes, use a QDevice to get an uneven number of votes, as described here.
Increasing the votes of a single node like that just makes the entire cluster fail if this node fails. So just don't configure this kind of single point of failure.
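
In rough terms, the QDevice setup from the PVE documentation boils down to the following sketch (the external machine and its address 192.168.1.250 are just placeholders):

Code:
# on the external, non-cluster machine that will provide the extra vote
apt install corosync-qnetd

# on every cluster node
apt install corosync-qdevice

# on one cluster node, register the QDevice
pvecm qdevice setup 192.168.1.250

Afterwards pvecm status should list the QDevice as an additional vote.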
 
pvecm status output:

Code:
root@host1:~# pvecm status
Cluster information
-------------------
Name:             PRHOSTS
Config Version:   4
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Tue Aug 12 16:05:52 2025
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000002
Ring ID:          1.60c
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   5
Highest expected: 5
Total votes:      5
Quorum:           3 
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.168.1.239
0x00000002          3 192.168.1.240 (local)
0x00000003          1 192.168.1.241

ip a output:

Code:
root@host1:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute
       valid_lft forever preferred_lft forever
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vmbr0 state UP group default qlen 1000
    link/ether 84:47:09:41:37:b6 brd ff:ff:ff:ff:ff:ff
3: enp2s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast master vmbr1 state DOWN group default qlen 1000
    link/ether 84:47:09:41:37:b7 brd ff:ff:ff:ff:ff:ff
4: wlp0s20f3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 64:32:a8:fc:a6:9d brd ff:ff:ff:ff:ff:ff
5: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 84:47:09:41:37:b6 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.240/24 scope global vmbr0
       valid_lft forever preferred_lft forever
    inet6 fe80::8647:9ff:fe41:37b6/64 scope link
       valid_lft forever preferred_lft forever
6: vmbr1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 84:47:09:41:37:b7 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::8647:9ff:fe41:37b7/64 scope link
       valid_lft forever preferred_lft forever
7: veth109i0@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr0 state UP group default qlen 1000
    link/ether fe:ab:75:38:c3:43 brd ff:ff:ff:ff:ff:ff link-netnsid 0
8: tap101i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vmbr0 state UNKNOWN group default qlen 1000
    link/ether 7e:bf:9f:7f:a3:a3 brd ff:ff:ff:ff:ff:ff
9: veth110i0@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr0 state UP group default qlen 1000
    link/ether fe:a6:5b:8b:3a:4a brd ff:ff:ff:ff:ff:ff link-netnsid 1
10: veth103i0@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr0 state UP group default qlen 1000
    link/ether fe:d2:51:44:34:3a brd ff:ff:ff:ff:ff:ff link-netnsid 2
11: veth108i0@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr0 state UP group default qlen 1000
    link/ether fe:ba:be:a8:2b:79 brd ff:ff:ff:ff:ff:ff link-netnsid 3
12: veth116i0@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr0 state UP group default qlen 1000
    link/ether fe:3d:de:ed:cf:51 brd ff:ff:ff:ff:ff:ff link-netnsid 4
13: veth113i0@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr0 state UP group default qlen 1000
    link/ether fe:14:05:f1:f5:4a brd ff:ff:ff:ff:ff:ff link-netnsid 5
14: veth114i0@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr0 state UP group default qlen 1000
    link/ether fe:8d:86:57:e7:bd brd ff:ff:ff:ff:ff:ff link-netnsid 6
15: veth118i0@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr0 state UP group default qlen 1000
    link/ether fe:16:58:23:44:64 brd ff:ff:ff:ff:ff:ff link-netnsid 7
16: veth117i0@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr0 state UP group default qlen 1000
    link/ether fe:cf:dd:8c:89:4b brd ff:ff:ff:ff:ff:ff link-netnsid 8
17: tap119i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master fwbr119i0 state UNKNOWN group default qlen 1000
    link/ether d2:d2:1d:1e:23:ed brd ff:ff:ff:ff:ff:ff
18: fwbr119i0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether a2:c2:70:05:b9:eb brd ff:ff:ff:ff:ff:ff
19: fwpr119p0@fwln119i0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr0 state UP group default qlen 1000
    link/ether e6:9e:b0:85:7a:eb brd ff:ff:ff:ff:ff:ff
20: fwln119i0@fwpr119p0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master fwbr119i0 state UP group default qlen 1000
    link/ether a2:c2:70:05:b9:eb brd ff:ff:ff:ff:ff:ff
21: veth124i0@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr0 state UP group default qlen 1000
    link/ether fe:c4:71:3d:66:c6 brd ff:ff:ff:ff:ff:ff link-netnsid 9
22: veth115i0@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr0 state UP group default qlen 1000
    link/ether fe:01:9e:f6:a5:43 brd ff:ff:ff:ff:ff:ff link-netnsid 10

Code:
root@host0:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute
       valid_lft forever preferred_lft forever
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master vmbr0 state UP group default qlen 1000
    link/ether a8:b8:e0:0a:14:c5 brd ff:ff:ff:ff:ff:ff
3: enp2s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq master vmbr1 state DOWN group default qlen 1000
    link/ether a8:b8:e0:0a:14:c6 brd ff:ff:ff:ff:ff:ff
4: enp3s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether a8:b8:e0:0a:14:c7 brd ff:ff:ff:ff:ff:ff
5: enp4s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether a8:b8:e0:0a:14:c8 brd ff:ff:ff:ff:ff:ff
6: enp5s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether a8:b8:e0:0a:14:c9 brd ff:ff:ff:ff:ff:ff
7: enp7s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether a8:b8:e0:0a:14:ca brd ff:ff:ff:ff:ff:ff
8: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether a8:b8:e0:0a:14:c5 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.241/24 scope global vmbr0
       valid_lft forever preferred_lft forever
    inet6 fe80::aab8:e0ff:fe0a:14c5/64 scope link
       valid_lft forever preferred_lft forever
9: vmbr1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether a8:b8:e0:0a:14:c6 brd ff:ff:ff:ff:ff:ff

Code:
root@PRHOST:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute
       valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master vmbr0 state UP group default qlen 1000
    link/ether 08:f1:ea:b0:0d:28 brd ff:ff:ff:ff:ff:ff
    altname enp2s0f0
3: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq master vmbr1 state DOWN group default qlen 1000
    link/ether 08:f1:ea:b0:0d:29 brd ff:ff:ff:ff:ff:ff
    altname enp2s0f1
4: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 08:f1:ea:b0:0d:28 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.239/24 scope global vmbr0
       valid_lft forever preferred_lft forever
    inet6 fe80::af1:eaff:feb0:d28/64 scope link
       valid_lft forever preferred_lft forever
5: vmbr1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 08:f1:ea:b0:0d:29 brd ff:ff:ff:ff:ff:ff
6: zteb4pzo6h: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2800 qdisc pfifo_fast state UNKNOWN group default qlen 1000
    link/ether ea:51:c9:e2:1d:f0 brd ff:ff:ff:ff:ff:ff
    inet 10.147.17.175/24 brd 10.147.17.255 scope global zteb4pzo6h
       valid_lft forever preferred_lft forever
    inet6 fe80::e851:c9ff:fee2:1df0/64 scope link
       valid_lft forever preferred_lft forever

cat /etc/network/interfaces output:

Code:
root@host1:~# cat /etc/network/interfaces
# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage parts of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT read its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!

auto lo
iface lo inet loopback

auto enp1s0
iface enp1s0 inet manual
# Activar WOL
post-up /usr/sbin/ethtool -s enp1s0 wol g

auto enp2s0
iface enp2s0 inet manual

auto vmbr0
iface vmbr0 inet static
        address 192.168.1.240/24
        gateway 192.168.1.1
        bridge-ports enp1s0
        bridge-stp off
        bridge-fd 0

iface wlp0s20f3 inet manual

auto vmbr1
iface vmbr1 inet manual
        bridge-ports enp2s0
        bridge-stp off
        bridge-fd 0

source /etc/network/interfaces.d/*

Code:
root@host0:~# cat /etc/network/interfaces
# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage parts of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT read its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!

auto lo
iface lo inet loopback

iface enp1s0 inet manual

iface enp2s0 inet manual

iface enp3s0 inet manual

iface enp4s0 inet manual

iface enp5s0 inet manual

iface enp7s0 inet manual

auto vmbr0
iface vmbr0 inet static
        address 192.168.1.241/24
        gateway 192.168.1.1
        bridge-ports enp1s0
        bridge-stp off
        bridge-fd 0

auto vmbr1
iface vmbr1 inet manual
        bridge-ports enp2s0
        bridge-stp off
        bridge-fd 0

source /etc/network/interfaces.d/*

Code:
root@PRHOST:~# cat /etc/network/interfaces
# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage parts of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT read its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!

auto lo
iface lo inet loopback

iface eno1 inet manual
#LAN1

iface eno2 inet manual
#LAN2

auto vmbr0
iface vmbr0 inet static
        address 192.168.1.239/24
        gateway 192.168.1.1
        bridge-ports eno1
        bridge-stp off
        bridge-fd 0
#LAN LAN1

auto vmbr1
iface vmbr1 inet manual
        bridge-ports eno2
        bridge-stp off
        bridge-fd 0
#WAN LAN2

source /etc/network/interfaces.d/*
 
pveversion output:

Code:
root@host1:~# pveversion -v
proxmox-ve: 8.4.0 (running kernel: 6.8.12-13-pve)
pve-manager: 8.4.10 (running version: 8.4.10/293f4abc4b22fa08)
proxmox-kernel-helper: 8.1.4
proxmox-kernel-6.8.12-13-pve-signed: 6.8.12-13
proxmox-kernel-6.8: 6.8.12-13
proxmox-kernel-6.8.12-12-pve-signed: 6.8.12-12
ceph-fuse: 17.2.7-pve3
corosync: 3.1.9-pve1
criu: 3.17.1-2+deb12u1
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx11
intel-microcode: 3.20250512.1~deb12u1
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libknet1: 1.30-pve2
libproxmox-acme-perl: 1.6.0
libproxmox-backup-qemu0: 1.5.2
libproxmox-rs-perl: 0.3.5
libpve-access-control: 8.2.2
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.1.2
libpve-cluster-perl: 8.1.2
libpve-common-perl: 8.3.4
libpve-guest-common-perl: 5.2.2
libpve-http-server-perl: 5.2.2
libpve-network-perl: 0.11.2
libpve-rs-perl: 0.9.4
libpve-storage-perl: 8.3.7
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.6.0-2
proxmox-backup-client: 3.4.4-1
proxmox-backup-file-restore: 3.4.4-1
proxmox-backup-restore-image: 0.7.0
proxmox-firewall: 0.7.1
proxmox-kernel-helper: 8.1.4
proxmox-mail-forward: 0.3.3
proxmox-mini-journalreader: 1.5
proxmox-offline-mirror-helper: 0.6.7
proxmox-widget-toolkit: 4.3.13
pve-cluster: 8.1.2
pve-container: 5.3.0
pve-docs: 8.4.1
pve-edk2-firmware: 4.2025.02-4~bpo12+1
pve-esxi-import-tools: 0.7.4
pve-firewall: 5.1.2
pve-firmware: 3.16-3
pve-ha-manager: 4.0.7
pve-i18n: 3.4.5
pve-qemu-kvm: 9.2.0-7
pve-xtermjs: 5.5.0-2
qemu-server: 8.4.1
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.8-pve1

Code:
root@host0:~# pveversion -v
proxmox-ve: 8.4.0 (running kernel: 6.8.12-13-pve)
pve-manager: 8.4.10 (running version: 8.4.10/293f4abc4b22fa08)
proxmox-kernel-helper: 8.1.4
proxmox-kernel-6.8.12-13-pve-signed: 6.8.12-13
proxmox-kernel-6.8: 6.8.12-13
proxmox-kernel-6.8.12-9-pve-signed: 6.8.12-9
ceph-fuse: 17.2.8-pve2
corosync: 3.1.9-pve1
criu: 3.17.1-2+deb12u1
frr-pythontools: 10.2.3-1+pve1
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx11
intel-microcode: 3.20250211.1~deb12u1
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libknet1: 1.30-pve2
libproxmox-acme-perl: 1.6.0
libproxmox-backup-qemu0: 1.5.2
libproxmox-rs-perl: 0.3.5
libpve-access-control: 8.2.2
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.1.2
libpve-cluster-perl: 8.1.2
libpve-common-perl: 8.3.4
libpve-guest-common-perl: 5.2.2
libpve-http-server-perl: 5.2.2
libpve-network-perl: 0.11.2
libpve-rs-perl: 0.9.4
libpve-storage-perl: 8.3.7
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.6.0-2
proxmox-backup-client: 3.4.4-1
proxmox-backup-file-restore: 3.4.4-1
proxmox-backup-restore-image: 0.7.0
proxmox-firewall: 0.7.1
proxmox-kernel-helper: 8.1.4
proxmox-mail-forward: 0.3.3
proxmox-mini-journalreader: 1.5
proxmox-offline-mirror-helper: 0.6.7
proxmox-widget-toolkit: 4.3.13
pve-cluster: 8.1.2
pve-container: 5.3.0
pve-docs: 8.4.1
pve-edk2-firmware: 4.2025.02-4~bpo12+1
pve-esxi-import-tools: 0.7.4
pve-firewall: 5.1.2
pve-firmware: 3.16-3
pve-ha-manager: 4.0.7
pve-i18n: 3.4.5
pve-qemu-kvm: 9.2.0-7
pve-xtermjs: 5.5.0-2
qemu-server: 8.4.1
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.8-pve1

Code:
root@PRHOST:~# pveversion -v
proxmox-ve: 8.4.0 (running kernel: 6.8.12-13-pve)
pve-manager: 8.4.10 (running version: 8.4.10/293f4abc4b22fa08)
proxmox-kernel-helper: 8.1.4
proxmox-kernel-6.8.12-13-pve-signed: 6.8.12-13
proxmox-kernel-6.8: 6.8.12-13
proxmox-kernel-6.8.12-12-pve-signed: 6.8.12-12
proxmox-kernel-6.8.12-4-pve-signed: 6.8.12-4
ceph-fuse: 17.2.7-pve3
corosync: 3.1.9-pve1
criu: 3.17.1-2+deb12u1
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx11
intel-microcode: 3.20250211.1~deb12u1
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libknet1: 1.30-pve2
libproxmox-acme-perl: 1.6.0
libproxmox-backup-qemu0: 1.5.2
libproxmox-rs-perl: 0.3.5
libpve-access-control: 8.2.2
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.1.2
libpve-cluster-perl: 8.1.2
libpve-common-perl: 8.3.4
libpve-guest-common-perl: 5.2.2
libpve-http-server-perl: 5.2.2
libpve-network-perl: 0.11.2
libpve-rs-perl: 0.9.4
libpve-storage-perl: 8.3.7
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.6.0-2
proxmox-backup-client: 3.4.4-1
proxmox-backup-file-restore: 3.4.4-1
proxmox-backup-restore-image: 0.7.0
proxmox-firewall: 0.7.1
proxmox-kernel-helper: 8.1.4
proxmox-mail-forward: 0.3.3
proxmox-mini-journalreader: 1.5
proxmox-offline-mirror-helper: 0.6.7
proxmox-widget-toolkit: 4.3.13
pve-cluster: 8.1.2
pve-container: 5.3.0
pve-docs: 8.4.1
pve-edk2-firmware: 4.2025.02-4~bpo12+1
pve-esxi-import-tools: 0.7.4
pve-firewall: 5.1.2
pve-firmware: 3.16-3
pve-ha-manager: 4.0.7
pve-i18n: 3.4.5
pve-qemu-kvm: 9.2.0-7
pve-xtermjs: 5.5.0-2
qemu-server: 8.4.1
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.8-pve1

corosync-cfgtool -s output:

Code:
root@host1:~# corosync-cfgtool -s
Local node ID 2, transport knet
LINK ID 0 udp
        addr    = 192.168.1.240
        status:
                nodeid:          1:     connected
                nodeid:          2:     localhost
                nodeid:          3:     connected

Code:
root@host0:~# corosync-cfgtool -s
Local node ID 3, transport knet
LINK ID 0 udp
        addr    = 192.168.1.241
        status:
                nodeid:          1:     connected
                nodeid:          2:     connected
                nodeid:          3:     localhost

Code:
root@PRHOST:~# corosync-cfgtool -s
Local node ID 1, transport knet
LINK ID 0 udp
        addr    = 192.168.1.239
        status:
                nodeid:          1:     localhost
                nodeid:          2:     connected
                nodeid:          3:     connected

systemctl status corosync output:

Code:
root@host1:~# systemctl status corosync
● corosync.service - Corosync Cluster Engine
     Loaded: loaded (/lib/systemd/system/corosync.service; enabled; preset: enabled)
     Active: active (running) since Mon 2025-08-11 18:41:03 CEST; 21h ago
       Docs: man:corosync
             man:corosync.conf
             man:corosync_overview
   Main PID: 103131 (corosync)
      Tasks: 9 (limit: 13951)
     Memory: 132.4M
        CPU: 10min 41.785s
     CGroup: /system.slice/corosync.service
             └─103131 /usr/sbin/corosync -f

d’ag. 11 18:41:03 host1 corosync[103131]:   [KNET  ] host: host: 1 has no active links
d’ag. 11 18:41:03 host1 corosync[103131]:   [KNET  ] host: host: 3 has no active links
d’ag. 11 18:41:03 host1 corosync[103131]:   [KNET  ] host: host: 3 has no active links
d’ag. 11 18:41:03 host1 corosync[103131]:   [KNET  ] host: host: 3 has no active links
d’ag. 11 18:41:35 host1 corosync[103131]:   [KNET  ] host: host: 3 has no active links
d’ag. 11 18:41:35 host1 corosync[103131]:   [KNET  ] host: host: 3 has no active links
d’ag. 11 18:41:37 host1 corosync[103131]:   [KNET  ] host: host: 3 has no active links
d’ag. 11 18:41:42 host1 corosync[103131]:   [KNET  ] host: host: 1 has no active links
d’ag. 11 19:50:54 host1 corosync[103131]:   [KNET  ] host: host: 3 has no active links
d’ag. 11 19:50:58 host1 corosync[103131]:   [KNET  ] host: host: 1 has no active links

Code:
root@host0:~# systemctl status corosync
● corosync.service - Corosync Cluster Engine
     Loaded: loaded (/lib/systemd/system/corosync.service; enabled; preset: enabled)
     Active: active (running) since Tue 2025-08-12 15:53:56 CEST; 23min ago
       Docs: man:corosync
             man:corosync.conf
             man:corosync_overview
   Main PID: 895 (corosync)
      Tasks: 9 (limit: 18834)
     Memory: 135.4M
        CPU: 20.856s
     CGroup: /system.slice/corosync.service
             └─895 /usr/sbin/corosync -f

d’ag. 12 15:53:58 host0 corosync[895]:   [KNET  ] pmtud: Global data MTU changed to: 1397
d’ag. 12 16:05:37 host0 corosync[895]:   [KNET  ] link: Resetting MTU for link 0 because host 1 joined
d’ag. 12 16:05:37 host0 corosync[895]:   [KNET  ] host: host: 1 has 1 active links
d’ag. 12 16:05:37 host0 corosync[895]:   [QUORUM] Sync members[3]: 1 2 3
d’ag. 12 16:05:37 host0 corosync[895]:   [QUORUM] Sync joined[1]: 1
d’ag. 12 16:05:37 host0 corosync[895]:   [TOTEM ] A new membership (1.60c) was formed. Members joined: 1
d’ag. 12 16:05:37 host0 corosync[895]:   [QUORUM] Members[3]: 1 2 3
d’ag. 12 16:05:37 host0 corosync[895]:   [MAIN  ] Completed service synchronization, ready to provide service.
d’ag. 12 16:05:37 host0 corosync[895]:   [KNET  ] pmtud: PMTUD link change for host: 1 link: 0 from 469 to 1397
d’ag. 12 16:05:37 host0 corosync[895]:   [KNET  ] pmtud: Global data MTU changed to: 1397

Code:
root@PRHOST:~# systemctl status corosync
● corosync.service - Corosync Cluster Engine
     Loaded: loaded (/lib/systemd/system/corosync.service; enabled; preset: enabled)
     Active: active (running) since Tue 2025-08-12 16:05:35 CEST; 11min ago
       Docs: man:corosync
             man:corosync.conf
             man:corosync_overview
   Main PID: 1299 (corosync)
      Tasks: 9 (limit: 9257)
     Memory: 141.2M
        CPU: 12.096s
     CGroup: /system.slice/corosync.service
             └─1299 /usr/sbin/corosync -f

d’ag. 12 16:05:37 PRHOST corosync[1299]:   [KNET  ] host: host: 3 has 1 active links
d’ag. 12 16:05:37 PRHOST corosync[1299]:   [KNET  ] pmtud: PMTUD link change for host: 2 link: 0 from 469 to 1397
d’ag. 12 16:05:37 PRHOST corosync[1299]:   [KNET  ] pmtud: PMTUD link change for host: 3 link: 0 from 469 to 1397
d’ag. 12 16:05:37 PRHOST corosync[1299]:   [KNET  ] pmtud: Global data MTU changed to: 1397
d’ag. 12 16:05:37 PRHOST corosync[1299]:   [QUORUM] Sync members[3]: 1 2 3
d’ag. 12 16:05:37 PRHOST corosync[1299]:   [QUORUM] Sync joined[2]: 2 3
d’ag. 12 16:05:37 PRHOST corosync[1299]:   [TOTEM ] A new membership (1.60c) was formed. Members joined: 2 3
d’ag. 12 16:05:37 PRHOST corosync[1299]:   [QUORUM] This node is within the primary component and will provide service.
d’ag. 12 16:05:37 PRHOST corosync[1299]:   [QUORUM] Members[3]: 1 2 3
d’ag. 12 16:05:37 PRHOST corosync[1299]:   [MAIN  ] Completed service synchronization, ready to provide service.

/etc/pve/corosync.conf file (it is the same on all three nodes):

Code:
root@host1:~# cat /etc/pve/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: PRHOST
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.1.239
  }
  node {
    name: host0
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 192.168.1.241
  }
  node {
    name: host1
    nodeid: 2
    quorum_votes: 3
    ring0_addr: 192.168.1.240
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: PRHOSTS
  config_version: 4
  interface {
#    linknumber: 0
    ringnumber: 0
    bindnetaddr: 192.168.1.0
    mcastport: 5405
  }
#  ip_version: ipv4-6
  ip_version: ipv4
  link_mode: active
  secauth: on
  version: 2
}

I added these lines as suggested by ChatGPT:

ringnumber: 0
bindnetaddr: 192.168.1.0
mcastport: 5405

Node host0 only has enp1s0 and vmbr0 physically connected.
Node host1 only has enp1s0 and vmbr0 physically connected.
Node PRHOST only has eno1 and vmbr0 physically connected.
 
Don't trust ChatGPT or any other AI without asking the "have you verified your suggestion?" question. These suggestions are for an older corosync version, not the current one.
The corosync.conf without these changes seemed correct to me. Whenever you update this file, make sure to increase the value of config_version: by (at least) one.
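
The usual way to apply such an edit cluster-wide is to work on a copy and only move it into place once it is complete, so a half-edited file is never distributed - a sketch:

Code:
cp /etc/pve/corosync.conf /etc/pve/corosync.conf.new
nano /etc/pve/corosync.conf.new                       # make the changes, bump config_version by one
cp /etc/pve/corosync.conf /root/corosync.conf.bak     # keep a backup outside /etc/pve
mv /etc/pve/corosync.conf.new /etc/pve/corosync.conf  # pmxcfs syncs it to all nodes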
And for the basics:
  • corosync should run on a separate physical network interface, as it is very sensitive to latency, jitter etc. (see the sketch below this list)
  • reduce the votes of host1 (the node that currently has 3) to 1, because at the moment, if this node fails, the entire cluster ends up in an error state. The other two nodes just don't matter with this configuration.
  • the status outputs you posted for corosync are from totally different timestamps, which makes them less useful.
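
For the first point, a dedicated corosync link can be added as a second knet link on the currently unused NICs (enp2s0 / eno2). A rough sketch, assuming a hypothetical separate network 10.10.10.0/24: each node gets a ring1_addr, the totem section gets a second interface block, and config_version is bumped again.

Code:
nodelist {
  node {
    name: PRHOST
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.1.239
    ring1_addr: 10.10.10.239    # hypothetical dedicated corosync network
  }
  # ... same ring1_addr pattern for host0 and host1 ...
}

totem {
  # ... existing settings ...
  interface {
    linknumber: 0
  }
  interface {
    linknumber: 1
  }
}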
 
Thanks for the vast amount of output :-)

I do not see anything obviously wrong. As @fba noted, the "host: 3 has no active links" messages are from the day before; are they gone now? Does everything work now?

Take @fba's hint seriously: three votes for one node kills the chance to activate the mechanism that "Quorum" is made for.
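
To make that concrete with the numbers from the pvecm status output above (votes 1 + 3 + 1 = 5, quorum = 3):

Code:
host1 (3 votes) down:  remaining votes = 1 + 1 = 2  -> below quorum, the two healthy nodes block
PRHOST (1 vote) down:  remaining votes = 3 + 1 = 4  -> still quorate
default 1/1/1 votes:   expected = 3, quorum = 2     -> any single node may fail

So with the weighted votes, the cluster only survives failures of the "unimportant" nodes.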

That said... I have to admit that I had a setup with weighted votes too, some years ago. That is fine as long as it is a small cluster with "the Administrator" looking after it continuously, and as long as no really important services are running. (Currently I have so many nodes that "vote=1" is fine for me, even when I shut down some nodes to save energy.)

But please, as I have already said: first build it with default settings, be sure its behavior is stable, establish backup procedures, test a restore, test failure scenarios like "disk lost" etc. Then optimize it...

Most important in a Homelab: have fun! :-)
 
Don't trust ChatGPT or any other AI without asking the "have you verified your suggestion?" question. These suggestions are for an older corosync version, not the current one.
The corosync.conf without these changes seemed correct to me. Whenever you update this file, make sure to increase the value of config_version: by (at least) one.
And for the basics:
  • corosync should run on a separate physical network interface, as it is very sensitive to latency, jitter etc.
  • reduce the votes of host1 (the node that currently has 3) to 1, because at the moment, if this node fails, the entire cluster ends up in an error state. The other two nodes just don't matter with this configuration.
  • the status outputs you posted for corosync are from totally different timestamps, which makes them less useful.

The output now is the same on host1:

Code:
root@host1:~# systemctl status corosync
● corosync.service - Corosync Cluster Engine
     Loaded: loaded (/lib/systemd/system/corosync.service; enabled; preset: enabled)
     Active: active (running) since Mon 2025-08-11 18:41:03 CEST; 2 days ago
       Docs: man:corosync
             man:corosync.conf
             man:corosync_overview
   Main PID: 103131 (corosync)
      Tasks: 9 (limit: 13951)
     Memory: 132.8M
        CPU: 23min 31.693s
     CGroup: /system.slice/corosync.service
             └─103131 /usr/sbin/corosync -f

d’ag. 11 18:41:03 host1 corosync[103131]:   [KNET  ] host: host: 3 has no active links
d’ag. 11 18:41:03 host1 corosync[103131]:   [KNET  ] host: host: 3 has no active links
d’ag. 11 18:41:35 host1 corosync[103131]:   [KNET  ] host: host: 3 has no active links
d’ag. 11 18:41:35 host1 corosync[103131]:   [KNET  ] host: host: 3 has no active links
d’ag. 11 18:41:37 host1 corosync[103131]:   [KNET  ] host: host: 3 has no active links
d’ag. 11 18:41:42 host1 corosync[103131]:   [KNET  ] host: host: 1 has no active links
d’ag. 11 19:50:54 host1 corosync[103131]:   [KNET  ] host: host: 3 has no active links
d’ag. 11 19:50:58 host1 corosync[103131]:   [KNET  ] host: host: 1 has no active links
d’ag. 12 16:45:37 host1 corosync[103131]:   [KNET  ] host: host: 3 has no active links
d’ag. 12 16:45:42 host1 corosync[103131]:   [KNET  ] host: host: 1 has no active links

And without the changes suggested by ChatGPT, corosync.conf now looks like this:

Code:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: PRHOST
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.201.239
  }
  node {
    name: host0
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 192.168.201.241
  }
  node {
    name: host1
    nodeid: 2
    quorum_votes: 3
    ring0_addr: 192.168.201.240
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: PRHOSTS
  config_version: 5
  interface {
    linknumber: 0
#    ringnumber: 0
#    bindnetaddr: 192.168.201.0
#    mcastport: 5405
  }
  ip_version: ipv4-6
#  ip_version: ipv4
  link_mode: active
  secauth: on
  version: 2
}

I checked that /etc/pve/corosync.conf is in sync on all three nodes after the changes.
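
(For reference, one way to check this from a single node - a sketch relying on the passwordless root SSH between cluster nodes - is to compare the file's checksum everywhere:

Code:
for H in PRHOST host0 host1 ; do echo "---------- $H: "; ssh $H md5sum /etc/pve/corosync.conf ; done

All three checksums should be identical.)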

After restarting the corosync service on all three nodes:

host1:

Code:
root@host1:~# systemctl status corosync
● corosync.service - Corosync Cluster Engine
     Loaded: loaded (/lib/systemd/system/corosync.service; enabled; preset: enabled)
     Active: active (running) since Wed 2025-08-13 19:18:00 CEST; 2min 1s ago
       Docs: man:corosync
             man:corosync.conf
             man:corosync_overview
   Main PID: 2078145 (corosync)
      Tasks: 9 (limit: 13951)
     Memory: 132.3M
        CPU: 1.851s
     CGroup: /system.slice/corosync.service
             └─2078145 /usr/sbin/corosync -f

d’ag. 13 19:18:00 host1 corosync[2078145]:   [WD    ] resource load_15min missing a recovery key.
d’ag. 13 19:18:00 host1 corosync[2078145]:   [WD    ] resource memory_used missing a recovery key.
d’ag. 13 19:18:00 host1 corosync[2078145]:   [KNET  ] host: host: 1 has no active links
d’ag. 13 19:18:00 host1 corosync[2078145]:   [KNET  ] host: host: 1 has no active links
d’ag. 13 19:18:00 host1 corosync[2078145]:   [KNET  ] host: host: 1 has no active links
d’ag. 13 19:18:00 host1 corosync[2078145]:   [KNET  ] host: host: 3 has no active links
d’ag. 13 19:18:00 host1 corosync[2078145]:   [KNET  ] host: host: 3 has no active links
d’ag. 13 19:18:00 host1 corosync[2078145]:   [KNET  ] host: host: 3 has no active links
d’ag. 13 19:18:00 host1 corosync[2078145]:   [KNET  ] host: host: 2 has no active links
d’ag. 13 19:18:20 host1 corosync[2078145]:   [KNET  ] host: host: 1 has no active links

host0:

Code:
root@host0:~# systemctl status corosync
● corosync.service - Corosync Cluster Engine
     Loaded: loaded (/lib/systemd/system/corosync.service; enabled; preset: enabled)
     Active: active (running) since Wed 2025-08-13 19:18:07 CEST; 2min 23s ago
       Docs: man:corosync
             man:corosync.conf
             man:corosync_overview
   Main PID: 3444 (corosync)
      Tasks: 9 (limit: 18834)
     Memory: 132.3M
        CPU: 3.008s
     CGroup: /system.slice/corosync.service
             └─3444 /usr/sbin/corosync -f

d’ag. 13 19:18:19 host0 corosync[3444]:   [QUORUM] Sync members[2]: 2 3
d’ag. 13 19:18:19 host0 corosync[3444]:   [QUORUM] Sync left[1]: 1
d’ag. 13 19:18:19 host0 corosync[3444]:   [TOTEM ] A new membership (2.632) was formed. Members left: 1
d’ag. 13 19:18:19 host0 corosync[3444]:   [QUORUM] Members[2]: 2 3
d’ag. 13 19:18:19 host0 corosync[3444]:   [MAIN  ] Completed service synchronization, ready to provide service.
d’ag. 13 19:18:22 host0 corosync[3444]:   [QUORUM] Sync members[3]: 1 2 3
d’ag. 13 19:18:22 host0 corosync[3444]:   [QUORUM] Sync joined[1]: 1
d’ag. 13 19:18:22 host0 corosync[3444]:   [TOTEM ] A new membership (1.637) was formed. Members joined: 1
d’ag. 13 19:18:22 host0 corosync[3444]:   [QUORUM] Members[3]: 1 2 3
d’ag. 13 19:18:22 host0 corosync[3444]:   [MAIN  ] Completed service synchronization, ready to provide service.

PRHOST:

Code:
root@PRHOST:~# systemctl status corosync
● corosync.service - Corosync Cluster Engine
     Loaded: loaded (/lib/systemd/system/corosync.service; enabled; preset: enabled)
     Active: active (running) since Wed 2025-08-13 19:18:20 CEST; 2min 38s ago
       Docs: man:corosync
             man:corosync.conf
             man:corosync_overview
   Main PID: 4139 (corosync)
      Tasks: 9 (limit: 9257)
     Memory: 132.1M
        CPU: 2.874s
     CGroup: /system.slice/corosync.service
             └─4139 /usr/sbin/corosync -f

d’ag. 13 19:18:22 PRHOST corosync[4139]:   [KNET  ] host: host: 3 has 1 active links
d’ag. 13 19:18:22 PRHOST corosync[4139]:   [KNET  ] pmtud: PMTUD link change for host: 2 link: 0 from 469 to 1397
d’ag. 13 19:18:22 PRHOST corosync[4139]:   [KNET  ] pmtud: PMTUD link change for host: 3 link: 0 from 469 to 1397
d’ag. 13 19:18:22 PRHOST corosync[4139]:   [KNET  ] pmtud: Global data MTU changed to: 1397
d’ag. 13 19:18:22 PRHOST corosync[4139]:   [QUORUM] Sync members[3]: 1 2 3
d’ag. 13 19:18:22 PRHOST corosync[4139]:   [QUORUM] Sync joined[2]: 2 3
d’ag. 13 19:18:22 PRHOST corosync[4139]:   [TOTEM ] A new membership (1.637) was formed. Members joined: 2 3
d’ag. 13 19:18:22 PRHOST corosync[4139]:   [QUORUM] This node is within the primary component and will provide service.
d’ag. 13 19:18:22 PRHOST corosync[4139]:   [QUORUM] Members[3]: 1 2 3
d’ag. 13 19:18:22 PRHOST corosync[4139]:   [MAIN  ] Completed service synchronization, ready to provide service.

After restarting the pve-cluster service, no changes. The same outputs.
 
I tried setting host1 back to one vote:

Code:
root@host1:~# cat /etc/pve/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: PRHOST
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.201.239
  }
  node {
    name: host0
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 192.168.201.241
  }
  node {
    name: host1
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.201.240
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: PRHOSTS
  config_version: 6
  interface {
    linknumber: 0
#    ringnumber: 0
#    bindnetaddr: 192.168.201.0
#    mcastport: 5405
  }
  ip_version: ipv4-6
#  ip_version: ipv4
  link_mode: active
  secauth: on
  version: 2
}

The outputs:

host1:

Code:
● corosync.service - Corosync Cluster Engine
     Loaded: loaded (/lib/systemd/system/corosync.service; enabled; preset: enabled)
     Active: active (running) since Wed 2025-08-13 19:30:39 CEST; 4min 40s ago
       Docs: man:corosync
             man:corosync.conf
             man:corosync_overview
   Main PID: 2087407 (corosync)
      Tasks: 9 (limit: 13951)
     Memory: 132.5M
        CPU: 3.971s
     CGroup: /system.slice/corosync.service
             └─2087407 /usr/sbin/corosync -f

d’ag. 13 19:30:39 host1 corosync[2087407]:   [WD    ] resource memory_used missing a recovery key.
d’ag. 13 19:30:39 host1 corosync[2087407]:   [KNET  ] host: host: 1 has no active links
d’ag. 13 19:30:39 host1 corosync[2087407]:   [KNET  ] host: host: 1 has no active links
d’ag. 13 19:30:39 host1 corosync[2087407]:   [KNET  ] host: host: 1 has no active links
d’ag. 13 19:30:39 host1 corosync[2087407]:   [KNET  ] host: host: 3 has no active links
d’ag. 13 19:30:39 host1 corosync[2087407]:   [KNET  ] host: host: 3 has no active links
d’ag. 13 19:30:39 host1 corosync[2087407]:   [KNET  ] host: host: 3 has no active links
d’ag. 13 19:30:39 host1 corosync[2087407]:   [KNET  ] host: host: 2 has no active links
d’ag. 13 19:30:45 host1 corosync[2087407]:   [KNET  ] host: host: 3 has no active links
d’ag. 13 19:30:52 host1 corosync[2087407]:   [KNET  ] host: host: 1 has no active links

host0:

Code:
root@host0:~# systemctl status corosync
● corosync.service - Corosync Cluster Engine
     Loaded: loaded (/lib/systemd/system/corosync.service; enabled; preset: enabled)
     Active: active (running) since Wed 2025-08-13 19:30:45 CEST; 5min ago
       Docs: man:corosync
             man:corosync.conf
             man:corosync_overview
   Main PID: 6933 (corosync)
      Tasks: 9 (limit: 18834)
     Memory: 132.2M
        CPU: 5.875s
     CGroup: /system.slice/corosync.service
             └─6933 /usr/sbin/corosync -f

d’ag. 13 19:30:53 host0 corosync[6933]:   [KNET  ] host: host: 1 has 0 active links
d’ag. 13 19:30:53 host0 corosync[6933]:   [KNET  ] host: host: 1 has no active links
d’ag. 13 19:30:55 host0 corosync[6933]:   [KNET  ] link: Resetting MTU for link 0 because host 1 joined
d’ag. 13 19:30:55 host0 corosync[6933]:   [KNET  ] host: host: 1 has 1 active links
d’ag. 13 19:30:55 host0 corosync[6933]:   [QUORUM] Sync members[3]: 1 2 3
d’ag. 13 19:30:55 host0 corosync[6933]:   [QUORUM] Sync joined[1]: 1
d’ag. 13 19:30:55 host0 corosync[6933]:   [TOTEM ] A new membership (1.676) was formed. Members joined: 1
d’ag. 13 19:30:55 host0 corosync[6933]:   [QUORUM] Members[3]: 1 2 3
d’ag. 13 19:30:55 host0 corosync[6933]:   [MAIN  ] Completed service synchronization, ready to provide service.
d’ag. 13 19:30:55 host0 corosync[6933]:   [KNET  ] pmtud: Global data MTU changed to: 1397

PRHOST:

Code:
root@PRHOST:~# systemctl status corosync
● corosync.service - Corosync Cluster Engine
     Loaded: loaded (/lib/systemd/system/corosync.service; enabled; preset: enabled)
     Active: active (running) since Wed 2025-08-13 19:30:53 CEST; 5min ago
       Docs: man:corosync
             man:corosync.conf
             man:corosync_overview
   Main PID: 9689 (corosync)
      Tasks: 9 (limit: 9257)
     Memory: 132.1M
        CPU: 5.484s
     CGroup: /system.slice/corosync.service
             └─9689 /usr/sbin/corosync -f

d’ag. 13 19:30:55 PRHOST corosync[9689]:   [KNET  ] host: host: 3 has 1 active links
d’ag. 13 19:30:55 PRHOST corosync[9689]:   [KNET  ] pmtud: PMTUD link change for host: 2 link: 0 from 469 to 1397
d’ag. 13 19:30:55 PRHOST corosync[9689]:   [KNET  ] pmtud: PMTUD link change for host: 3 link: 0 from 469 to 1397
d’ag. 13 19:30:55 PRHOST corosync[9689]:   [KNET  ] pmtud: Global data MTU changed to: 1397
d’ag. 13 19:30:55 PRHOST corosync[9689]:   [QUORUM] Sync members[3]: 1 2 3
d’ag. 13 19:30:55 PRHOST corosync[9689]:   [QUORUM] Sync joined[2]: 2 3
d’ag. 13 19:30:55 PRHOST corosync[9689]:   [TOTEM ] A new membership (1.676) was formed. Members joined: 2 3
d’ag. 13 19:30:55 PRHOST corosync[9689]:   [QUORUM] This node is within the primary component and will provide service.
d’ag. 13 19:30:55 PRHOST corosync[9689]:   [QUORUM] Members[3]: 1 2 3
d’ag. 13 19:30:55 PRHOST corosync[9689]:   [MAIN  ] Completed service synchronization, ready to provide service.
 
I forgot to change link_mode: active (suggested by ChatGPT) back to link_mode: passive; that is fixed now as well.

The outputs:

Code:
root@host1:~# systemctl status corosync
● corosync.service - Corosync Cluster Engine
     Loaded: loaded (/lib/systemd/system/corosync.service; enabled; preset: enabled)
     Active: active (running) since Wed 2025-08-13 19:43:18 CEST; 44s ago
       Docs: man:corosync
             man:corosync.conf
             man:corosync_overview
   Main PID: 2096284 (corosync)
      Tasks: 9 (limit: 13951)
     Memory: 132.4M
        CPU: 750ms
     CGroup: /system.slice/corosync.service
             └─2096284 /usr/sbin/corosync -f

d’ag. 13 19:43:18 host1 corosync[2096284]:   [WD    ] Watchdog not enabled by configuration
d’ag. 13 19:43:18 host1 corosync[2096284]:   [WD    ] resource load_15min missing a recovery key.
d’ag. 13 19:43:18 host1 corosync[2096284]:   [WD    ] resource memory_used missing a recovery key.
d’ag. 13 19:43:18 host1 corosync[2096284]:   [KNET  ] host: host: 1 has no active links
d’ag. 13 19:43:18 host1 corosync[2096284]:   [KNET  ] host: host: 1 has no active links
d’ag. 13 19:43:18 host1 corosync[2096284]:   [KNET  ] host: host: 1 has no active links
d’ag. 13 19:43:18 host1 corosync[2096284]:   [KNET  ] host: host: 3 has no active links
d’ag. 13 19:43:18 host1 corosync[2096284]:   [KNET  ] host: host: 3 has no active links
d’ag. 13 19:43:18 host1 corosync[2096284]:   [KNET  ] host: host: 3 has no active links

Code:
root@host0:~# systemctl status corosync
● corosync.service - Corosync Cluster Engine
     Loaded: loaded (/lib/systemd/system/corosync.service; enabled; preset: enabled)
     Active: active (running) since Wed 2025-08-13 19:43:03 CEST; 2min 14s ago
       Docs: man:corosync
             man:corosync.conf
             man:corosync_overview
   Main PID: 9920 (corosync)
      Tasks: 9 (limit: 18834)
     Memory: 132.5M
        CPU: 2.939s
     CGroup: /system.slice/corosync.service
             └─9920 /usr/sbin/corosync -f

d’ag. 13 19:43:17 host0 corosync[9920]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
d’ag. 13 19:43:17 host0 corosync[9920]:   [KNET  ] host: host: 2 has no active links
d’ag. 13 19:43:20 host0 corosync[9920]:   [KNET  ] link: Resetting MTU for link 0 because host 2 joined
d’ag. 13 19:43:20 host0 corosync[9920]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
d’ag. 13 19:43:20 host0 corosync[9920]:   [QUORUM] Sync members[3]: 1 2 3
d’ag. 13 19:43:20 host0 corosync[9920]:   [QUORUM] Sync joined[1]: 2
d’ag. 13 19:43:20 host0 corosync[9920]:   [TOTEM ] A new membership (1.69a) was formed. Members joined: 2
d’ag. 13 19:43:20 host0 corosync[9920]:   [QUORUM] Members[3]: 1 2 3
d’ag. 13 19:43:20 host0 corosync[9920]:   [MAIN  ] Completed service synchronization, ready to provide service.
d’ag. 13 19:43:20 host0 corosync[9920]:   [KNET  ] pmtud: Global data MTU changed to: 1397

Code:
root@PRHOST:~# systemctl status corosync
● corosync.service - Corosync Cluster Engine
     Loaded: loaded (/lib/systemd/system/corosync.service; enabled; preset: enabled)
     Active: active (running) since Wed 2025-08-13 19:43:09 CEST; 2min 31s ago
       Docs: man:corosync
             man:corosync.conf
             man:corosync_overview
   Main PID: 15195 (corosync)
      Tasks: 9 (limit: 9257)
     Memory: 132.4M
        CPU: 2.748s
     CGroup: /system.slice/corosync.service
             └─15195 /usr/sbin/corosync -f

d’ag. 13 19:43:17 PRHOST corosync[15195]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
d’ag. 13 19:43:17 PRHOST corosync[15195]:   [KNET  ] host: host: 2 has no active links
d’ag. 13 19:43:20 PRHOST corosync[15195]:   [KNET  ] link: Resetting MTU for link 0 because host 2 joined
d’ag. 13 19:43:20 PRHOST corosync[15195]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
d’ag. 13 19:43:20 PRHOST corosync[15195]:   [QUORUM] Sync members[3]: 1 2 3
d’ag. 13 19:43:20 PRHOST corosync[15195]:   [QUORUM] Sync joined[1]: 2
d’ag. 13 19:43:20 PRHOST corosync[15195]:   [TOTEM ] A new membership (1.69a) was formed. Members joined: 2
d’ag. 13 19:43:20 PRHOST corosync[15195]:   [QUORUM] Members[3]: 1 2 3
d’ag. 13 19:43:20 PRHOST corosync[15195]:   [MAIN  ] Completed service synchronization, ready to provide service.
d’ag. 13 19:43:20 PRHOST corosync[15195]:   [KNET  ] pmtud: Global data MTU changed to: 1397
 
OK, so when running pvecm status or corosync-cfgtool -n, e.g. on host1, what's the output now?
Now the outputs are:

Code:
root@host1:~# systemctl status corosync
● corosync.service - Corosync Cluster Engine
     Loaded: loaded (/lib/systemd/system/corosync.service; enabled; preset: enabled)
     Active: active (running) since Wed 2025-08-13 22:17:01 CEST; 6s ago
       Docs: man:corosync
             man:corosync.conf
             man:corosync_overview
   Main PID: 2200684 (corosync)
      Tasks: 9 (limit: 13951)
     Memory: 132.1M
        CPU: 176ms
     CGroup: /system.slice/corosync.service
             └─2200684 /usr/sbin/corosync -f

d’ag. 13 22:17:01 host1 corosync[2200684]:   [WD    ] Watchdog not enabled by configuration
d’ag. 13 22:17:01 host1 corosync[2200684]:   [WD    ] resource load_15min missing a recovery key.
d’ag. 13 22:17:01 host1 corosync[2200684]:   [WD    ] resource memory_used missing a recovery key.
d’ag. 13 22:17:01 host1 corosync[2200684]:   [KNET  ] host: host: 1 has no active links
d’ag. 13 22:17:01 host1 corosync[2200684]:   [KNET  ] host: host: 1 has no active links
d’ag. 13 22:17:01 host1 corosync[2200684]:   [KNET  ] host: host: 1 has no active links
d’ag. 13 22:17:01 host1 corosync[2200684]:   [KNET  ] host: host: 3 has no active links
d’ag. 13 22:17:01 host1 corosync[2200684]:   [KNET  ] host: host: 3 has no active links
d’ag. 13 22:17:01 host1 corosync[2200684]:   [KNET  ] host: host: 3 has no active links

Code:
root@host0:~# systemctl status corosync
● corosync.service - Corosync Cluster Engine
     Loaded: loaded (/lib/systemd/system/corosync.service; enabled; preset: enabled)
     Active: active (running) since Wed 2025-08-13 22:14:25 CEST; 3min 9s ago
       Docs: man:corosync
             man:corosync.conf
             man:corosync_overview
   Main PID: 875 (corosync)
      Tasks: 9 (limit: 18834)
     Memory: 135.4M
        CPU: 2.572s
     CGroup: /system.slice/corosync.service
             └─875 /usr/sbin/corosync -f

d’ag. 13 22:17:00 host0 corosync[875]:   [QUORUM] Sync members[2]: 1 3
d’ag. 13 22:17:00 host0 corosync[875]:   [QUORUM] Sync left[1]: 2
d’ag. 13 22:17:00 host0 corosync[875]:   [TOTEM ] A new membership (1.6ae) was formed. Members left: 2
d’ag. 13 22:17:00 host0 corosync[875]:   [QUORUM] Members[2]: 1 3
d’ag. 13 22:17:00 host0 corosync[875]:   [MAIN  ] Completed service synchronization, ready to provide service.
d’ag. 13 22:17:04 host0 corosync[875]:   [QUORUM] Sync members[3]: 1 2 3
d’ag. 13 22:17:04 host0 corosync[875]:   [QUORUM] Sync joined[1]: 2
d’ag. 13 22:17:04 host0 corosync[875]:   [TOTEM ] A new membership (1.6b3) was formed. Members joined: 2
d’ag. 13 22:17:04 host0 corosync[875]:   [QUORUM] Members[3]: 1 2 3
d’ag. 13 22:17:04 host0 corosync[875]:   [MAIN  ] Completed service synchronization, ready to provide service.

Code:
root@PRHOST:~# systemctl status corosync
● corosync.service - Corosync Cluster Engine
     Loaded: loaded (/lib/systemd/system/corosync.service; enabled; preset: enabled)
     Active: active (running) since Wed 2025-08-13 22:16:38 CEST; 1min 29s ago
       Docs: man:corosync
             man:corosync.conf
             man:corosync_overview
   Main PID: 1252 (corosync)
      Tasks: 9 (limit: 9257)
     Memory: 141.1M
        CPU: 1.160s
     CGroup: /system.slice/corosync.service
             └─1252 /usr/sbin/corosync -f

d’ag. 13 22:17:01 PRHOST corosync[1252]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
d’ag. 13 22:17:01 PRHOST corosync[1252]:   [KNET  ] host: host: 2 has no active links
d’ag. 13 22:17:04 PRHOST corosync[1252]:   [KNET  ] link: Resetting MTU for link 0 because host 2 joined
d’ag. 13 22:17:04 PRHOST corosync[1252]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
d’ag. 13 22:17:04 PRHOST corosync[1252]:   [QUORUM] Sync members[3]: 1 2 3
d’ag. 13 22:17:04 PRHOST corosync[1252]:   [QUORUM] Sync joined[1]: 2
d’ag. 13 22:17:04 PRHOST corosync[1252]:   [TOTEM ] A new membership (1.6b3) was formed. Members joined: 2
d’ag. 13 22:17:04 PRHOST corosync[1252]:   [QUORUM] Members[3]: 1 2 3
d’ag. 13 22:17:04 PRHOST corosync[1252]:   [MAIN  ] Completed service synchronization, ready to provide service.
d’ag. 13 22:17:04 PRHOST corosync[1252]:   [KNET  ] pmtud: Global data MTU changed to: 1397

And

Code:
root@host1:~# pvecm status
Cluster information
-------------------
Name:             PRHOSTS
Config Version:   7
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Wed Aug 13 22:18:37 2025
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000002
Ring ID:          1.6b3
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2 
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.168.201.239
0x00000002          1 192.168.201.240 (local)
0x00000003          1 192.168.201.241

Code:
root@host1:~# corosync-cfgtool -n
Local node ID 2, transport knet
nodeid: 1 reachable
   LINK: 0 udp (192.168.201.240->192.168.201.239) enabled connected mtu: 1397

nodeid: 3 reachable
   LINK: 0 udp (192.168.201.240->192.168.201.241) enabled connected mtu: 1397
 
Code:
root@host1:~# systemctl status corosync
● corosync.service - Corosync Cluster Engine
     Loaded: loaded (/lib/systemd/system/corosync.service; enabled; preset: enabled)
     Active: active (running) since Wed 2025-08-13 22:17:01 CEST; 6s ago
       Docs: man:corosync
             man:corosync.conf
             man:corosync_overview
   Main PID: 2200684 (corosync)
      Tasks: 9 (limit: 13951)
     Memory: 132.1M
        CPU: 176ms
     CGroup: /system.slice/corosync.service
             └─2200684 /usr/sbin/corosync -f

d’ag. 13 22:17:01 host1 corosync[2200684]:   [WD    ] Watchdog not enabled by configuration
d’ag. 13 22:17:01 host1 corosync[2200684]:   [WD    ] resource load_15min missing a recovery key.
d’ag. 13 22:17:01 host1 corosync[2200684]:   [WD    ] resource memory_used missing a recovery key.
d’ag. 13 22:17:01 host1 corosync[2200684]:   [KNET  ] host: host: 1 has no active links
d’ag. 13 22:17:01 host1 corosync[2200684]:   [KNET  ] host: host: 1 has no active links
d’ag. 13 22:17:01 host1 corosync[2200684]:   [KNET  ] host: host: 1 has no active links
d’ag. 13 22:17:01 host1 corosync[2200684]:   [KNET  ] host: host: 3 has no active links
d’ag. 13 22:17:01 host1 corosync[2200684]:   [KNET  ] host: host: 3 has no active links
d’ag. 13 22:17:01 host1 corosync[2200684]:   [KNET  ] host: host: 3 has no active links
These messages are from service start (same second), and since corosync connects later on, I wouldn't worry about them at the moment.
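If you want to confirm that the links really did come up afterwards, a quick look at the journal should show it. A minimal sketch (the exact wording of the KNET messages can vary between corosync versions):

Code:
# knet link state changes since the current boot
journalctl -b -u corosync | grep -Ei 'knet.*(link|active)'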


[..]in one of the restarts of the host that is 24/7 it did not start.
Would you like to elaborate a bit more on the original problem? I understood that after restarting a node, something went wrong.
 
These messages are from service start (same second), and since corosync connects later on, I wouldn't worry about them at the moment.



Would you like to elaborate a bit more on the original problem? I understood that after restarting a node, something went wrong.
The first host to go into service was host1, a Chuwi LarkBox X that is currently the only home automation server. This machine is on 24/7.

I also have an HPE ProLiant ML30 Plus Gen10 (PRHOST), but with mechanical disks (it was given to me by someone who migrated to cloud services). It is configured and I tested it with some VMs and containers, but it is only for backups.

The last one to be incorporated was host0, this machine (https://www.amazon.com/dp/B0DSVXQHV5). The intention is to replace the ISP's fiber-optic router with this machine + an ONT, but I don't have the ONT yet. I have OPNsense as a VM but nothing else, so it is joined to the cluster but not on 24/7 at the moment.

With host1 + PRHOST and 2 votes for host1, everything worked. After I added host0 and kept the 2 quorum votes for host1, a reboot of host1 left it stuck waiting for quorum. That is when I went back to the corosync.conf file: I remembered I had modified /etc/corosync/corosync.conf, but eventually saw that /etc/pve/corosync.conf is the one that needs to be modified, since it overwrites the other one. And that is how I ran into all these problems, trying commands until I found out I had been modifying the wrong file.
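For reference, this is roughly the edit workflow I should have used from the start (just a sketch of the usual procedure, with nano as an example editor):

Code:
# /etc/pve/corosync.conf is the cluster-wide copy; pmxcfs pushes it to
# /etc/corosync/corosync.conf on every node, so edits to the latter get overwritten
cp /etc/pve/corosync.conf /etc/pve/corosync.conf.new
nano /etc/pve/corosync.conf.new            # change votes/addresses here
# remember to increment 'config_version' in the totem section,
# otherwise the new file is not picked up by the other nodes
cp /etc/pve/corosync.conf /etc/pve/corosync.conf.bak
mv /etc/pve/corosync.conf.new /etc/pve/corosync.conf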
 
What I don't understand is why there is a cluster at all if you do not migrate VMs/LXCs or use HA.
The network setup doesn't meet the suggested requirements anyway (no separate corosync links).
With host1 + PRHOST and 2 votes for host1, everything worked. After I added host0 and kept the 2 quorum votes for host1, a reboot of host1 left it stuck waiting for quorum.
In a situation like this, check network connectivity first (ip a), do a simple test with ping, and then use corosync-cfgtool -n to look at corosync, the IPs in use and its connections. Corosync uses UDP port 5405 by default, so you can even look at the traffic with tcpdump -i any -n -v -s0 port 5405 to monitor whether traffic is sent/received.
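Roughly, the whole check could look like this (192.168.201.239 is just one of the node addresses from your pvecm status output; adjust to your setup):

Code:
ip a                                   # do the interfaces carry the expected addresses?
ping -c 3 192.168.201.239              # basic reachability to the peer node
corosync-cfgtool -s                    # link status as corosync sees it
corosync-cfgtool -n                    # per-neighbour view, including the IPs in use
tcpdump -i any -n -v -s0 port 5405     # is corosync traffic actually flowing?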
 
What I don't understand is why there is a cluster at all if you do not migrate VMs/LXCs or use HA.
The network setup doesn't meet the suggested requirements anyway (no separate corosync links).

In a situation like this, check network connectivity first (ip a), do a simple test with ping, and then use corosync-cfgtool -n to look at corosync, the IPs in use and its connections. Corosync uses UDP port 5405 by default, so you can even look at the traffic with tcpdump -i any -n -v -s0 port 5405 to monitor whether traffic is sent/received.
With host1 and PRHOST (host0 off) I have:

Code:
root@host1:~# systemctl status corosync
● corosync.service - Corosync Cluster Engine
     Loaded: loaded (/lib/systemd/system/corosync.service; enabled; preset: enabled)
     Active: active (running) since Fri 2025-08-15 16:59:51 CEST; 1 day 1h ago
       Docs: man:corosync
             man:corosync.conf
             man:corosync_overview
   Main PID: 3929487 (corosync)
      Tasks: 9 (limit: 13951)
     Memory: 132.3M
        CPU: 11min 52.134s
     CGroup: /system.slice/corosync.service
             └─3929487 /usr/sbin/corosync -f

d’ag. 15 16:59:51 host1 corosync[3929487]:   [WD    ] resource load_15min missing a recovery key.
d’ag. 15 16:59:51 host1 corosync[3929487]:   [WD    ] resource memory_used missing a recovery key.
d’ag. 15 16:59:51 host1 corosync[3929487]:   [KNET  ] host: host: 1 has no active links
d’ag. 15 16:59:51 host1 corosync[3929487]:   [KNET  ] host: host: 1 has no active links
d’ag. 15 16:59:51 host1 corosync[3929487]:   [KNET  ] host: host: 1 has no active links
d’ag. 15 16:59:51 host1 corosync[3929487]:   [KNET  ] host: host: 3 has no active links
d’ag. 15 16:59:51 host1 corosync[3929487]:   [KNET  ] host: host: 3 has no active links
d’ag. 15 16:59:51 host1 corosync[3929487]:   [KNET  ] host: host: 3 has no active links
d’ag. 15 17:03:54 host1 corosync[3929487]:   [KNET  ] host: host: 3 has no active links
d’ag. 15 17:05:40 host1 corosync[3929487]:   [KNET  ] host: host: 1 has no active links

Code:
root@host1:~# corosync-cfgtool -n
Local node ID 2, transport knet
nodeid: 1 reachable
   LINK: 0 udp (192.168.1.240->192.168.1.239) enabled connected mtu: 1397

Code:
root@host1:~# tcpdump -i any -n -v -s0 port 5405
18:24:21.545148 enp1s0 In  IP (tos 0x10, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 156)
    192.168.1.239.5405 > 192.168.1.240.5405: UDP, length 128
18:24:21.545148 vmbr0 In  IP (tos 0x10, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 156)
    192.168.1.239.5405 > 192.168.1.240.5405: UDP, length 128
18:24:21.545317 vmbr0 Out IP (tos 0x10, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 156)
    192.168.1.240.5405 > 192.168.1.239.5405: UDP, length 128
18:24:21.545322 enp1s0 Out IP (tos 0x10, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 156)
    192.168.1.240.5405 > 192.168.1.239.5405: UDP, length 128
18:24:21.745881 enp1s0 In  IP (tos 0x10, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 124)
    192.168.1.239.5405 > 192.168.1.240.5405: UDP, length 96
18:24:21.745881 vmbr0 In  IP (tos 0x10, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 124)
    192.168.1.239.5405 > 192.168.1.240.5405: UDP, length 96
18:24:21.837018 vmbr0 Out IP (tos 0x10, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 108)
    192.168.1.240.5405 > 192.168.1.239.5405: UDP, length 80
18:24:21.837022 enp1s0 Out IP (tos 0x10, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 108)
    192.168.1.240.5405 > 192.168.1.239.5405: UDP, length 80
18:24:21.837350 enp1s0 In  IP (tos 0x10, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 108)
    192.168.1.239.5405 > 192.168.1.240.5405: UDP, length 80
18:24:21.837350 vmbr0 In  IP (tos 0x10, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 108)
    192.168.1.239.5405 > 192.168.1.240.5405: UDP, length 80
18:24:21.984096 enp1s0 In  IP (tos 0x10, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 108)
    192.168.1.239.5405 > 192.168.1.240.5405: UDP, length 80
18:24:21.984096 vmbr0 In  IP (tos 0x10, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 108)
    192.168.1.239.5405 > 192.168.1.240.5405: UDP, length 80
18:24:21.984288 vmbr0 Out IP (tos 0x10, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 108)
    192.168.1.240.5405 > 192.168.1.239.5405: UDP, length 80
18:24:21.984297 enp1s0 Out IP (tos 0x10, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 108)
    192.168.1.240.5405 > 192.168.1.239.5405: UDP, length 80
18:24:22.127578 vmbr0 Out IP (tos 0x10, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 124)
    192.168.1.240.5405 > 192.168.1.239.5405: UDP, length 96
18:24:22.127591 enp1s0 Out IP (tos 0x10, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 124)
    192.168.1.240.5405 > 192.168.1.239.5405: UDP, length 96
18:24:22.128260 enp1s0 In  IP (tos 0x10, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 156)
    192.168.1.239.5405 > 192.168.1.240.5405: UDP, length 128
18:24:22.128260 vmbr0 In  IP (tos 0x10, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 156)
    192.168.1.239.5405 > 192.168.1.240.5405: UDP, length 128
18:24:22.128493 vmbr0 Out IP (tos 0x10, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 1500)
    192.168.1.240.5405 > 192.168.1.239.5405: UDP, length 1472
18:24:22.128499 enp1s0 Out IP (tos 0x10, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 1500)
    192.168.1.240.5405 > 192.168.1.239.5405: UDP, length 1472
18:24:22.128509 vmbr0 Out IP (tos 0x10, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 780)
    192.168.1.240.5405 > 192.168.1.239.5405: UDP, length 752
18:24:22.128510 enp1s0 Out IP (tos 0x10, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 780)
    192.168.1.240.5405 > 192.168.1.239.5405: UDP, length 752
18:24:22.128554 vmbr0 Out IP (tos 0x10, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 156)
    192.168.1.240.5405 > 192.168.1.239.5405: UDP, length 128
18:24:22.128556 enp1s0 Out IP (tos 0x10, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 156)
    192.168.1.240.5405 > 192.168.1.239.5405: UDP, length 128
18:24:22.129297 enp1s0 In  IP (tos 0x10, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 156)
    192.168.1.239.5405 > 192.168.1.240.5405: UDP, length 128
18:24:22.129297 vmbr0 In  IP (tos 0x10, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 156)
    192.168.1.239.5405 > 192.168.1.240.5405: UDP, length 128
18:24:22.129460 vmbr0 Out IP (tos 0x10, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 156)
    192.168.1.240.5405 > 192.168.1.239.5405: UDP, length 128
 
Code:
root@host1:~# corosync-cfgtool -n
Local node ID 2, transport knet
nodeid: 1 reachable
   LINK: 0 udp (192.168.1.240->192.168.1.239) enabled connected mtu: 1397
Your corosync link seems to work as expected between "host 2" aka host1 and "host 1" aka PRHOST.

BTW, there is no need to post the same output of root@host1:~# systemctl status corosync again, as there is no change in it.
 
Your corosync link seems to work as expected between "host 2" aka host1 and "host 1" aka PRHOST.

BTW, there is no need to post the same output of root@host1:~# systemctl status corosync again, as there is no change in it.
Maybe trying to resolve this will help?

Code:
d’ag. 15 16:59:51 host1 corosync[3929487]:   [WD    ] resource load_15min missing a recovery key.
d’ag. 15 16:59:51 host1 corosync[3929487]:   [WD    ] resource memory_used missing a recovery key.
 
Code:
d’ag. 15 16:59:51 host1 corosync[3929487]:   [WD    ] resource load_15min missing a recovery key.
d’ag. 15 16:59:51 host1 corosync[3929487]:   [WD    ] resource memory_used missing a recovery key.
This is corosync's watchdog complaining about a missing action definition for the resources load_15min and memory_used. If no action is defined, nothing will happen when a threshold is reached. It is just a warning.
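If you really wanted to silence it, the recovery action would be defined in a resources section of corosync.conf, roughly along these lines (unverified sketch; check man corosync.conf for the exact keys and the allowed recovery values):

Code:
resources {
    system {
        load_15min {
            recovery: none     # action when the threshold is exceeded
            max: 8.0           # threshold for the 15-minute load average
        }
        memory_used {
            recovery: none
            max: 80            # threshold in percent of used memory
        }
    }
}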

On host1, what's the output of
systemctl cat corosync
echo $COROSYNC_OPTIONS
corosync-cmapctl | grep -i resources
 