How to re-include a node in the cluster?

chupacabra

New Member
Apr 9, 2023
I have a 3-node cluster that was working just fine. I added some NICs to the servers and was going to move the main cluster interface to a bond. I started on node 1 and messed up the configuration, so that node got kicked out of the cluster. I have since corrected the network config, but the node remains isolated and I can't figure out the correct steps to bring it back in. As part of my first attempt at moving interfaces I also edited corosync.conf, which I think is what is breaking the cluster. I can ping every node by name and IP from every other node, so I know the basic networking is fine.

Here are some of the details from node 1.
corosync.conf
Code:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: pve01
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.10.100.100
  }
  node {
    name: pve02
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.10.100.105
  }
  node {
    name: pve03
    nodeid: 4
    quorum_votes: 1
    ring0_addr: 10.10.100.110
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: proxmox
  config_version: 5
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}

node 1 network info:
Code:
auto lo
iface lo inet loopback

iface enp0s31f6 inet manual
#NiC on motherboard

iface wlo1 inet manual

auto enp1s0
iface enp1s0 inet manual
#10GbE Card Port 0

auto enp1s0d1
iface enp1s0d1 inet manual
#10GbE Card Port 1

auto enp6s0f0
iface enp6s0f0 inet manual
#1GbE Card Port 0

auto enp6s0f1
iface enp6s0f1 inet manual
#1GbE Card Port 1

auto bond0
iface bond0 inet manual
        bond-slaves enp6s0f0 enp6s0f1
        bond-miimon 100
        bond-mode 802.3ad
#       bond-xmit-hash-policy layer2+3
#       bond-downdelay 200
#       bond-updelay 200
#       bond-lacp-rate 1
#LACP of 1GbE Card Ports

auto vmbr0
iface vmbr0 inet manual
        address 10.10.100.100/24
        gateway 10.10.100.1
        bridge-ports enp0s31f6
        bridge-stp off
        bridge-fd 0
#Bridge on motherboard NIC

auto vmbr1
iface vmbr1 inet static
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094
#Bridge on bond0

auto vmbr2
iface vmbr2 inet manual
        bridge-ports enp1s0
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094
#Bridge for 10GbE Ports

pvecm status on node 1:
Code:
Cluster information
-------------------
Name:             proxmox
Config Version:   5
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Wed Apr 12 09:23:00 2023
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          0x00000001
Ring ID:          1.9e1
Quorate:          No

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      1
Quorum:           2 Activity blocked
Flags:           

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 10.10.100.100 (local)

On the other nodes, pvecm status shows:
Code:
Cluster information
-------------------
Name:             proxmox
Config Version:   5
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Wed Apr 12 08:42:06 2023
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x00000004
Ring ID:          2.9d9
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      2
Quorum:           2
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000002          1 10.10.100.105
0x00000004          1 10.10.100.110 (local)

Network interfaces on the other nodes:
Code:
auto lo
iface lo inet loopback

iface enp0s31f6 inet manual
#NiC on motherboard

auto enp6s0f0
iface enp6s0f0 inet manual
#1GbE Card Port 0

auto enp6s0f1
iface enp6s0f1 inet manual
#1GbE Card Port 1

auto enp1s0
iface enp1s0 inet manual
#10GbE Card Port 0

auto enp1s0d1
iface enp1s0d1 inet manual
#10GbE Card Port 1

auto bond0
iface bond0 inet manual
        bond-slaves enp6s0f0 enp6s0f1
        bond-miimon 100
        bond-mode 802.3ad
#LACP of 1GbE Card Ports

auto vmbr0
iface vmbr0 inet static
        address 10.10.100.105/24
        gateway 10.10.100.1
        bridge-ports enp0s31f6
        bridge-stp off
        bridge-fd 0
#Bridge on motherboard NIC

auto vmbr1
iface vmbr1 inet manual
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094
#Bridge on bond0

auto vmbr2
iface vmbr2 inet manual
        bridge-ports enp1s0d1
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094
#Bridge on 10Gb port

I am pretty sure all I need to do is edit the corosync.conf file on node 1 and increase the config_version. I tried that, but the node still didn't rejoin the cluster. I must have the order of steps wrong, and now I don't want to mess things up even further.
 
Hi,
please post the output of `systemctl status corosync.service pve-cluster.service` as well as `journalctl -b -u corosync.service -u pve-cluster.service`.
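For reference, that would be:

Code:
# status of both the corosync engine and the Proxmox cluster filesystem
systemctl status corosync.service pve-cluster.service

# journal entries for both units since the last boot
journalctl -b -u corosync.service -u pve-cluster.service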
 
Status of the services:
Code:
● corosync.service - Corosync Cluster Engine
     Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
     Active: active (running) since Wed 2023-04-12 09:04:22 EDT; 3h 24min ago
       Docs: man:corosync
             man:corosync.conf
             man:corosync_overview
   Main PID: 4200 (corosync)
      Tasks: 9 (limit: 76891)
     Memory: 131.6M
        CPU: 40.517s
     CGroup: /system.slice/corosync.service
             └─4200 /usr/sbin/corosync -f

Apr 12 09:04:22 pve01 corosync[4200]:   [KNET  ] host: host: 4 (passive) best link: 0 (pri: 1)
Apr 12 09:04:22 pve01 corosync[4200]:   [KNET  ] host: host: 4 has no active links
Apr 12 09:04:22 pve01 corosync[4200]:   [KNET  ] host: host: 4 (passive) best link: 0 (pri: 1)
Apr 12 09:04:22 pve01 corosync[4200]:   [KNET  ] host: host: 4 has no active links
Apr 12 09:04:22 pve01 corosync[4200]:   [QUORUM] Sync members[1]: 1
Apr 12 09:04:22 pve01 corosync[4200]:   [QUORUM] Sync joined[1]: 1
Apr 12 09:04:22 pve01 corosync[4200]:   [TOTEM ] A new membership (1.9e1) was formed. Members joined: 1
Apr 12 09:04:22 pve01 corosync[4200]:   [QUORUM] Members[1]: 1
Apr 12 09:04:22 pve01 systemd[1]: Started Corosync Cluster Engine.
Apr 12 09:04:22 pve01 corosync[4200]:   [MAIN  ] Completed service synchronization, ready to provide service.

● pve-cluster.service - The Proxmox VE cluster filesystem
     Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabled)
     Active: active (running) since Wed 2023-04-12 09:04:30 EDT; 3h 24min ago
    Process: 4212 ExecStart=/usr/bin/pmxcfs (code=exited, status=0/SUCCESS)
   Main PID: 4213 (pmxcfs)
      Tasks: 6 (limit: 76891)
     Memory: 40.5M
        CPU: 4.365s
     CGroup: /system.slice/pve-cluster.service
             └─4213 /usr/bin/pmxcfs

Apr 12 09:04:29 pve01 pmxcfs[4212]: [dcdb] crit: local corosync.conf is newer
Apr 12 09:04:29 pve01 pmxcfs[4213]: [status] notice: update cluster info (cluster name  proxmox, version = 15)
Apr 12 09:04:29 pve01 pmxcfs[4213]: [dcdb] notice: members: 1/4213
Apr 12 09:04:29 pve01 pmxcfs[4213]: [dcdb] notice: all data is up to date
Apr 12 09:04:29 pve01 pmxcfs[4213]: [status] notice: members: 1/4213
Apr 12 09:04:29 pve01 pmxcfs[4213]: [status] notice: all data is up to date
Apr 12 09:04:30 pve01 systemd[1]: Started The Proxmox VE cluster filesystem.
Apr 12 10:04:29 pve01 pmxcfs[4213]: [dcdb] notice: data verification successful
Apr 12 11:04:29 pve01 pmxcfs[4213]: [dcdb] notice: data verification successful
Apr 12 12:04:29 pve01 pmxcfs[4213]: [dcdb] notice: data verification successful
 
journalctl output
Code:
root@pve01:~# journalctl -b -u corosync.service -u pve-cluster.service
-- Journal begins at Fri 2023-01-20 08:48:55 EST, ends at Wed 2023-04-12 12:31:50 EDT. --
Apr 12 08:53:57 pve01 systemd[1]: Starting The Proxmox VE cluster filesystem...
Apr 12 08:53:57 pve01 pmxcfs[1924]: [dcdb] crit: local corosync.conf is newer
Apr 12 08:53:57 pve01 pmxcfs[1924]: [dcdb] crit: local corosync.conf is newer
Apr 12 08:53:57 pve01 pmxcfs[1941]: [quorum] crit: quorum_initialize failed: 2
Apr 12 08:53:57 pve01 pmxcfs[1941]: [quorum] crit: can't initialize service
Apr 12 08:53:57 pve01 pmxcfs[1941]: [confdb] crit: cmap_initialize failed: 2
Apr 12 08:53:57 pve01 pmxcfs[1941]: [confdb] crit: can't initialize service
Apr 12 08:53:57 pve01 pmxcfs[1941]: [dcdb] crit: cpg_initialize failed: 2
Apr 12 08:53:57 pve01 pmxcfs[1941]: [dcdb] crit: can't initialize service
Apr 12 08:53:57 pve01 pmxcfs[1941]: [status] crit: cpg_initialize failed: 2
Apr 12 08:53:57 pve01 pmxcfs[1941]: [status] crit: can't initialize service
Apr 12 08:53:58 pve01 systemd[1]: Started The Proxmox VE cluster filesystem.
Apr 12 08:53:58 pve01 systemd[1]: Starting Corosync Cluster Engine...
Apr 12 08:53:58 pve01 corosync[2037]:   [MAIN  ] Corosync Cluster Engine 3.1.7 starting up
Apr 12 08:53:58 pve01 corosync[2037]:   [MAIN  ] Corosync built-in features: dbus monitoring watchdog systemd xmlconf vqsim nozzle snmp pie relro bindnow
Apr 12 08:53:58 pve01 corosync[2037]:   [TOTEM ] Initializing transport (Kronosnet).
Apr 12 08:53:59 pve01 corosync[2037]:   [TOTEM ] totemknet initialized
Apr 12 08:53:59 pve01 corosync[2037]:   [KNET  ] pmtud: MTU manually set to: 0
Apr 12 08:53:59 pve01 corosync[2037]:   [KNET  ] common: crypto_nss.so has been loaded from /usr/lib/x86_64-linux-gnu/kronosnet/crypto_nss.so
Apr 12 08:53:59 pve01 corosync[2037]:   [SERV  ] Service engine loaded: corosync configuration map access [0]
Apr 12 08:53:59 pve01 corosync[2037]:   [QB    ] server name: cmap
Apr 12 08:53:59 pve01 corosync[2037]:   [SERV  ] Service engine loaded: corosync configuration service [1]
Apr 12 08:53:59 pve01 corosync[2037]:   [QB    ] server name: cfg
Apr 12 08:53:59 pve01 corosync[2037]:   [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Apr 12 08:53:59 pve01 corosync[2037]:   [QB    ] server name: cpg
Apr 12 08:53:59 pve01 corosync[2037]:   [SERV  ] Service engine loaded: corosync profile loading service [4]
Apr 12 08:53:59 pve01 corosync[2037]:   [SERV  ] Service engine loaded: corosync resource monitoring service [6]
Apr 12 08:53:59 pve01 corosync[2037]:   [WD    ] Watchdog not enabled by configuration
Apr 12 08:53:59 pve01 corosync[2037]:   [WD    ] resource load_15min missing a recovery key.
Apr 12 08:53:59 pve01 corosync[2037]:   [WD    ] resource memory_used missing a recovery key.
Apr 12 08:53:59 pve01 corosync[2037]:   [WD    ] no resources configured.
Apr 12 08:53:59 pve01 corosync[2037]:   [SERV  ] Service engine loaded: corosync watchdog service [7]
Apr 12 08:53:59 pve01 corosync[2037]:   [QUORUM] Using quorum provider corosync_votequorum
Apr 12 08:53:59 pve01 corosync[2037]:   [SERV  ] Service engine loaded: corosync vote quorum service v1.0 [5]
Apr 12 08:53:59 pve01 corosync[2037]:   [QB    ] server name: votequorum
Apr 12 08:53:59 pve01 corosync[2037]:   [SERV  ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Apr 12 08:53:59 pve01 corosync[2037]:   [QB    ] server name: quorum
Apr 12 08:53:59 pve01 corosync[2037]:   [TOTEM ] Configuring link 0
Apr 12 08:53:59 pve01 corosync[2037]:   [TOTEM ] Configured link number 0: local addr: 10.10.100.100, port=5405
Apr 12 08:53:59 pve01 corosync[2037]:   [KNET  ] link: Resetting MTU for link 0 because host 1 joined
Apr 12 08:53:59 pve01 corosync[2037]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Apr 12 08:53:59 pve01 corosync[2037]:   [KNET  ] host: host: 2 has no active links
Apr 12 08:53:59 pve01 corosync[2037]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Apr 12 08:53:59 pve01 corosync[2037]:   [KNET  ] host: host: 2 has no active links
Apr 12 08:53:59 pve01 corosync[2037]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Apr 12 08:53:59 pve01 corosync[2037]:   [KNET  ] host: host: 2 has no active links
Apr 12 08:53:59 pve01 corosync[2037]:   [KNET  ] host: host: 4 (passive) best link: 0 (pri: 1)
Apr 12 08:53:59 pve01 corosync[2037]:   [KNET  ] host: host: 4 has no active links
Apr 12 08:53:59 pve01 corosync[2037]:   [KNET  ] host: host: 4 (passive) best link: 0 (pri: 1)
Apr 12 08:53:59 pve01 corosync[2037]:   [KNET  ] host: host: 4 has no active links
Apr 12 08:53:59 pve01 corosync[2037]:   [KNET  ] host: host: 4 (passive) best link: 0 (pri: 1)
Apr 12 08:53:59 pve01 corosync[2037]:   [KNET  ] host: host: 4 has no active links
Apr 12 08:53:59 pve01 corosync[2037]:   [QUORUM] Sync members[1]: 1
Apr 12 08:53:59 pve01 corosync[2037]:   [QUORUM] Sync joined[1]: 1
Apr 12 08:53:59 pve01 corosync[2037]:   [TOTEM ] A new membership (1.9dc) was formed. Members joined: 1
Apr 12 08:53:59 pve01 corosync[2037]:   [QUORUM] Members[1]: 1
Apr 12 08:53:59 pve01 corosync[2037]:   [MAIN  ] Completed service synchronization, ready to provide service.
Apr 12 08:53:59 pve01 systemd[1]: Started Corosync Cluster Engine.
Apr 12 08:54:03 pve01 pmxcfs[1941]: [status] notice: update cluster info (cluster name  proxmox, version = 15)
Apr 12 08:54:03 pve01 pmxcfs[1941]: [dcdb] notice: members: 1/1941
Apr 12 08:54:03 pve01 pmxcfs[1941]: [dcdb] notice: all data is up to date
Apr 12 08:54:03 pve01 pmxcfs[1941]: [status] notice: members: 1/1941
Apr 12 08:54:03 pve01 pmxcfs[1941]: [status] notice: all data is up to date
Apr 12 09:04:20 pve01 systemd[1]: Stopping Corosync Cluster Engine...
Apr 12 09:04:20 pve01 corosync-cfgtool[4196]: Shutting down corosync
Apr 12 09:04:20 pve01 corosync[2037]:   [MAIN  ] Node was shut down by a signal
Apr 12 09:04:20 pve01 corosync[2037]:   [SERV  ] Unloading all Corosync service engines.
Apr 12 09:04:20 pve01 corosync[2037]:   [QB    ] withdrawing server sockets
Apr 12 09:04:20 pve01 corosync[2037]:   [SERV  ] Service engine unloaded: corosync vote quorum service v1.0
Apr 12 09:04:20 pve01 corosync[2037]:   [CFG   ] Node 1 was shut down by sysadmin
Apr 12 09:04:20 pve01 pmxcfs[1941]: [confdb] crit: cmap_dispatch failed: 2
Apr 12 09:04:20 pve01 corosync[2037]:   [QB    ] withdrawing server sockets
Apr 12 09:04:20 pve01 corosync[2037]:   [SERV  ] Service engine unloaded: corosync configuration map access
Apr 12 09:04:20 pve01 corosync[2037]:   [QB    ] withdrawing server sockets
Apr 12 09:04:20 pve01 corosync[2037]:   [SERV  ] Service engine unloaded: corosync configuration service
Apr 12 09:04:20 pve01 corosync[2037]:   [QB    ] withdrawing server sockets
Apr 12 09:04:20 pve01 corosync[2037]:   [SERV  ] Service engine unloaded: corosync cluster closed process group service v1.01
Apr 12 09:04:20 pve01 corosync[2037]:   [QB    ] withdrawing server sockets
Apr 12 09:04:20 pve01 corosync[2037]:   [SERV  ] Service engine unloaded: corosync cluster quorum service v0.1
Apr 12 09:04:20 pve01 corosync[2037]:   [SERV  ] Service engine unloaded: corosync profile loading service
Apr 12 09:04:20 pve01 corosync[2037]:   [SERV  ] Service engine unloaded: corosync resource monitoring service
Apr 12 09:04:20 pve01 corosync[2037]:   [SERV  ] Service engine unloaded: corosync watchdog service
Apr 12 09:04:20 pve01 pmxcfs[1941]: [quorum] crit: quorum_dispatch failed: 2
Apr 12 09:04:20 pve01 pmxcfs[1941]: [dcdb] crit: cpg_dispatch failed: 2
Apr 12 09:04:20 pve01 pmxcfs[1941]: [dcdb] crit: cpg_leave failed: 2
Apr 12 09:04:21 pve01 pmxcfs[1941]: [status] crit: cpg_dispatch failed: 2
Apr 12 09:04:21 pve01 pmxcfs[1941]: [status] crit: cpg_leave failed: 2
Apr 12 09:04:21 pve01 pmxcfs[1941]: [quorum] crit: quorum_initialize failed: 2
Apr 12 09:04:21 pve01 pmxcfs[1941]: [quorum] crit: can't initialize service
Apr 12 09:04:21 pve01 pmxcfs[1941]: [confdb] crit: cmap_initialize failed: 2
Apr 12 09:04:21 pve01 pmxcfs[1941]: [confdb] crit: can't initialize service
Apr 12 09:04:21 pve01 pmxcfs[1941]: [dcdb] notice: start cluster connection
Apr 12 09:04:21 pve01 pmxcfs[1941]: [dcdb] crit: cpg_initialize failed: 2
Apr 12 09:04:21 pve01 pmxcfs[1941]: [dcdb] crit: can't initialize service
Apr 12 09:04:21 pve01 pmxcfs[1941]: [status] notice: start cluster connection
Apr 12 09:04:21 pve01 pmxcfs[1941]: [status] crit: cpg_initialize failed: 2
Apr 12 09:04:21 pve01 pmxcfs[1941]: [status] crit: can't initialize service
Apr 12 09:04:21 pve01 corosync[2037]:   [KNET  ] link: Resetting MTU for link 0 because host 1 joined
Apr 12 09:04:21 pve01 corosync[2037]:   [MAIN  ] Corosync Cluster Engine exiting normally
Apr 12 09:04:21 pve01 systemd[1]: corosync.service: Succeeded.
Apr 12 09:04:21 pve01 systemd[1]: Stopped Corosync Cluster Engine.
Apr 12 09:04:21 pve01 systemd[1]: corosync.service: Consumed 2.128s CPU time.
Apr 12 09:04:21 pve01 systemd[1]: Starting Corosync Cluster Engine...
Apr 12 09:04:21 pve01 corosync[4200]:   [MAIN  ] Corosync Cluster Engine 3.1.7 starting up
Apr 12 09:04:21 pve01 corosync[4200]:   [MAIN  ] Corosync built-in features: dbus monitoring watchdog systemd xmlconf vqsim nozzle snmp pie relro bindnow
Apr 12 09:04:21 pve01 corosync[4200]:   [TOTEM ] Initializing transport (Kronosnet).
Apr 12 09:04:21 pve01 corosync[4200]:   [TOTEM ] totemknet initialized
Apr 12 09:04:21 pve01 corosync[4200]:   [KNET  ] pmtud: MTU manually set to: 0
Apr 12 09:04:21 pve01 corosync[4200]:   [KNET  ] common: crypto_nss.so has been loaded from /usr/lib/x86_64-linux-gnu/kronosnet/crypto_nss.so
Apr 12 09:04:22 pve01 corosync[4200]:   [SERV  ] Service engine loaded: corosync configuration map access [0]
Apr 12 09:04:22 pve01 corosync[4200]:   [QB    ] server name: cmap
Apr 12 09:04:22 pve01 corosync[4200]:   [SERV  ] Service engine loaded: corosync configuration service [1]
Apr 12 09:04:22 pve01 corosync[4200]:   [QB    ] server name: cfg
Apr 12 09:04:22 pve01 corosync[4200]:   [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Apr 12 09:04:22 pve01 corosync[4200]:   [QB    ] server name: cpg
Apr 12 09:04:22 pve01 corosync[4200]:   [SERV  ] Service engine loaded: corosync profile loading service [4]
Apr 12 09:04:22 pve01 corosync[4200]:   [SERV  ] Service engine loaded: corosync resource monitoring service [6]
Apr 12 09:04:22 pve01 corosync[4200]:   [WD    ] Watchdog not enabled by configuration
Apr 12 09:04:22 pve01 corosync[4200]:   [WD    ] resource load_15min missing a recovery key.
Apr 12 09:04:22 pve01 corosync[4200]:   [WD    ] resource memory_used missing a recovery key.
Apr 12 09:04:22 pve01 corosync[4200]:   [WD    ] no resources configured.
Apr 12 09:04:22 pve01 corosync[4200]:   [SERV  ] Service engine loaded: corosync watchdog service [7]
Apr 12 09:04:22 pve01 corosync[4200]:   [QUORUM] Using quorum provider corosync_votequorum
Apr 12 09:04:22 pve01 corosync[4200]:   [SERV  ] Service engine loaded: corosync vote quorum service v1.0 [5]
Apr 12 09:04:22 pve01 corosync[4200]:   [QB    ] server name: votequorum
Apr 12 09:04:22 pve01 corosync[4200]:   [SERV  ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Apr 12 09:04:22 pve01 corosync[4200]:   [QB    ] server name: quorum
Apr 12 09:04:22 pve01 corosync[4200]:   [TOTEM ] Configuring link 0
Apr 12 09:04:22 pve01 corosync[4200]:   [TOTEM ] Configured link number 0: local addr: 10.10.100.100, port=5405
Apr 12 09:04:22 pve01 corosync[4200]:   [KNET  ] link: Resetting MTU for link 0 because host 1 joined
Apr 12 09:04:22 pve01 corosync[4200]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Apr 12 09:04:22 pve01 corosync[4200]:   [KNET  ] host: host: 2 has no active links
Apr 12 09:04:22 pve01 corosync[4200]:   [KNET  ] host: host: 4 (passive) best link: 0 (pri: 1)
Apr 12 09:04:22 pve01 corosync[4200]:   [KNET  ] host: host: 4 has no active links
Apr 12 09:04:22 pve01 corosync[4200]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Apr 12 09:04:22 pve01 corosync[4200]:   [KNET  ] host: host: 2 has no active links
Apr 12 09:04:22 pve01 corosync[4200]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Apr 12 09:04:22 pve01 corosync[4200]:   [KNET  ] host: host: 2 has no active links
Apr 12 09:04:22 pve01 corosync[4200]:   [KNET  ] host: host: 4 (passive) best link: 0 (pri: 1)
Apr 12 09:04:22 pve01 corosync[4200]:   [KNET  ] host: host: 4 has no active links
Apr 12 09:04:22 pve01 corosync[4200]:   [KNET  ] host: host: 4 (passive) best link: 0 (pri: 1)
Apr 12 09:04:22 pve01 corosync[4200]:   [KNET  ] host: host: 4 has no active links
Apr 12 09:04:22 pve01 corosync[4200]:   [QUORUM] Sync members[1]: 1
Apr 12 09:04:22 pve01 corosync[4200]:   [QUORUM] Sync joined[1]: 1
Apr 12 09:04:22 pve01 corosync[4200]:   [TOTEM ] A new membership (1.9e1) was formed. Members joined: 1
Apr 12 09:04:22 pve01 corosync[4200]:   [QUORUM] Members[1]: 1
Apr 12 09:04:22 pve01 systemd[1]: Started Corosync Cluster Engine.
Apr 12 09:04:22 pve01 corosync[4200]:   [MAIN  ] Completed service synchronization, ready to provide service.
Apr 12 09:04:27 pve01 pmxcfs[1941]: [status] notice: update cluster info (cluster name  proxmox, version = 15)
Apr 12 09:04:27 pve01 pmxcfs[1941]: [dcdb] notice: members: 1/1941
Apr 12 09:04:27 pve01 pmxcfs[1941]: [dcdb] notice: all data is up to date
Apr 12 09:04:27 pve01 pmxcfs[1941]: [status] notice: members: 1/1941
Apr 12 09:04:27 pve01 pmxcfs[1941]: [status] notice: all data is up to date
Apr 12 09:04:28 pve01 systemd[1]: Stopping The Proxmox VE cluster filesystem...
Apr 12 09:04:28 pve01 pmxcfs[1941]: [main] notice: teardown filesystem
Apr 12 09:04:29 pve01 pmxcfs[1941]: [main] notice: exit proxmox configuration filesystem (0)
Apr 12 09:04:29 pve01 systemd[1]: pve-cluster.service: Succeeded.
Apr 12 09:04:29 pve01 systemd[1]: Stopped The Proxmox VE cluster filesystem.
Apr 12 09:04:29 pve01 systemd[1]: Starting The Proxmox VE cluster filesystem...
Apr 12 09:04:29 pve01 pmxcfs[4212]: [dcdb] crit: local corosync.conf is newer
Apr 12 09:04:29 pve01 pmxcfs[4212]: [dcdb] crit: local corosync.conf is newer
Apr 12 09:04:29 pve01 pmxcfs[4213]: [status] notice: update cluster info (cluster name  proxmox, version = 15)
Apr 12 09:04:29 pve01 pmxcfs[4213]: [dcdb] notice: members: 1/4213
Apr 12 09:04:29 pve01 pmxcfs[4213]: [dcdb] notice: all data is up to date
Apr 12 09:04:29 pve01 pmxcfs[4213]: [status] notice: members: 1/4213
Apr 12 09:04:29 pve01 pmxcfs[4213]: [status] notice: all data is up to date
Apr 12 09:04:30 pve01 systemd[1]: Started The Proxmox VE cluster filesystem.
Apr 12 10:04:29 pve01 pmxcfs[4213]: [dcdb] notice: data verification successful
Apr 12 11:04:29 pve01 pmxcfs[4213]: [dcdb] notice: data verification successful
Apr 12 12:04:29 pve01 pmxcfs[4213]: [dcdb] notice: data verification successful
 
I tried these steps from another post, but got the same results.

Code:
# stop corosync and pmxcfs on all nodes
$ systemctl stop corosync pve-cluster

# start pmxcfs in local mode on all nodes
$ pmxcfs -l

# put correct corosync config into local pmxcfs and corosync config dir (make sure to bump the 'config_version' inside the config file) 
#
####  ====> did this only on node 1 at this time to see if it would fix things
#
$ cp correct_corosync.conf /etc/pve/corosync.conf
$ cp correct_corosync.conf /etc/corosync/corosync.conf

# kill local pmxcfs
$ killall pmxcfs

# start corosync and pmxcfs again
$ systemctl start pve-cluster corosync

# check status
$ journalctl --since '-5min' -u pve-cluster -u corosync
$ pvecm status
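To double-check whether the new config actually propagated, comparing the config version and the file checksums on every node should be enough (a minimal sketch, run on each node):

Code:
# config version as reported by the cluster stack
pvecm status | grep 'Config Version'

# local copies of the config; the checksums should match across all nodes
md5sum /etc/pve/corosync.conf /etc/corosync/corosync.conf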
 
This is the journalctl output:

Code:
-- Journal begins at Fri 2023-01-20 08:48:55 EST, ends at Wed 2023-04-12 21:28:09 EDT. --
Apr 12 21:24:30 pve01 systemd[1]: Starting The Proxmox VE cluster filesystem...
Apr 12 21:24:30 pve01 pmxcfs[116735]: [quorum] crit: quorum_initialize failed: 2
Apr 12 21:24:30 pve01 pmxcfs[116735]: [quorum] crit: can't initialize service
Apr 12 21:24:30 pve01 pmxcfs[116735]: [confdb] crit: cmap_initialize failed: 2
Apr 12 21:24:30 pve01 pmxcfs[116735]: [confdb] crit: can't initialize service
Apr 12 21:24:30 pve01 pmxcfs[116735]: [dcdb] crit: cpg_initialize failed: 2
Apr 12 21:24:30 pve01 pmxcfs[116735]: [dcdb] crit: can't initialize service
Apr 12 21:24:30 pve01 pmxcfs[116735]: [status] crit: cpg_initialize failed: 2
Apr 12 21:24:30 pve01 pmxcfs[116735]: [status] crit: can't initialize service
Apr 12 21:24:31 pve01 systemd[1]: Started The Proxmox VE cluster filesystem.
Apr 12 21:24:31 pve01 systemd[1]: Starting Corosync Cluster Engine...
Apr 12 21:24:31 pve01 corosync[117383]:   [MAIN  ] Corosync Cluster Engine 3.1.7 starting up
Apr 12 21:24:31 pve01 corosync[117383]:   [MAIN  ] Corosync built-in features: dbus monitoring watchdog systemd xmlconf vqsim nozzle snmp pie relro bindnow
Apr 12 21:24:31 pve01 corosync[117383]:   [TOTEM ] Initializing transport (Kronosnet).
Apr 12 21:24:32 pve01 corosync[117383]:   [TOTEM ] totemknet initialized
Apr 12 21:24:32 pve01 corosync[117383]:   [KNET  ] pmtud: MTU manually set to: 0
Apr 12 21:24:32 pve01 corosync[117383]:   [KNET  ] common: crypto_nss.so has been loaded from /usr/lib/x86_64-linux-gnu/kronosnet/crypto_nss.so
Apr 12 21:24:32 pve01 corosync[117383]:   [SERV  ] Service engine loaded: corosync configuration map access [0]
Apr 12 21:24:32 pve01 corosync[117383]:   [QB    ] server name: cmap
Apr 12 21:24:32 pve01 corosync[117383]:   [SERV  ] Service engine loaded: corosync configuration service [1]
Apr 12 21:24:32 pve01 corosync[117383]:   [QB    ] server name: cfg
Apr 12 21:24:32 pve01 corosync[117383]:   [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Apr 12 21:24:32 pve01 corosync[117383]:   [QB    ] server name: cpg
Apr 12 21:24:32 pve01 corosync[117383]:   [SERV  ] Service engine loaded: corosync profile loading service [4]
Apr 12 21:24:32 pve01 corosync[117383]:   [SERV  ] Service engine loaded: corosync resource monitoring service [6]
Apr 12 21:24:32 pve01 corosync[117383]:   [WD    ] Watchdog not enabled by configuration
Apr 12 21:24:32 pve01 corosync[117383]:   [WD    ] resource load_15min missing a recovery key.
Apr 12 21:24:32 pve01 corosync[117383]:   [WD    ] resource memory_used missing a recovery key.
Apr 12 21:24:32 pve01 corosync[117383]:   [WD    ] no resources configured.
Apr 12 21:24:32 pve01 corosync[117383]:   [SERV  ] Service engine loaded: corosync watchdog service [7]
Apr 12 21:24:32 pve01 corosync[117383]:   [QUORUM] Using quorum provider corosync_votequorum
Apr 12 21:24:32 pve01 corosync[117383]:   [SERV  ] Service engine loaded: corosync vote quorum service v1.0 [5]
Apr 12 21:24:32 pve01 corosync[117383]:   [QB    ] server name: votequorum
Apr 12 21:24:32 pve01 corosync[117383]:   [SERV  ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Apr 12 21:24:32 pve01 corosync[117383]:   [QB    ] server name: quorum
Apr 12 21:24:32 pve01 corosync[117383]:   [TOTEM ] Configuring link 0
Apr 12 21:24:32 pve01 corosync[117383]:   [TOTEM ] Configured link number 0: local addr: 10.10.100.100, port=5405
Apr 12 21:24:32 pve01 corosync[117383]:   [KNET  ] link: Resetting MTU for link 0 because host 1 joined
Apr 12 21:24:32 pve01 corosync[117383]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Apr 12 21:24:32 pve01 corosync[117383]:   [KNET  ] host: host: 2 has no active links
Apr 12 21:24:32 pve01 corosync[117383]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Apr 12 21:24:32 pve01 corosync[117383]:   [KNET  ] host: host: 2 has no active links
Apr 12 21:24:32 pve01 corosync[117383]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Apr 12 21:24:32 pve01 corosync[117383]:   [KNET  ] host: host: 2 has no active links
Apr 12 21:24:32 pve01 corosync[117383]:   [KNET  ] host: host: 4 (passive) best link: 0 (pri: 0)
Apr 12 21:24:32 pve01 corosync[117383]:   [KNET  ] host: host: 4 has no active links
Apr 12 21:24:32 pve01 corosync[117383]:   [KNET  ] host: host: 4 (passive) best link: 0 (pri: 1)
Apr 12 21:24:32 pve01 corosync[117383]:   [KNET  ] host: host: 4 has no active links
Apr 12 21:24:32 pve01 corosync[117383]:   [KNET  ] host: host: 4 (passive) best link: 0 (pri: 1)
Apr 12 21:24:32 pve01 corosync[117383]:   [KNET  ] host: host: 4 has no active links
Apr 12 21:24:32 pve01 corosync[117383]:   [QUORUM] Sync members[1]: 1
Apr 12 21:24:32 pve01 corosync[117383]:   [QUORUM] Sync joined[1]: 1
Apr 12 21:24:32 pve01 corosync[117383]:   [TOTEM ] A new membership (1.9f5) was formed. Members joined: 1
Apr 12 21:24:32 pve01 corosync[117383]:   [QUORUM] Members[1]: 1
Apr 12 21:24:32 pve01 corosync[117383]:   [MAIN  ] Completed service synchronization, ready to provide service.
Apr 12 21:24:32 pve01 systemd[1]: Started Corosync Cluster Engine.
Apr 12 21:24:36 pve01 pmxcfs[116735]: [status] notice: update cluster info (cluster name  proxmox, version = 6)
Apr 12 21:24:36 pve01 pmxcfs[116735]: [dcdb] notice: members: 1/116735
Apr 12 21:24:36 pve01 pmxcfs[116735]: [dcdb] notice: all data is up to date
Apr 12 21:24:36 pve01 pmxcfs[116735]: [status] notice: members: 1/116735
Apr 12 21:24:36 pve01 pmxcfs[116735]: [status] notice: all data is up to date
 
Apr 12 21:24:32 pve01 corosync[117383]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Apr 12 21:24:32 pve01 corosync[117383]:   [KNET  ] host: host: 2 has no active links
The config and service status look fine to me, but the message above indicates that there is a network communication error. This node does not see any of the other nodes in the cluster as up.

Please also check the journal of the other two nodes, which will probably show that there is no active link for the node with id 1.

Is there possibly a firewall interfering with the network traffic?
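A minimal sketch of those checks, assuming the standard Proxmox firewall tooling and no custom rules:

Code:
# on pve02/pve03: corosync journal entries that mention node 1 or rejected packets
journalctl -b -u corosync.service | grep -E 'host: 1 |rejected'

# status of the Proxmox firewall service
pve-firewall status

# any raw iptables rules touching the corosync port?
iptables-save | grep 5405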
 

No firewalls are in place and I can SSH between any and all hosts. Just really strange. I did check on the other hosts and they are rejecting packets from host 1. Here is a bit of the journal from one of the other hosts:

Code:
Apr 12 07:29:15 pve02 systemd[1]: Starting The Proxmox VE cluster filesystem...
Apr 12 07:29:15 pve02 pmxcfs[2096]: [dcdb] crit: local corosync.conf is newer
Apr 12 07:29:15 pve02 pmxcfs[2096]: [dcdb] crit: local corosync.conf is newer
Apr 12 07:29:15 pve02 pmxcfs[2115]: [quorum] crit: quorum_initialize failed: 2
Apr 12 07:29:15 pve02 pmxcfs[2115]: [quorum] crit: can't initialize service
Apr 12 07:29:15 pve02 pmxcfs[2115]: [confdb] crit: cmap_initialize failed: 2
Apr 12 07:29:15 pve02 pmxcfs[2115]: [confdb] crit: can't initialize service
Apr 12 07:29:15 pve02 pmxcfs[2115]: [dcdb] crit: cpg_initialize failed: 2
Apr 12 07:29:15 pve02 pmxcfs[2115]: [dcdb] crit: can't initialize service
Apr 12 07:29:15 pve02 pmxcfs[2115]: [status] crit: cpg_initialize failed: 2
Apr 12 07:29:15 pve02 pmxcfs[2115]: [status] crit: can't initialize service
Apr 12 07:29:16 pve02 systemd[1]: Started The Proxmox VE cluster filesystem.
Apr 12 07:29:16 pve02 systemd[1]: Starting Corosync Cluster Engine...
Apr 12 07:29:16 pve02 corosync[2205]:   [MAIN  ] Corosync Cluster Engine 3.1.7 starting up
Apr 12 07:29:16 pve02 corosync[2205]:   [MAIN  ] Corosync built-in features: dbus monitoring watchdog systemd xmlconf vqsim nozzle snmp pie relro bindnow
Apr 12 07:29:16 pve02 corosync[2205]:   [TOTEM ] Initializing transport (Kronosnet).
Apr 12 07:29:17 pve02 corosync[2205]:   [TOTEM ] totemknet initialized
Apr 12 07:29:17 pve02 corosync[2205]:   [KNET  ] pmtud: MTU manually set to: 0
Apr 12 07:29:17 pve02 corosync[2205]:   [KNET  ] common: crypto_nss.so has been loaded from /usr/lib/x86_64-linux-gnu/kronosnet/crypto_nss.so
Apr 12 07:29:17 pve02 corosync[2205]:   [SERV  ] Service engine loaded: corosync configuration map access [0]
Apr 12 07:29:17 pve02 corosync[2205]:   [QB    ] server name: cmap
Apr 12 07:29:17 pve02 corosync[2205]:   [SERV  ] Service engine loaded: corosync configuration service [1]
Apr 12 07:29:17 pve02 corosync[2205]:   [QB    ] server name: cfg
Apr 12 07:29:17 pve02 corosync[2205]:   [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Apr 12 07:29:17 pve02 corosync[2205]:   [QB    ] server name: cpg
Apr 12 07:29:17 pve02 corosync[2205]:   [SERV  ] Service engine loaded: corosync profile loading service [4]
Apr 12 07:29:17 pve02 corosync[2205]:   [SERV  ] Service engine loaded: corosync resource monitoring service [6]
Apr 12 07:29:17 pve02 corosync[2205]:   [WD    ] Watchdog not enabled by configuration
Apr 12 07:29:17 pve02 corosync[2205]:   [WD    ] resource load_15min missing a recovery key.
Apr 12 07:29:17 pve02 corosync[2205]:   [WD    ] resource memory_used missing a recovery key.
Apr 12 07:29:17 pve02 corosync[2205]:   [WD    ] no resources configured.
Apr 12 07:29:17 pve02 corosync[2205]:   [SERV  ] Service engine loaded: corosync watchdog service [7]
Apr 12 07:29:17 pve02 corosync[2205]:   [QUORUM] Using quorum provider corosync_votequorum
Apr 12 07:29:17 pve02 corosync[2205]:   [SERV  ] Service engine loaded: corosync vote quorum service v1.0 [5]
Apr 12 07:29:17 pve02 corosync[2205]:   [QB    ] server name: votequorum
Apr 12 07:29:17 pve02 corosync[2205]:   [SERV  ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Apr 12 07:29:17 pve02 corosync[2205]:   [QB    ] server name: quorum
Apr 12 07:29:17 pve02 corosync[2205]:   [TOTEM ] Configuring link 0
Apr 12 07:29:17 pve02 corosync[2205]:   [TOTEM ] Configured link number 0: local addr: 10.10.100.105, port=5405
Apr 12 07:29:17 pve02 corosync[2205]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 0)
Apr 12 07:29:17 pve02 corosync[2205]:   [KNET  ] host: host: 1 has no active links
Apr 12 07:29:17 pve02 corosync[2205]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 1)
Apr 12 07:29:17 pve02 corosync[2205]:   [KNET  ] host: host: 1 has no active links
Apr 12 07:29:17 pve02 corosync[2205]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 1)
Apr 12 07:29:17 pve02 corosync[2205]:   [KNET  ] host: host: 1 has no active links
Apr 12 07:29:17 pve02 corosync[2205]:   [KNET  ] link: Resetting MTU for link 0 because host 2 joined
Apr 12 07:29:17 pve02 corosync[2205]:   [KNET  ] host: host: 4 (passive) best link: 0 (pri: 1)
Apr 12 07:29:17 pve02 corosync[2205]:   [KNET  ] host: host: 4 has no active links
Apr 12 07:29:17 pve02 corosync[2205]:   [KNET  ] host: host: 4 (passive) best link: 0 (pri: 1)
Apr 12 07:29:17 pve02 corosync[2205]:   [KNET  ] host: host: 4 has no active links
Apr 12 07:29:17 pve02 corosync[2205]:   [KNET  ] host: host: 4 (passive) best link: 0 (pri: 1)
Apr 12 07:29:17 pve02 corosync[2205]:   [KNET  ] host: host: 4 has no active links
Apr 12 07:29:17 pve02 corosync[2205]:   [QUORUM] Sync members[1]: 2
Apr 12 07:29:17 pve02 corosync[2205]:   [QUORUM] Sync joined[1]: 2
Apr 12 07:29:17 pve02 corosync[2205]:   [TOTEM ] A new membership (2.9d5) was formed. Members joined: 2
Apr 12 07:29:17 pve02 corosync[2205]:   [QUORUM] Members[1]: 2
Apr 12 07:29:17 pve02 corosync[2205]:   [MAIN  ] Completed service synchronization, ready to provide service.
Apr 12 07:29:17 pve02 systemd[1]: Started Corosync Cluster Engine.
Apr 12 07:29:21 pve02 pmxcfs[2115]: [status] notice: update cluster info (cluster name  proxmox, version = 10)
Apr 12 07:29:21 pve02 pmxcfs[2115]: [dcdb] notice: members: 2/2115
Apr 12 07:29:21 pve02 pmxcfs[2115]: [dcdb] notice: all data is up to date
Apr 12 07:29:21 pve02 pmxcfs[2115]: [status] notice: members: 2/2115
Apr 12 07:29:21 pve02 pmxcfs[2115]: [status] notice: all data is up to date
Apr 12 07:40:09 pve02 corosync[2205]:   [KNET  ] rx: Packet rejected from 10.10.100.100:5405
Apr 12 07:40:10 pve02 corosync[2205]:   [KNET  ] rx: Packet rejected from 10.10.100.100:5405
Apr 12 07:40:11 pve02 corosync[2205]:   [KNET  ] rx: Packet rejected from 10.10.100.100:5405
Apr 12 07:40:12 pve02 corosync[2205]:   [KNET  ] rx: Packet rejected from 10.10.100.100:5405
Apr 12 07:40:13 pve02 corosync[2205]:   [KNET  ] rx: Packet rejected from 10.10.100.100:5405
Apr 12 07:40:13 pve02 corosync[2205]:   [KNET  ] rx: host: 4 link: 0 is up
Apr 12 07:40:13 pve02 corosync[2205]:   [KNET  ] link: Resetting MTU for link 0 because host 4 joined
Apr 12 07:40:13 pve02 corosync[2205]:   [KNET  ] host: host: 4 (passive) best link: 0 (pri: 1)
Apr 12 07:40:13 pve02 corosync[2205]:   [QUORUM] Sync members[2]: 2 4
Apr 12 07:40:13 pve02 corosync[2205]:   [QUORUM] Sync joined[1]: 4
Apr 12 07:40:13 pve02 corosync[2205]:   [TOTEM ] A new membership (2.9d9) was formed. Members joined: 4
Apr 12 07:40:13 pve02 pmxcfs[2115]: [dcdb] notice: members: 2/2115, 4/1901
Apr 12 07:40:13 pve02 pmxcfs[2115]: [dcdb] notice: starting data syncronisation
Apr 12 07:40:13 pve02 corosync[2205]:   [QUORUM] This node is within the primary component and will provide service.
Apr 12 07:40:13 pve02 corosync[2205]:   [QUORUM] Members[2]: 2 4
Apr 12 07:40:13 pve02 corosync[2205]:   [MAIN  ] Completed service synchronization, ready to provide service.
Apr 12 07:40:13 pve02 corosync[2205]:   [KNET  ] pmtud: PMTUD link change for host: 4 link: 0 from 469 to 1397
Apr 12 07:40:13 pve02 corosync[2205]:   [KNET  ] pmtud: Global data MTU changed to: 1397
Apr 12 07:40:13 pve02 pmxcfs[2115]: [dcdb] notice: cpg_send_message retried 1 times
Apr 12 07:40:13 pve02 pmxcfs[2115]: [status] notice: node has quorum
Apr 12 07:40:13 pve02 pmxcfs[2115]: [status] notice: members: 2/2115, 4/1901
Apr 12 07:40:13 pve02 pmxcfs[2115]: [status] notice: starting data syncronisation
Apr 12 07:40:13 pve02 pmxcfs[2115]: [dcdb] notice: received sync request (epoch 2/2115/00000002)
Apr 12 07:40:13 pve02 pmxcfs[2115]: [status] notice: received sync request (epoch 2/2115/00000002)
Apr 12 07:40:13 pve02 pmxcfs[2115]: [dcdb] notice: received all states
Apr 12 07:40:13 pve02 pmxcfs[2115]: [dcdb] notice: leader is 4/1901
Apr 12 07:40:13 pve02 pmxcfs[2115]: [dcdb] notice: synced members: 4/1901
Apr 12 07:40:13 pve02 pmxcfs[2115]: [dcdb] notice: waiting for updates from leader
Apr 12 07:40:13 pve02 pmxcfs[2115]: [status] notice: received all states
Apr 12 07:40:13 pve02 pmxcfs[2115]: [status] notice: all data is up to date
Apr 12 07:40:13 pve02 pmxcfs[2115]: [dcdb] notice: update complete - trying to commit (got 3 inode updates)
Apr 12 07:40:13 pve02 pmxcfs[2115]: [dcdb] crit: local corosync.conf is newer
Apr 12 07:40:13 pve02 pmxcfs[2115]: [dcdb] notice: all data is up to date
Apr 12 07:40:14 pve02 corosync[2205]:   [KNET  ] rx: Packet rejected from 10.10.100.100:5405
Apr 12 07:40:15 pve02 pmxcfs[2115]: [status] notice: received log
Apr 12 07:40:15 pve02 corosync[2205]:   [KNET  ] rx: Packet rejected from 10.10.100.100:5405
Apr 12 07:40:16 pve02 corosync[2205]:   [KNET  ] rx: Packet rejected from 10.10.100.100:5405
Apr 12 07:40:17 pve02 corosync[2205]:   [KNET  ] rx: Packet rejected from 10.10.100.100:5405
Apr 12 07:40:18 pve02 corosync[2205]:   [KNET  ] rx: Packet rejected from 10.10.100.100:5405
Apr 12 07:40:19 pve02 pmxcfs[2115]: [status] notice: received log
Apr 12 07:40:19 pve02 corosync[2205]:   [KNET  ] rx: Packet rejected from 10.10.100.100:5405
Apr 12 07:40:20 pve02 corosync[2205]:   [KNET  ] rx: Packet rejected from 10.10.100.100:5405
Apr 12 07:40:21 pve02 corosync[2205]:   [KNET  ] rx: Packet rejected from 10.10.100.100:5405
Apr 12 07:40:23 pve02 corosync[2205]:   [KNET  ] rx: Packet rejected from 10.10.100.100:5405

This message keeps repeating after those:

[KNET  ] rx: Packet rejected from 10.10.100.100:5405
 
Okay, that might indicate that there is still an issue with the network configuration. Can you check that the current network config given by `ip a` matches the one in `/etc/network/interfaces`, and that the nodes are all in the same subnet as they should be? An `ifreload -a` should apply the latest config. Also check the routing with `ip r`.

In addition, verify all your NIC names to make sure the packets leave via the correct interface, and maybe double-check with tcpdump.
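Roughly, with the interface names used in this thread (adjust as needed):

Code:
# compare the live addresses with the interfaces file
ip a
cat /etc/network/interfaces

# re-apply /etc/network/interfaces (ifupdown2)
ifreload -a

# check the routing table
ip r

# watch corosync traffic (UDP port 5405) on the cluster bridge
tcpdump -ni vmbr0 udp port 5405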
 
Please also check if the correct IP is listed in the nodelist of the corosync.conf of the other two nodes.

From the corosync man page:
Code:
block_unlisted_ips
    Allow UDPU and KNET to drop packets from IP addresses that are not known (nodes which don't exist in the nodelist) to corosync. Value is yes or no.
    This feature is mainly to protect against the joining of nodes with outdated configurations after a cluster split. Another use case is to allow the atomic merge of two independent clusters.
    Changing the default value is not recommended, the overhead is tiny and an existing cluster may fail if corosync is started on an unlisted node with an old configuration.

    The default value is yes.
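If that check is what drops the packets, it may help to compare the nodelist corosync actually loaded at runtime with the one in the file; a sketch:

Code:
# nodelist as loaded into the configuration map (run on each node)
corosync-cmapctl | grep nodelist

# link status from corosync's point of view
corosync-cfgtool -s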
 
Gents, apologies for the delayed response; I had a family matter to take care of and am finally back at trying to figure out this issue. Here is some additional data.

'ip a' for each of the hosts

PVE01

Code:
root@pve01:/dev# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp6s0f0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 1000
    link/ether fe:3b:f2:74:50:10 brd ff:ff:ff:ff:ff:ff permaddr 80:61:5f:0e:d1:f6
3: enp0s31f6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vmbr0 state UP group default qlen 1000
    link/ether 04:d9:f5:ba:2d:0a brd ff:ff:ff:ff:ff:ff
4: enp6s0f1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 1000
    link/ether fe:3b:f2:74:50:10 brd ff:ff:ff:ff:ff:ff permaddr 80:61:5f:0e:d1:f7
5: enp1s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq master vmbr2 state DOWN group default qlen 1000
    link/ether e0:07:1b:81:c9:80 brd ff:ff:ff:ff:ff:ff
6: enp1s0d1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether e0:07:1b:81:c9:81 brd ff:ff:ff:ff:ff:ff
7: wlo1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether ac:67:5d:5d:cd:07 brd ff:ff:ff:ff:ff:ff
    altname wlp0s20f3
8: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 04:d9:f5:ba:2d:0a brd ff:ff:ff:ff:ff:ff
    inet 10.10.100.100/24 scope global vmbr0
       valid_lft forever preferred_lft forever
    inet6 fe80::6d9:f5ff:feba:2d0a/64 scope link
       valid_lft forever preferred_lft forever
9: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr1 state UP group default qlen 1000
    link/ether fe:3b:f2:74:50:10 brd ff:ff:ff:ff:ff:ff
10: vmbr1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether fe:3b:f2:74:50:10 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc3b:f2ff:fe74:5010/64 scope link
       valid_lft forever preferred_lft forever
11: vmbr2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether e0:07:1b:81:c9:80 brd ff:ff:ff:ff:ff:ff

PVE02
Code:
root@pve02:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp6s0f0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 1000
    link/ether 80:61:5f:0e:cd:8e brd ff:ff:ff:ff:ff:ff
3: enp0s31f6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vmbr0 state UP group default qlen 1000
    link/ether 04:d9:f5:ba:6f:a8 brd ff:ff:ff:ff:ff:ff
4: enp6s0f1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 1000
    link/ether 80:61:5f:0e:cd:8e brd ff:ff:ff:ff:ff:ff permaddr 80:61:5f:0e:cd:8f
5: enp1s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether e0:07:1b:81:49:50 brd ff:ff:ff:ff:ff:ff
6: enp1s0d1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq master vmbr2 state DOWN group default qlen 1000
    link/ether e0:07:1b:81:49:51 brd ff:ff:ff:ff:ff:ff
7: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 04:d9:f5:ba:6f:a8 brd ff:ff:ff:ff:ff:ff
    inet 10.10.100.105/24 scope global vmbr0
       valid_lft forever preferred_lft forever
    inet6 fe80::6d9:f5ff:feba:6fa8/64 scope link
       valid_lft forever preferred_lft forever
8: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr1 state UP group default qlen 1000
    link/ether 80:61:5f:0e:cd:8e brd ff:ff:ff:ff:ff:ff
9: vmbr1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 80:61:5f:0e:cd:8e brd ff:ff:ff:ff:ff:ff
    inet6 fe80::8261:5fff:fe0e:cd8e/64 scope link
       valid_lft forever preferred_lft forever
10: vmbr2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether e0:07:1b:81:49:51 brd ff:ff:ff:ff:ff:ff

PVE03
Code:
root@pve03:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vmbr0 state UP group default qlen 1000
    link/ether e4:54:e8:c0:b4:80 brd ff:ff:ff:ff:ff:ff
    altname enp0s31f6
3: enp3s0f0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 1000
    link/ether 8e:20:16:f8:d4:7f brd ff:ff:ff:ff:ff:ff permaddr 80:61:5f:07:3a:ce
4: enp3s0f1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 1000
    link/ether 8e:20:16:f8:d4:7f brd ff:ff:ff:ff:ff:ff permaddr 80:61:5f:07:3a:cf
5: enp4s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether e0:07:1b:81:59:80 brd ff:ff:ff:ff:ff:ff
6: enp4s0d1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq master vmbr2 state DOWN group default qlen 1000
    link/ether e0:07:1b:81:59:81 brd ff:ff:ff:ff:ff:ff
7: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether e4:54:e8:c0:b4:80 brd ff:ff:ff:ff:ff:ff
    inet 10.10.100.110/24 scope global vmbr0
       valid_lft forever preferred_lft forever
    inet6 fe80::e654:e8ff:fec0:b480/64 scope link
       valid_lft forever preferred_lft forever
8: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr1 state UP group default qlen 1000
    link/ether 8e:20:16:f8:d4:7f brd ff:ff:ff:ff:ff:ff
9: vmbr1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 8e:20:16:f8:d4:7f brd ff:ff:ff:ff:ff:ff
    inet6 fe80::8c20:16ff:fef8:d47f/64 scope link
       valid_lft forever preferred_lft forever
10: vmbr2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether e0:07:1b:81:59:81 brd ff:ff:ff:ff:ff:ff

/etc/network/interfaces for each host
PVE01

Code:
auto lo
iface lo inet loopback

iface enp0s31f6 inet manual
#NiC on motherboard

iface wlo1 inet manual

auto enp1s0
iface enp1s0 inet manual
#10GbE Card Port 0

auto enp1s0d1
iface enp1s0d1 inet manual
#10GbE Card Port 1

auto enp6s0f0
iface enp6s0f0 inet manual
#1GbE Card Port 0

auto enp6s0f1
iface enp6s0f1 inet manual
#1GbE Card Port 1

auto bond0
iface bond0 inet manual
        bond-slaves enp6s0f0 enp6s0f1
        bond-miimon 100
        bond-mode 802.3ad
#       bond-xmit-hash-policy layer2+3
#       bond-downdelay 200
#       bond-updelay 200
#       bond-lacp-rate 1
#LACP of 1GbE Card Ports

auto vmbr0
iface vmbr0 inet manual
        address 10.10.100.100/24
        gateway 10.10.100.1
        bridge-ports enp0s31f6
        bridge-stp off
        bridge-fd 0
#Bridge on motherboard NIC

auto vmbr1
iface vmbr1 inet static
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094
#Bridge on bond0

auto vmbr2
iface vmbr2 inet manual
        bridge-ports enp1s0
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094
#Bridge for 10GbE Ports

PVE02
Code:
auto lo
iface lo inet loopback

iface enp0s31f6 inet manual
#NiC on motherboard

auto enp6s0f0
iface enp6s0f0 inet manual
#1GbE Card Port 0

auto enp6s0f1
iface enp6s0f1 inet manual
#1GbE Card Port 1

auto enp1s0
iface enp1s0 inet manual
#10GbE Card Port 0

auto enp1s0d1
iface enp1s0d1 inet manual
#10GbE Card Port 1

auto bond0
iface bond0 inet manual
        bond-slaves enp6s0f0 enp6s0f1
        bond-miimon 100
        bond-mode 802.3ad
#LACP of 1GbE Card Ports

auto vmbr0
iface vmbr0 inet static
        address 10.10.100.105/24
        gateway 10.10.100.1
        bridge-ports enp0s31f6
        bridge-stp off
        bridge-fd 0
#Bridge on motherboard NIC

auto vmbr1
iface vmbr1 inet manual
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094
#Bridge on bond0

auto vmbr2
iface vmbr2 inet manual
        bridge-ports enp1s0d1
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094
#Bridge on 10Gb port

PVE03
Code:
auto lo
iface lo inet loopback

iface eno1 inet manual
#NiC on motherboard

auto enp3s0f0
iface enp3s0f0 inet manual
#1GbE Card Port 0

auto enp3s0f1
iface enp3s0f1 inet manual
#1GbE Card Port 1

auto enp4s0
iface enp4s0 inet manual
#10GbE Card Port 0

auto enp4s0d1
iface enp4s0d1 inet manual
#10GbE Card Port 1

auto bond0
iface bond0 inet manual
        bond-slaves enp3s0f0 enp3s0f1
        bond-miimon 100
        bond-mode 802.3ad
#LACP of 1GbE Card Ports

auto vmbr0
iface vmbr0 inet static
        address 10.10.100.110/24
        gateway 10.10.100.1
        bridge-ports eno1
        bridge-stp off
        bridge-fd 0
#Bridge on motherboard NIC

auto vmbr1
iface vmbr1 inet manual
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094
#Bridge on bond0

auto vmbr2
iface vmbr2 inet manual
        bridge-ports enp4s0d1
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094
#Bridge for 10GbE Ports


'ip r' for all hosts
PVE01

Code:
root@pve01:/dev# ip r
default via 10.10.100.1 dev vmbr0 proto kernel onlink
10.10.100.0/24 dev vmbr0 proto kernel scope link src 10.10.100.100

PVE02
Code:
root@pve02:~# ip r
default via 10.10.100.1 dev vmbr0 proto kernel onlink
10.10.100.0/24 dev vmbr0 proto kernel scope link src 10.10.100.105

PVE03
Code:
root@pve03:~# ip r
default via 10.10.100.1 dev vmbr0 proto kernel onlink
10.10.100.0/24 dev vmbr0 proto kernel scope link src 10.10.100.110


corosync.conf for each host
PVE01

Code:
root@pve01:/dev# cat /etc/pve/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: pve01
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.10.100.100
  }
  node {
    name: pve02
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.10.100.105
  }
  node {
    name: pve03
    nodeid: 4
    quorum_votes: 1
    ring0_addr: 10.10.100.110
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: proxmox
  config_version: 6
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}

PVE02
Code:
root@pve02:~# cat /etc/pve/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: pve01
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.10.100.100
  }
  node {
    name: pve02
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.10.100.105
  }
  node {
    name: pve03
    nodeid: 4
    quorum_votes: 1
    ring0_addr: 10.10.100.110
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: proxmox
  config_version: 5
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}

PVE03
Code:
root@pve03:~# cat /etc/pve/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: pve01
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.10.100.100
  }
  node {
    name: pve02
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.10.100.105
  }
  node {
    name: pve03
    nodeid: 4
    quorum_votes: 1
    ring0_addr: 10.10.100.110
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: proxmox
  config_version: 5
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}
 
I also checked on PVE01 which interface it was using, and it does look like it is using the vmbr0 interface to try to talk to the other hosts:

Code:
root@pve01:/dev# tcpdump --interface any -c 5 host 10.10.100.105
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
21:09:13.462185 vmbr0 Out IP pve01.local.beachbox.casa.5405 > pve02.local.beachbox.casa.5405: UDP, length 80
21:09:13.462186 enp0s31f6 Out IP pve01.local.beachbox.casa.5405 > pve02.local.beachbox.casa.5405: UDP, length 80
21:09:14.463554 vmbr0 Out IP pve01.local.beachbox.casa.5405 > pve02.local.beachbox.casa.5405: UDP, length 80
21:09:14.463555 enp0s31f6 Out IP pve01.local.beachbox.casa.5405 > pve02.local.beachbox.casa.5405: UDP, length 80
21:09:15.464892 vmbr0 Out IP pve01.local.beachbox.casa.5405 > pve02.local.beachbox.casa.5405: UDP, length 80
5 packets captured
8 packets received by filter
0 packets dropped by kernel
root@pve01:/dev# tcpdump --interface any -c 5 host 10.10.100.110
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
21:09:25.482175 vmbr0 Out IP pve01.local.beachbox.casa.5405 > pve03.local.beachbox.casa.5405: UDP, length 80
21:09:25.482179 enp0s31f6 Out IP pve01.local.beachbox.casa.5405 > pve03.local.beachbox.casa.5405: UDP, length 80
21:09:26.484023 vmbr0 Out IP pve01.local.beachbox.casa.5405 > pve03.local.beachbox.casa.5405: UDP, length 80
21:09:26.484029 enp0s31f6 Out IP pve01.local.beachbox.casa.5405 > pve03.local.beachbox.casa.5405: UDP, length 80
21:09:27.485817 vmbr0 Out IP pve01.local.beachbox.casa.5405 > pve03.local.beachbox.casa.5405: UDP, length 80
5 packets captured
8 packets received by filter
0 packets dropped by kernel
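To confirm that those packets actually arrive on the receiving side (and are then dropped by corosync rather than by the network), a capture on one of the other nodes might help; a sketch using the addresses from this thread:

Code:
# on pve02: corosync traffic arriving from pve01
tcpdump -ni vmbr0 udp port 5405 and host 10.10.100.100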
 
I changed the config file on PVE02 and the two surviving nodes replicated the corosync config without issues, but the other node (PVE01) did not get the changes. I tried stopping and starting the corosync service on all the hosts and still nothing reaches PVE01. Really strange... Here is some output from the configs.

PVE01

Code:
root@pve01:~# cat /etc/pve/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: pve01
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.10.100.100
  }
  node {
    name: pve02
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.10.100.105
  }
  node {
    name: pve03
    nodeid: 4
    quorum_votes: 1
    ring0_addr: 10.10.100.110
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: proxmox
  config_version: 6
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}

PVE02

Code:
root@pve02:~# cat /etc/pve/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: pve01
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.10.100.100
  }
  node {
    name: pve02
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.10.100.105
  }
  node {
    name: pve03
    nodeid: 4
    quorum_votes: 1
    ring0_addr: 10.10.100.110
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: proxmox
  config_version: 8
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}

PVE03
Code:
root@pve03:~# cat /etc/pve/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: pve01
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.10.100.100
  }
  node {
    name: pve02
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.10.100.105
  }
  node {
    name: pve03
    nodeid: 4
    quorum_votes: 1
    ring0_addr: 10.10.100.110
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: proxmox
  config_version: 8
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}
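
Worth noting: /etc/pve/corosync.conf is only the cluster-wide copy; the corosync daemon on each node actually reads /etc/corosync/corosync.conf, and, as far as I understand it, a node that has dropped out of quorum will not pick up changes made to /etc/pve on the other nodes, so its local copy can fall behind. A quick way to check each node is something like:

Code:
# Compare the cluster-wide copy against the file corosync actually loads on this node
diff /etc/pve/corosync.conf /etc/corosync/corosync.conf

# Just the version numbers, for a fast comparison across nodes
grep config_version /etc/pve/corosync.conf /etc/corosync/corosync.conf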
 
I'm not very knowledgeable on corosync, but I saw these entries in journalctl which make it look like the pve01 node is creating its own cluster (maybe)?

PVE01
Code:
root@pve01:/etc/pve/priv# journalctl -b -u corosync | grep TOTEM
Apr 17 21:33:31 pve01 corosync[2114]:   [TOTEM ] Initializing transport (Kronosnet).
Apr 17 21:33:32 pve01 corosync[2114]:   [TOTEM ] totemknet initialized
Apr 17 21:33:32 pve01 corosync[2114]:   [TOTEM ] Configuring link 0
Apr 17 21:33:32 pve01 corosync[2114]:   [TOTEM ] Configured link number 0: local addr: 10.10.100.100, port=5405
Apr 17 21:33:32 pve01 corosync[2114]:   [TOTEM ] A new membership (1.9ff) was formed. Members joined: 1
Apr 18 07:53:41 pve01 corosync[124256]:   [TOTEM ] Initializing transport (Kronosnet).
Apr 18 07:53:41 pve01 corosync[124256]:   [TOTEM ] totemknet initialized
Apr 18 07:53:41 pve01 corosync[124256]:   [TOTEM ] Configuring link 0
Apr 18 07:53:41 pve01 corosync[124256]:   [TOTEM ] Configured link number 0: local addr: 10.10.100.100, port=5405
Apr 18 07:53:41 pve01 corosync[124256]:   [TOTEM ] A new membership (1.a04) was formed. Members joined: 1
Apr 18 07:53:44 pve01 corosync[124305]:   [TOTEM ] Initializing transport (Kronosnet).
Apr 18 07:53:44 pve01 corosync[124305]:   [TOTEM ] totemknet initialized
Apr 18 07:53:44 pve01 corosync[124305]:   [TOTEM ] Configuring link 0
Apr 18 07:53:44 pve01 corosync[124305]:   [TOTEM ] Configured link number 0: local addr: 10.10.100.100, port=5405
Apr 18 07:53:44 pve01 corosync[124305]:   [TOTEM ] A new membership (1.a09) was formed. Members joined: 1
Apr 18 07:58:39 pve01 corosync[210147]:   [TOTEM ] Initializing transport (Kronosnet).
Apr 18 07:58:40 pve01 corosync[210147]:   [TOTEM ] totemknet initialized
Apr 18 07:58:40 pve01 corosync[210147]:   [TOTEM ] Configuring link 0
Apr 18 07:58:40 pve01 corosync[210147]:   [TOTEM ] Configured link number 0: local addr: 10.10.100.100, port=5405
Apr 18 07:58:40 pve01 corosync[210147]:   [TOTEM ] A new membership (1.a0e) was formed. Members joined: 1


PVE02
Code:
root@pve02:~# journalctl -b -u corosync | grep TOTEM
Apr 12 07:29:16 pve02 corosync[2205]:   [TOTEM ] Initializing transport (Kronosnet).
Apr 12 07:29:17 pve02 corosync[2205]:   [TOTEM ] totemknet initialized
Apr 12 07:29:17 pve02 corosync[2205]:   [TOTEM ] Configuring link 0
Apr 12 07:29:17 pve02 corosync[2205]:   [TOTEM ] Configured link number 0: local addr: 10.10.100.105, port=5405
Apr 12 07:29:17 pve02 corosync[2205]:   [TOTEM ] A new membership (2.9d5) was formed. Members joined: 2
Apr 12 07:40:13 pve02 corosync[2205]:   [TOTEM ] A new membership (2.9d9) was formed. Members joined: 4
Apr 12 13:31:33 pve02 corosync[2205]:   [TOTEM ] Token has not been received in 2737 ms
Apr 12 15:12:35 pve02 corosync[2205]:   [TOTEM ] Token has not been received in 2737 ms
Apr 12 15:14:22 pve02 corosync[2205]:   [TOTEM ] Token has not been received in 2737 ms
Apr 12 15:20:39 pve02 corosync[2205]:   [TOTEM ] Token has not been received in 2737 ms
Apr 12 18:21:05 pve02 corosync[2205]:   [TOTEM ] Token has not been received in 2737 ms
Apr 12 20:11:49 pve02 corosync[2205]:   [TOTEM ] Token has not been received in 2737 ms
Apr 12 21:24:20 pve02 corosync[1431551]:   [TOTEM ] Initializing transport (Kronosnet).
Apr 12 21:24:20 pve02 corosync[1431551]:   [TOTEM ] totemknet initialized
Apr 12 21:24:20 pve02 corosync[1431551]:   [TOTEM ] Configuring link 0
Apr 12 21:24:20 pve02 corosync[1431551]:   [TOTEM ] Configured link number 0: local addr: 10.10.100.105, port=5405
Apr 12 21:24:20 pve02 corosync[1431551]:   [TOTEM ] A new membership (2.9de) was formed. Members joined: 2
Apr 12 21:24:27 pve02 corosync[1431551]:   [TOTEM ] A new membership (2.9e6) was formed. Members joined: 4
Apr 12 23:34:54 pve02 corosync[1431551]:   [TOTEM ] Token has not been received in 2737 ms
Apr 13 06:48:42 pve02 corosync[1431551]:   [TOTEM ] Retransmit List: 15500
Apr 13 14:03:59 pve02 corosync[1431551]:   [TOTEM ] Token has not been received in 2737 ms
Apr 13 14:09:46 pve02 corosync[1431551]:   [TOTEM ] Token has not been received in 2737 ms
Apr 13 15:54:15 pve02 corosync[1431551]:   [TOTEM ] Token has not been received in 2737 ms
Apr 13 23:12:31 pve02 corosync[1431551]:   [TOTEM ] Token has not been received in 2737 ms
Apr 14 00:20:07 pve02 corosync[1431551]:   [TOTEM ] Token has not been received in 2737 ms
Apr 14 05:40:29 pve02 corosync[1431551]:   [TOTEM ] Token has not been received in 2737 ms
Apr 14 09:56:23 pve02 corosync[1431551]:   [TOTEM ] Token has not been received in 2737 ms
Apr 14 13:12:58 pve02 corosync[1431551]:   [TOTEM ] Token has not been received in 2737 ms
Apr 14 14:18:17 pve02 corosync[1431551]:   [TOTEM ] Retransmit List: 5ccd2
Apr 14 15:18:27 pve02 corosync[1431551]:   [TOTEM ] Retransmit List: 5f14f
Apr 14 16:44:02 pve02 corosync[1431551]:   [TOTEM ] Token has not been received in 2737 ms
Apr 14 17:15:58 pve02 corosync[1431551]:   [TOTEM ] Token has not been received in 2737 ms
Apr 14 18:09:06 pve02 corosync[1431551]:   [TOTEM ] Token has not been received in 2737 ms
Apr 14 18:13:18 pve02 corosync[1431551]:   [TOTEM ] Token has not been received in 2737 ms
Apr 14 18:14:48 pve02 corosync[1431551]:   [TOTEM ] Token has not been received in 2737 ms
Apr 14 19:48:58 pve02 corosync[1431551]:   [TOTEM ] Token has not been received in 2737 ms
Apr 14 20:40:00 pve02 corosync[1431551]:   [TOTEM ] Token has not been received in 2737 ms
Apr 14 21:17:28 pve02 corosync[1431551]:   [TOTEM ] Token has not been received in 2737 ms
Apr 14 21:31:45 pve02 corosync[1431551]:   [TOTEM ] Token has not been received in 2737 ms
Apr 14 23:35:05 pve02 corosync[1431551]:   [TOTEM ] Token has not been received in 2737 ms
Apr 15 05:35:09 pve02 corosync[1431551]:   [TOTEM ] Token has not been received in 2737 ms
Apr 15 07:23:24 pve02 corosync[1431551]:   [TOTEM ] Token has not been received in 2737 ms
Apr 15 10:37:22 pve02 corosync[1431551]:   [TOTEM ] Token has not been received in 2737 ms
Apr 15 13:40:38 pve02 corosync[1431551]:   [TOTEM ] Retransmit List: 91ee4
Apr 15 21:38:27 pve02 corosync[1431551]:   [TOTEM ] Token has not been received in 2737 ms
Apr 16 04:28:59 pve02 corosync[1431551]:   [TOTEM ] Token has not been received in 2737 ms
Apr 16 08:52:31 pve02 corosync[1431551]:   [TOTEM ] Token has not been received in 2737 ms
Apr 16 16:21:59 pve02 corosync[1431551]:   [TOTEM ] Token has not been received in 2737 ms
Apr 16 17:03:27 pve02 corosync[1431551]:   [TOTEM ] Token has not been received in 2737 ms
Apr 16 21:29:49 pve02 corosync[1431551]:   [TOTEM ] Token has not been received in 2737 ms
Apr 16 21:58:21 pve02 corosync[1431551]:   [TOTEM ] Token has not been received in 2737 ms
Apr 17 01:00:25 pve02 corosync[1431551]:   [TOTEM ] Token has not been received in 2737 ms
Apr 17 01:54:41 pve02 corosync[1431551]:   [TOTEM ] Token has not been received in 2737 ms
Apr 17 02:46:03 pve02 corosync[1431551]:   [TOTEM ] Token has not been received in 2737 ms
Apr 17 03:05:25 pve02 corosync[1431551]:   [TOTEM ] Token has not been received in 2737 ms
Apr 17 06:50:55 pve02 corosync[1431551]:   [TOTEM ] Token has not been received in 2737 ms
Apr 17 13:41:01 pve02 corosync[1431551]:   [TOTEM ] Token has not been received in 2737 ms
Apr 17 14:10:37 pve02 corosync[1431551]:   [TOTEM ] Retransmit List: ffd46
Apr 17 15:26:31 pve02 corosync[1431551]:   [TOTEM ] Token has not been received in 2737 ms
Apr 18 01:54:42 pve02 corosync[1431551]:   [TOTEM ] Token has not been received in 2737 ms
Apr 18 04:18:11 pve02 corosync[1431551]:   [TOTEM ] Token has not been received in 2737 ms
Apr 18 07:58:32 pve02 corosync[1331306]:   [TOTEM ] Initializing transport (Kronosnet).
Apr 18 07:58:32 pve02 corosync[1331306]:   [TOTEM ] totemknet initialized
Apr 18 07:58:32 pve02 corosync[1331306]:   [TOTEM ] Configuring link 0
Apr 18 07:58:32 pve02 corosync[1331306]:   [TOTEM ] Configured link number 0: local addr: 10.10.100.105, port=5405
Apr 18 07:58:32 pve02 corosync[1331306]:   [TOTEM ] A new membership (2.9eb) was formed. Members joined: 2
Apr 18 07:58:40 pve02 corosync[1331306]:   [TOTEM ] A new membership (2.9f3) was formed. Members joined: 4
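
One way to confirm whether pve01 really is sitting in a single-node membership of its own is to compare the membership view on each node; a minimal check (corosync-quorumtool ships with the corosync package, pvecm is the Proxmox wrapper shown earlier):

Code:
# Quorum state and member list as corosync sees it on this node
corosync-quorumtool -s

# The Proxmox view of the same information
pvecm status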
 
Could you provide the output of corosync-cmapctl for all nodes?

Please try the following, running each step on all nodes before proceeding with the next step:
  • stop corosync and pmxcfs: systemctl stop corosync pve-cluster
  • start corosync: systemctl start corosync
  • provide the full journalctl output of corosync: journalctl -u corosync
Only then start pmxcfs again (see the sketch below for the full sequence).
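
As a rough sketch of that sequence (assuming no HA resources depend on corosync staying up), the commands per node would be as follows; again, run each numbered step on all three nodes before moving on to the next:

Code:
# 1) Stop the cluster filesystem and corosync
systemctl stop pve-cluster corosync

# 2) Start corosync again and review its log
systemctl start corosync
journalctl -u corosync --no-pager

# 3) Only after corosync looks healthy on every node, start pmxcfs again
systemctl start pve-cluster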
 
here is the output:

PVE01
Code:
root@pve01:/etc/pve/priv# corosync-cmapctl
internal_configuration.service.0.name (str) = corosync_cmap
internal_configuration.service.0.ver (u32) = 0
internal_configuration.service.1.name (str) = corosync_cfg
internal_configuration.service.1.ver (u32) = 0
internal_configuration.service.2.name (str) = corosync_cpg
internal_configuration.service.2.ver (u32) = 0
internal_configuration.service.3.name (str) = corosync_quorum
internal_configuration.service.3.ver (u32) = 0
internal_configuration.service.4.name (str) = corosync_pload
internal_configuration.service.4.ver (u32) = 0
internal_configuration.service.5.name (str) = corosync_votequorum
internal_configuration.service.5.ver (u32) = 0
internal_configuration.service.6.name (str) = corosync_mon
internal_configuration.service.6.ver (u32) = 0
internal_configuration.service.7.name (str) = corosync_wd
internal_configuration.service.7.ver (u32) = 0
logging.debug (str) = off
logging.to_syslog (str) = yes
nodelist.local_node_pos (u32) = 0
nodelist.node.0.name (str) = pve01
nodelist.node.0.nodeid (u32) = 1
nodelist.node.0.quorum_votes (u32) = 1
nodelist.node.0.ring0_addr (str) = 10.10.100.100
nodelist.node.1.name (str) = pve02
nodelist.node.1.nodeid (u32) = 2
nodelist.node.1.quorum_votes (u32) = 1
nodelist.node.1.ring0_addr (str) = 10.10.100.105
nodelist.node.2.name (str) = pve03
nodelist.node.2.nodeid (u32) = 4
nodelist.node.2.quorum_votes (u32) = 1
nodelist.node.2.ring0_addr (str) = 10.10.100.110
quorum.provider (str) = corosync_votequorum
resources.system.load_15min.current (dbl) = 0.000000
resources.system.load_15min.last_updated (u64) = 0
resources.system.load_15min.poll_period (u64) = 3000
resources.system.load_15min.state (str) = stopped
resources.system.memory_used.current (i32) = 0
resources.system.memory_used.last_updated (u64) = 0
resources.system.memory_used.poll_period (u64) = 3000
resources.system.memory_used.state (str) = stopped
resources.watchdog_timeout (u32) = 6
runtime.blackbox.dump_flight_data (str) = no
runtime.blackbox.dump_state (str) = no
runtime.config.totem.block_unlisted_ips (u32) = 1
runtime.config.totem.cancel_token_hold_on_retransmit (u32) = 0
runtime.config.totem.consensus (u32) = 4380
runtime.config.totem.downcheck (u32) = 1000
runtime.config.totem.fail_recv_const (u32) = 2500
runtime.config.totem.heartbeat_failures_allowed (u32) = 0
runtime.config.totem.hold (u32) = 685
runtime.config.totem.interface.0.knet_ping_interval (u32) = 912
runtime.config.totem.interface.0.knet_ping_timeout (u32) = 1825
runtime.config.totem.join (u32) = 50
runtime.config.totem.knet_compression_level (i32) = 0
runtime.config.totem.knet_compression_model (str) = none
runtime.config.totem.knet_compression_threshold (u32) = 0
runtime.config.totem.knet_mtu (u32) = 0
runtime.config.totem.knet_pmtud_interval (u32) = 30
runtime.config.totem.max_messages (u32) = 17
runtime.config.totem.max_network_delay (u32) = 50
runtime.config.totem.merge (u32) = 200
runtime.config.totem.miss_count_const (u32) = 5
runtime.config.totem.send_join (u32) = 0
runtime.config.totem.seqno_unchanged_const (u32) = 30
runtime.config.totem.token (u32) = 3650
runtime.config.totem.token_retransmit (u32) = 869
runtime.config.totem.token_retransmits_before_loss_const (u32) = 4
runtime.config.totem.token_warning (u32) = 75
runtime.config.totem.window_size (u32) = 50
runtime.force_gather (str) = no
runtime.members.1.config_version (u64) = 6
runtime.members.1.ip (str) = r(0) ip(10.10.100.100)
runtime.members.1.join_count (u32) = 1
runtime.members.1.status (str) = joined
runtime.services.cfg.0.rx (u64) = 0
runtime.services.cfg.0.tx (u64) = 0
runtime.services.cfg.1.rx (u64) = 0
runtime.services.cfg.1.tx (u64) = 0
runtime.services.cfg.2.rx (u64) = 0
runtime.services.cfg.2.tx (u64) = 0
runtime.services.cfg.3.rx (u64) = 0
runtime.services.cfg.3.tx (u64) = 0
runtime.services.cfg.4.rx (u64) = 0
runtime.services.cfg.4.tx (u64) = 0
runtime.services.cfg.service_id (u16) = 1
runtime.services.cmap.0.rx (u64) = 1
runtime.services.cmap.0.tx (u64) = 1
runtime.services.cmap.service_id (u16) = 0
runtime.services.cpg.0.rx (u64) = 2
runtime.services.cpg.0.tx (u64) = 2
runtime.services.cpg.1.rx (u64) = 0
runtime.services.cpg.1.tx (u64) = 0
runtime.services.cpg.2.rx (u64) = 0
runtime.services.cpg.2.tx (u64) = 0
runtime.services.cpg.3.rx (u64) = 20294
runtime.services.cpg.3.tx (u64) = 20294
runtime.services.cpg.4.rx (u64) = 0
runtime.services.cpg.4.tx (u64) = 0
runtime.services.cpg.5.rx (u64) = 1
runtime.services.cpg.5.tx (u64) = 1
runtime.services.cpg.6.rx (u64) = 0
runtime.services.cpg.6.tx (u64) = 0
runtime.services.cpg.service_id (u16) = 2
runtime.services.mon.service_id (u16) = 6
runtime.services.pload.0.rx (u64) = 0
runtime.services.pload.0.tx (u64) = 0
runtime.services.pload.1.rx (u64) = 0
runtime.services.pload.1.tx (u64) = 0
runtime.services.pload.service_id (u16) = 4
runtime.services.quorum.service_id (u16) = 3
runtime.services.votequorum.0.rx (u64) = 3
runtime.services.votequorum.0.tx (u64) = 2
runtime.services.votequorum.1.rx (u64) = 0
runtime.services.votequorum.1.tx (u64) = 0
runtime.services.votequorum.2.rx (u64) = 0
runtime.services.votequorum.2.tx (u64) = 0
runtime.services.votequorum.3.rx (u64) = 0
runtime.services.votequorum.3.tx (u64) = 0
runtime.services.votequorum.service_id (u16) = 5
runtime.services.wd.service_id (u16) = 7
runtime.votequorum.ev_barrier (u32) = 3
runtime.votequorum.this_node_id (u32) = 1
runtime.votequorum.two_node (u8) = 0
totem.cluster_name (str) = proxmox
totem.config_version (u64) = 6
totem.interface.0.bindnetaddr (str) = 10.10.100.100
totem.ip_version (str) = ipv4-6
totem.link_mode (str) = passive
totem.secauth (str) = on
totem.version (u32) = 2

PVE02
Code:
root@pve02:~# corosync-cmapctl
internal_configuration.service.0.name (str) = corosync_cmap
internal_configuration.service.0.ver (u32) = 0
internal_configuration.service.1.name (str) = corosync_cfg
internal_configuration.service.1.ver (u32) = 0
internal_configuration.service.2.name (str) = corosync_cpg
internal_configuration.service.2.ver (u32) = 0
internal_configuration.service.3.name (str) = corosync_quorum
internal_configuration.service.3.ver (u32) = 0
internal_configuration.service.4.name (str) = corosync_pload
internal_configuration.service.4.ver (u32) = 0
internal_configuration.service.5.name (str) = corosync_votequorum
internal_configuration.service.5.ver (u32) = 0
internal_configuration.service.6.name (str) = corosync_mon
internal_configuration.service.6.ver (u32) = 0
internal_configuration.service.7.name (str) = corosync_wd
internal_configuration.service.7.ver (u32) = 0
logging.debug (str) = off
logging.to_syslog (str) = yes
nodelist.local_node_pos (u32) = 1
nodelist.node.0.name (str) = pve01
nodelist.node.0.nodeid (u32) = 1
nodelist.node.0.quorum_votes (u32) = 1
nodelist.node.0.ring0_addr (str) = 10.10.1.100
nodelist.node.1.name (str) = pve02
nodelist.node.1.nodeid (u32) = 2
nodelist.node.1.quorum_votes (u32) = 1
nodelist.node.1.ring0_addr (str) = 10.10.100.105
nodelist.node.2.name (str) = pve03
nodelist.node.2.nodeid (u32) = 4
nodelist.node.2.quorum_votes (u32) = 1
nodelist.node.2.ring0_addr (str) = 10.10.100.110
quorum.provider (str) = corosync_votequorum
resources.system.load_15min.current (dbl) = 0.000000
resources.system.load_15min.last_updated (u64) = 0
resources.system.load_15min.poll_period (u64) = 3000
resources.system.load_15min.state (str) = stopped
resources.system.memory_used.current (i32) = 0
resources.system.memory_used.last_updated (u64) = 0
resources.system.memory_used.poll_period (u64) = 3000
resources.system.memory_used.state (str) = stopped
resources.watchdog_timeout (u32) = 6
runtime.blackbox.dump_flight_data (str) = no
runtime.blackbox.dump_state (str) = no
runtime.config.totem.block_unlisted_ips (u32) = 1
runtime.config.totem.cancel_token_hold_on_retransmit (u32) = 0
runtime.config.totem.consensus (u32) = 4380
runtime.config.totem.downcheck (u32) = 1000
runtime.config.totem.fail_recv_const (u32) = 2500
runtime.config.totem.heartbeat_failures_allowed (u32) = 0
runtime.config.totem.hold (u32) = 685
runtime.config.totem.interface.0.knet_ping_interval (u32) = 912
runtime.config.totem.interface.0.knet_ping_timeout (u32) = 1825
runtime.config.totem.join (u32) = 50
runtime.config.totem.knet_compression_level (i32) = 0
runtime.config.totem.knet_compression_model (str) = none
runtime.config.totem.knet_compression_threshold (u32) = 0
runtime.config.totem.knet_mtu (u32) = 0
runtime.config.totem.knet_pmtud_interval (u32) = 30
runtime.config.totem.max_messages (u32) = 17
runtime.config.totem.max_network_delay (u32) = 50
runtime.config.totem.merge (u32) = 200
runtime.config.totem.miss_count_const (u32) = 5
runtime.config.totem.send_join (u32) = 0
runtime.config.totem.seqno_unchanged_const (u32) = 30
runtime.config.totem.token (u32) = 3650
runtime.config.totem.token_retransmit (u32) = 869
runtime.config.totem.token_retransmits_before_loss_const (u32) = 4
runtime.config.totem.token_warning (u32) = 75
runtime.config.totem.window_size (u32) = 50
runtime.force_gather (str) = no
runtime.members.2.config_version (u64) = 10
runtime.members.2.ip (str) = r(0) ip(10.10.100.105)
runtime.members.2.join_count (u32) = 1
runtime.members.2.status (str) = joined
runtime.members.4.config_version (u64) = 10
runtime.members.4.ip (str) = r(0) ip(10.10.100.110)
runtime.members.4.join_count (u32) = 1
runtime.members.4.status (str) = joined
runtime.services.cfg.0.rx (u64) = 0
runtime.services.cfg.0.tx (u64) = 0
runtime.services.cfg.1.rx (u64) = 0
runtime.services.cfg.1.tx (u64) = 0
runtime.services.cfg.2.rx (u64) = 0
runtime.services.cfg.2.tx (u64) = 0
runtime.services.cfg.3.rx (u64) = 0
runtime.services.cfg.3.tx (u64) = 0
runtime.services.cfg.4.rx (u64) = 0
runtime.services.cfg.4.tx (u64) = 0
runtime.services.cfg.service_id (u16) = 1
runtime.services.cmap.0.rx (u64) = 3
runtime.services.cmap.0.tx (u64) = 2
runtime.services.cmap.service_id (u16) = 0
runtime.services.cpg.0.rx (u64) = 4
runtime.services.cpg.0.tx (u64) = 2
runtime.services.cpg.1.rx (u64) = 0
runtime.services.cpg.1.tx (u64) = 0
runtime.services.cpg.2.rx (u64) = 0
runtime.services.cpg.2.tx (u64) = 0
runtime.services.cpg.3.rx (u64) = 48313
runtime.services.cpg.3.tx (u64) = 24774
runtime.services.cpg.4.rx (u64) = 0
runtime.services.cpg.4.tx (u64) = 0
runtime.services.cpg.5.rx (u64) = 3
runtime.services.cpg.5.tx (u64) = 2
runtime.services.cpg.6.rx (u64) = 0
runtime.services.cpg.6.tx (u64) = 0
runtime.services.cpg.service_id (u16) = 2
runtime.services.mon.service_id (u16) = 6
runtime.services.pload.0.rx (u64) = 0
runtime.services.pload.0.tx (u64) = 0
runtime.services.pload.1.rx (u64) = 0
runtime.services.pload.1.tx (u64) = 0
runtime.services.pload.service_id (u16) = 4
runtime.services.quorum.service_id (u16) = 3
runtime.services.votequorum.0.rx (u64) = 7
runtime.services.votequorum.0.tx (u64) = 4
runtime.services.votequorum.1.rx (u64) = 0
runtime.services.votequorum.1.tx (u64) = 0
runtime.services.votequorum.2.rx (u64) = 0
runtime.services.votequorum.2.tx (u64) = 0
runtime.services.votequorum.3.rx (u64) = 0
runtime.services.votequorum.3.tx (u64) = 0
runtime.services.votequorum.service_id (u16) = 5
runtime.services.wd.service_id (u16) = 7
runtime.votequorum.ev_barrier (u32) = 3
runtime.votequorum.highest_node_id (u32) = 4
runtime.votequorum.lowest_node_id (u32) = 2
runtime.votequorum.this_node_id (u32) = 2
runtime.votequorum.two_node (u8) = 0
totem.cluster_name (str) = proxmox
totem.config_version (u64) = 10
totem.interface.0.bindnetaddr (str) = 10.10.100.105
totem.ip_version (str) = ipv4-6
totem.link_mode (str) = passive
totem.secauth (str) = on
totem.version (u32) = 2
 
PVE03
Code:
root@pve03:~# corosync-cmapctl
internal_configuration.service.0.name (str) = corosync_cmap
internal_configuration.service.0.ver (u32) = 0
internal_configuration.service.1.name (str) = corosync_cfg
internal_configuration.service.1.ver (u32) = 0
internal_configuration.service.2.name (str) = corosync_cpg
internal_configuration.service.2.ver (u32) = 0
internal_configuration.service.3.name (str) = corosync_quorum
internal_configuration.service.3.ver (u32) = 0
internal_configuration.service.4.name (str) = corosync_pload
internal_configuration.service.4.ver (u32) = 0
internal_configuration.service.5.name (str) = corosync_votequorum
internal_configuration.service.5.ver (u32) = 0
internal_configuration.service.6.name (str) = corosync_mon
internal_configuration.service.6.ver (u32) = 0
internal_configuration.service.7.name (str) = corosync_wd
internal_configuration.service.7.ver (u32) = 0
logging.debug (str) = off
logging.to_syslog (str) = yes
nodelist.local_node_pos (u32) = 2
nodelist.node.0.name (str) = pve01
nodelist.node.0.nodeid (u32) = 1
nodelist.node.0.quorum_votes (u32) = 1
nodelist.node.0.ring0_addr (str) = 10.10.1.100
nodelist.node.1.name (str) = pve02
nodelist.node.1.nodeid (u32) = 2
nodelist.node.1.quorum_votes (u32) = 1
nodelist.node.1.ring0_addr (str) = 10.10.100.105
nodelist.node.2.name (str) = pve03
nodelist.node.2.nodeid (u32) = 4
nodelist.node.2.quorum_votes (u32) = 1
nodelist.node.2.ring0_addr (str) = 10.10.100.110
quorum.provider (str) = corosync_votequorum
resources.system.load_15min.current (dbl) = 0.000000
resources.system.load_15min.last_updated (u64) = 0
resources.system.load_15min.poll_period (u64) = 3000
resources.system.load_15min.state (str) = stopped
resources.system.memory_used.current (i32) = 0
resources.system.memory_used.last_updated (u64) = 0
resources.system.memory_used.poll_period (u64) = 3000
resources.system.memory_used.state (str) = stopped
resources.watchdog_timeout (u32) = 6
runtime.blackbox.dump_flight_data (str) = no
runtime.blackbox.dump_state (str) = no
runtime.config.totem.block_unlisted_ips (u32) = 1
runtime.config.totem.cancel_token_hold_on_retransmit (u32) = 0
runtime.config.totem.consensus (u32) = 4380
runtime.config.totem.downcheck (u32) = 1000
runtime.config.totem.fail_recv_const (u32) = 2500
runtime.config.totem.heartbeat_failures_allowed (u32) = 0
runtime.config.totem.hold (u32) = 685
runtime.config.totem.interface.0.knet_ping_interval (u32) = 912
runtime.config.totem.interface.0.knet_ping_timeout (u32) = 1825
runtime.config.totem.join (u32) = 50
runtime.config.totem.knet_compression_level (i32) = 0
runtime.config.totem.knet_compression_model (str) = none
runtime.config.totem.knet_compression_threshold (u32) = 0
runtime.config.totem.knet_mtu (u32) = 0
runtime.config.totem.knet_pmtud_interval (u32) = 30
runtime.config.totem.max_messages (u32) = 17
runtime.config.totem.max_network_delay (u32) = 50
runtime.config.totem.merge (u32) = 200
runtime.config.totem.miss_count_const (u32) = 5
runtime.config.totem.send_join (u32) = 0
runtime.config.totem.seqno_unchanged_const (u32) = 30
runtime.config.totem.token (u32) = 3650
runtime.config.totem.token_retransmit (u32) = 869
runtime.config.totem.token_retransmits_before_loss_const (u32) = 4
runtime.config.totem.token_warning (u32) = 75
runtime.config.totem.window_size (u32) = 50
runtime.force_gather (str) = no
runtime.members.2.config_version (u64) = 10
runtime.members.2.ip (str) = r(0) ip(10.10.100.105)
runtime.members.2.join_count (u32) = 1
runtime.members.2.status (str) = joined
runtime.members.4.config_version (u64) = 10
runtime.members.4.ip (str) = r(0) ip(10.10.100.110)
runtime.members.4.join_count (u32) = 1
runtime.members.4.status (str) = joined
runtime.services.cfg.0.rx (u64) = 0
runtime.services.cfg.0.tx (u64) = 0
runtime.services.cfg.1.rx (u64) = 0
runtime.services.cfg.1.tx (u64) = 0
runtime.services.cfg.2.rx (u64) = 0
runtime.services.cfg.2.tx (u64) = 0
runtime.services.cfg.3.rx (u64) = 0
runtime.services.cfg.3.tx (u64) = 0
runtime.services.cfg.4.rx (u64) = 0
runtime.services.cfg.4.tx (u64) = 0
runtime.services.cfg.service_id (u16) = 1
runtime.services.cmap.0.rx (u64) = 3
runtime.services.cmap.0.tx (u64) = 2
runtime.services.cmap.service_id (u16) = 0
runtime.services.cpg.0.rx (u64) = 4
runtime.services.cpg.0.tx (u64) = 2
runtime.services.cpg.1.rx (u64) = 0
runtime.services.cpg.1.tx (u64) = 0
runtime.services.cpg.2.rx (u64) = 0
runtime.services.cpg.2.tx (u64) = 0
runtime.services.cpg.3.rx (u64) = 48442
runtime.services.cpg.3.tx (u64) = 23602
runtime.services.cpg.4.rx (u64) = 0
runtime.services.cpg.4.tx (u64) = 0
runtime.services.cpg.5.rx (u64) = 3
runtime.services.cpg.5.tx (u64) = 2
runtime.services.cpg.6.rx (u64) = 0
runtime.services.cpg.6.tx (u64) = 0
runtime.services.cpg.service_id (u16) = 2
runtime.services.mon.service_id (u16) = 6
runtime.services.pload.0.rx (u64) = 0
runtime.services.pload.0.tx (u64) = 0
runtime.services.pload.1.rx (u64) = 0
runtime.services.pload.1.tx (u64) = 0
runtime.services.pload.service_id (u16) = 4
runtime.services.quorum.service_id (u16) = 3
runtime.services.votequorum.0.rx (u64) = 7
runtime.services.votequorum.0.tx (u64) = 4
runtime.services.votequorum.1.rx (u64) = 0
runtime.services.votequorum.1.tx (u64) = 0
runtime.services.votequorum.2.rx (u64) = 0
runtime.services.votequorum.2.tx (u64) = 0
runtime.services.votequorum.3.rx (u64) = 0
runtime.services.votequorum.3.tx (u64) = 0
runtime.services.votequorum.service_id (u16) = 5
runtime.services.wd.service_id (u16) = 7
runtime.votequorum.ev_barrier (u32) = 3
runtime.votequorum.highest_node_id (u32) = 4
runtime.votequorum.lowest_node_id (u32) = 2
runtime.votequorum.this_node_id (u32) = 4
runtime.votequorum.two_node (u8) = 0
totem.cluster_name (str) = proxmox
totem.config_version (u64) = 10
totem.interface.0.bindnetaddr (str) = 10.10.100.110
totem.ip_version (str) = ipv4-6
totem.link_mode (str) = passive
totem.secauth (str) = on
totem.version (u32) = 2
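
Since the full corosync-cmapctl dumps are long, it may be easier to compare just the keys that matter here, i.e. the node list and the config version the running daemon has actually loaded:

Code:
# Filter the cmap dump down to the node list and the loaded config version
corosync-cmapctl | grep -E 'nodelist\.node|totem\.config_version'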
 
I tried stopping the pmxcfs service, but that unit doesn't seem to exist on any of the nodes:

Code:
root@pve01:/etc/pve/priv# systemctl stop corosync pmxcfs
Failed to stop pmxcfs.service: Unit pmxcfs.service not loaded.

Here is the cluster status from the hosts:
PVE01
Code:
root@pve01:~# systemctl status pve-cluster.service
● pve-cluster.service - The Proxmox VE cluster filesystem
     Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabled)
     Active: active (running) since Tue 2023-04-18 11:32:26 EDT; 9min ago
    Process: 1997 ExecStart=/usr/bin/pmxcfs (code=exited, status=0/SUCCESS)
   Main PID: 2014 (pmxcfs)
      Tasks: 7 (limit: 76891)
     Memory: 47.7M
        CPU: 204ms
     CGroup: /system.slice/pve-cluster.service
             └─2014 /usr/bin/pmxcfs

Apr 18 11:38:00 pve01 pmxcfs[2014]: [status] crit: can't initialize service
Apr 18 11:38:05 pve01 pmxcfs[2014]: [quorum] crit: quorum_initialize failed: 2
Apr 18 11:38:05 pve01 pmxcfs[2014]: [confdb] crit: cmap_initialize failed: 2
Apr 18 11:38:05 pve01 pmxcfs[2014]: [dcdb] crit: cpg_initialize failed: 2
Apr 18 11:38:06 pve01 pmxcfs[2014]: [status] crit: cpg_initialize failed: 2
Apr 18 11:38:11 pve01 pmxcfs[2014]: [status] notice: update cluster info (cluster name  proxmox, version = 6)
Apr 18 11:38:11 pve01 pmxcfs[2014]: [dcdb] notice: members: 1/2014
Apr 18 11:38:11 pve01 pmxcfs[2014]: [dcdb] notice: all data is up to date
Apr 18 11:38:12 pve01 pmxcfs[2014]: [status] notice: members: 1/2014
Apr 18 11:38:12 pve01 pmxcfs[2014]: [status] notice: all data is up to date

PVE02
Code:
root@pve02:~# systemctl status pve-cluster.service
● pve-cluster.service - The Proxmox VE cluster filesystem
     Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabled)
     Active: active (running) since Tue 2023-04-18 11:28:12 EDT; 14min ago
    Process: 2025 ExecStart=/usr/bin/pmxcfs (code=exited, status=0/SUCCESS)
   Main PID: 2042 (pmxcfs)
      Tasks: 6 (limit: 76897)
     Memory: 52.1M
        CPU: 656ms
     CGroup: /system.slice/pve-cluster.service
             └─2042 /usr/bin/pmxcfs

Apr 18 11:28:19 pve02 pmxcfs[2042]: [dcdb] notice: sent all (0) updates
Apr 18 11:28:19 pve02 pmxcfs[2042]: [dcdb] notice: all data is up to date
Apr 18 11:28:19 pve02 pmxcfs[2042]: [status] notice: received all states
Apr 18 11:28:19 pve02 pmxcfs[2042]: [status] notice: all data is up to date
Apr 18 11:30:42 pve02 pmxcfs[2042]: [status] notice: received log
Apr 18 11:30:42 pve02 pmxcfs[2042]: [status] notice: received log
Apr 18 11:30:43 pve02 pmxcfs[2042]: [status] notice: received log
Apr 18 11:30:43 pve02 pmxcfs[2042]: [status] notice: received log
Apr 18 11:31:04 pve02 pmxcfs[2042]: [status] notice: received log
Apr 18 11:35:57 pve02 pmxcfs[2042]: [status] notice: received log

PVE03
Code:
root@pve03:~# systemctl status pve-cluster.service
● pve-cluster.service - The Proxmox VE cluster filesystem
     Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabled)
     Active: active (running) since Tue 2023-04-18 11:23:05 EDT; 19min ago
    Process: 1913 ExecStart=/usr/bin/pmxcfs (code=exited, status=0/SUCCESS)
   Main PID: 1930 (pmxcfs)
      Tasks: 6 (limit: 57467)
     Memory: 61.6M
        CPU: 1.031s
     CGroup: /system.slice/pve-cluster.service
             └─1930 /usr/bin/pmxcfs

Apr 18 11:28:18 pve03 pmxcfs[1930]: [status] notice: received all states
Apr 18 11:28:18 pve03 pmxcfs[1930]: [status] notice: all data is up to date
Apr 18 11:28:20 pve03 pmxcfs[1930]: [status] notice: received log
Apr 18 11:28:25 pve03 pmxcfs[1930]: [status] notice: received log
Apr 18 11:28:27 pve03 pmxcfs[1930]: [status] notice: received log
Apr 18 11:28:31 pve03 pmxcfs[1930]: [status] notice: received log
Apr 18 11:28:35 pve03 pmxcfs[1930]: [status] notice: received log
Apr 18 11:28:39 pve03 pmxcfs[1930]: [status] notice: received log
Apr 18 11:28:43 pve03 pmxcfs[1930]: [status] notice: received log
Apr 18 11:28:45 pve03 pmxcfs[1930]: [status] notice: received log
 
Ah, sorry, my bad: pmxcfs is managed by the pve-cluster.service unit, so the command is systemctl stop pve-cluster.service.
 
