The cluster lost quorum from one moment to the next. I checked the network and could not find any definite problem there.
root@pve04:~# systemctl status pve-cluster.service corosync.service
● pve-cluster.service - The Proxmox VE cluster filesystem
Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; preset: enabled)
Active: active (running) since Mon 2024-07-15 12:08:09 -03; 4 months 0 days ago
Main PID: 2421 (pmxcfs)
Tasks: 9 (limit: 309246)
Memory: 66.6M
CPU: 10h 27min 4.434s
CGroup: /system.slice/pve-cluster.service
└─2421 /usr/bin/pmxcfs
Nov 14 14:32:14 pve04 pmxcfs[2421]: [status] notice: cpg_send_message retry 60
Nov 14 14:32:15 pve04 pmxcfs[2421]: [status] notice: cpg_send_message retry 70
Nov 14 14:32:16 pve04 pmxcfs[2421]: [status] notice: cpg_send_message retry 80
Nov 14 14:32:17 pve04 pmxcfs[2421]: [status] notice: cpg_send_message retry 90
Nov 14 14:32:18 pve04 pmxcfs[2421]: [status] notice: cpg_send_message retry 100
Nov 14 14:32:18 pve04 pmxcfs[2421]: [status] notice: cpg_send_message retried 100 times
Nov 14 14:32:18 pve04 pmxcfs[2421]: [status] crit: cpg_send_message failed: 6
Nov 14 14:32:19 pve04 pmxcfs[2421]: [status] notice: cpg_send_message retry 10
Nov 14 14:32:20 pve04 pmxcfs[2421]: [status] notice: cpg_send_message retry 20
Nov 14 14:32:21 pve04 pmxcfs[2421]: [status] notice: cpg_send_message retry 30
● corosync.service - Corosync Cluster Engine
Loaded: loaded (/lib/systemd/system/corosync.service; enabled; preset: enabled)
Active: active (running) since Mon 2024-07-15 12:08:10 -03; 4 months 0 days ago
Docs: man:corosync
man:corosync.conf
man:corosync_overview
Main PID: 2546 (corosync)
Tasks: 9 (limit: 309246)
Memory: 608.5M
CPU: 4d 15h 54min 18.993s
CGroup: /system.slice/corosync.service
└─2546 /usr/sbin/corosync -f
Nov 14 14:31:54 pve04 corosync[2546]: [KNET ] loopback: send local failed. error=Resource temporarily unavailable
Nov 14 14:31:54 pve04 corosync[2546]: [KNET ] loopback: send local failed. error=Resource temporarily unavailable
Nov 14 14:31:54 pve04 corosync[2546]: [KNET ] loopback: send local failed. error=Resource temporarily unavailable
Nov 14 14:31:54 pve04 corosync[2546]: [KNET ] loopback: send local failed. error=Resource temporarily unavailable
Nov 14 14:31:54 pve04 corosync[2546]: [KNET ] loopback: send local failed. error=Resource temporarily unavailable
Nov 14 14:31:54 pve04 corosync[2546]: [KNET ] loopback: send local failed. error=Resource temporarily unavailable
Nov 14 14:31:54 pve04 corosync[2546]: [KNET ] loopback: send local failed. error=Resource temporarily unavailable
Nov 14 14:31:54 pve04 corosync[2546]: [KNET ] loopback: send local failed. error=Resource temporarily unavailable
Nov 14 14:31:54 pve04 corosync[2546]: [KNET ] loopback: send local failed. error=Resource temporarily unavailable
Nov 14 14:31:54 pve04 corosync[2546]: [KNET ] loopback: send local failed. error=Resource temporarily unavailable
Quorum all nodes:
root@pve23:~# pvecm status
Cluster information
-------------------
Name:             DC01
Config Version:   14
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Thu Nov 14 14:18:10 2024
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          0x00000006
Ring ID:          6.10fdc
Quorate:          No

Votequorum information
----------------------
Expected votes:   11
Highest expected: 11
Total votes:      1
Quorum:           6 Activity blocked
Flags:

Membership information
----------------------
    Nodeid      Votes Name
0x00000006          1 192.168.150.248 (local)

pveversion:
pveversion
pve-manager/8.2.4/faa83925c9641325 (running kernel: 6.8.8-2-pve)

corosync.conf:
Code:
root@pve23:~# cat /etc/pve/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: pve04
    nodeid: 7
    quorum_votes: 1
    ring0_addr: 192.168.150.243
  }
  node {
    name: pve05
    nodeid: 8
    quorum_votes: 1
    ring0_addr: 192.168.150.244
  }
  node {
    name: pve1
    nodeid: 3
    quorum_votes: 2
    ring0_addr: 192.168.150.251
  }
  node {
    name: pve2
    nodeid: 1
    quorum_votes: 2
    ring0_addr: 192.168.150.252
  }
  node {
    name: pve21
    nodeid: 4
    quorum_votes: 1
    ring0_addr: 192.168.150.249
  }
  node {
    name: pve22
    nodeid: 5
    quorum_votes: 1
    ring0_addr: 192.168.150.250
  }
  node {
    name: pve23
    nodeid: 6
    quorum_votes: 1
    ring0_addr: 192.168.150.248
  }
  node {
    name: pve3
    nodeid: 2
    quorum_votes: 2
    ring0_addr: 192.168.150.253
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: DC01
  config_version: 14
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}
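
For reference, the vote math behind the pvecm status numbers can be checked against this nodelist with a short Python sketch. This is only an illustration, not part of Proxmox or corosync: the names votes_by_node, quorum_threshold and SAMPLE are made up here, SAMPLE is a trimmed copy of the nodelist (name and quorum_votes only), and the sketch assumes the usual votequorum rule that quorum is a strict majority of the expected votes, i.e. floor(expected / 2) + 1.

```python
import re

# Trimmed copy of the nodelist above: only name and quorum_votes are kept.
SAMPLE = """
nodelist {
  node { name: pve04  quorum_votes: 1 }
  node { name: pve05  quorum_votes: 1 }
  node { name: pve1   quorum_votes: 2 }
  node { name: pve2   quorum_votes: 2 }
  node { name: pve21  quorum_votes: 1 }
  node { name: pve22  quorum_votes: 1 }
  node { name: pve23  quorum_votes: 1 }
  node { name: pve3   quorum_votes: 2 }
}
"""


def votes_by_node(conf_text):
    """Return {node name: quorum_votes} for every node block in the text."""
    votes = {}
    # node blocks contain no nested braces, so a non-greedy match is enough;
    # \b keeps the pattern from matching inside the word "nodelist".
    for block in re.findall(r"\bnode\s*\{(.*?)\}", conf_text, flags=re.S):
        fields = dict(re.findall(r"(\w+):\s*(\S+)", block))
        # Assumption: a node without quorum_votes counts as 1 vote.
        votes[fields["name"]] = int(fields.get("quorum_votes", 1))
    return votes


def quorum_threshold(expected_votes):
    """Strict majority of the expected votes."""
    return expected_votes // 2 + 1


if __name__ == "__main__":
    votes = votes_by_node(SAMPLE)
    expected = sum(votes.values())
    needed = quorum_threshold(expected)
    print(f"expected votes: {expected}, quorum: {needed}")
    alone = votes["pve23"]
    print(f"pve23 alone: {alone} vote(s), quorate: {alone >= needed}")
```

With the votes from this config it reports 11 expected votes and a quorum of 6, matching the pvecm status output, and a node that only sees itself (like pve23 with 1 vote) stays blocked. It also shows that pve1, pve2 and pve3 together carry exactly the 6 votes needed.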
root@pve1:~# corosync-cfgtool -s
Local node ID 3, transport knet
LINK ID 0 udp
        addr    = 192.168.150.251
        status:
                nodeid:          1:     connected
                nodeid:          2:     connected
                nodeid:          3:     localhost
                nodeid:          4:     connected
                nodeid:          5:     connected
                nodeid:          6:     connected
                nodeid:          7:     connected
                nodeid:          8:     connected

root@pve1:~# journalctl -u corosync -f
Nov 14 14:22:21 pve1 corosync[550128]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Nov 14 14:22:21 pve1 corosync[550128]: [KNET ] host: host: 6 (passive) best link: 0 (pri: 1)
Nov 14 14:22:22 pve1 corosync[550128]: [KNET ] rx: host: 5 link: 0 is up
Nov 14 14:22:22 pve1 corosync[550128]: [KNET ] link: Resetting MTU for link 0 because host 5 joined
Nov 14 14:22:22 pve1 corosync[550128]: [KNET ] host: host: 5 (passive) best link: 0 (pri: 1)
Nov 14 14:22:22 pve1 corosync[550128]: [KNET ] pmtud: PMTUD link change for host: 2 link: 0 from 469 to 1397
Nov 14 14:22:22 pve1 corosync[550128]: [KNET ] pmtud: PMTUD link change for host: 6 link: 0 from 469 to 1397
Nov 14 14:22:23 pve1 corosync[550128]: [KNET ] pmtud: PMTUD link change for host: 5 link: 0 from 469 to 1397
Nov 14 14:22:24 pve1 corosync[550128]: [KNET ] pmtud: PMTUD link change for host: 1 link: 0 from 469 to 1397
Nov 14 14:22:25 pve1 corosync[550128]: [KNET ] pmtud: PMTUD link change for host: 8 link: 0 from 469 to 1397
Nov 14 14:22:27 pve1 corosync[550128]: [KNET ] pmtud: PMTUD link change for host: 7 link: 0 from 469 to 1397
Nov 14 14:22:27 pve1 corosync[550128]: [KNET ] pmtud: Global data MTU changed to: 1397
^C

root@pve1:~# journalctl -xe -u corosync
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 8 (passive) best link: 0 (pri: 1)
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 8 has no active links
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 1 has no active links
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 1 has no active links
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 1 has no active links
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 4 (passive) best link: 0 (pri: 1)
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 4 has no active links
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 4 (passive) best link: 0 (pri: 1)
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 4 has no active links
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 4 (passive) best link: 0 (pri: 1)
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 4 has no active links
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 5 (passive) best link: 0 (pri: 1)
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 5 has no active links
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 5 (passive) best link: 0 (pri: 1)
Nov 14 14:22:20 pve1 corosync[550128]: [QUORUM] Sync members[1]: 3
Nov 14 14:22:20 pve1 corosync[550128]: [QUORUM] Sync joined[1]: 3
Nov 14 14:22:20 pve1 corosync[550128]: [TOTEM ] A new membership (3.fd0f) was formed. Members joined: 3
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 5 has no active links
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 5 (passive) best link: 0 (pri: 1)
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 5 has no active links
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 6 (passive) best link: 0 (pri: 1)
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 6 has no active links
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 6 (passive) best link: 0 (pri: 1)
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 6 has no active links
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 6 (passive) best link: 0 (pri: 1)
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 6 has no active links
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 2 has no active links
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 2 has no active links
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 2 has no active links
Nov 14 14:22:20 pve1 corosync[550128]: [QUORUM] Members[1]: 3
Nov 14 14:22:20 pve1 corosync[550128]: [MAIN ] Completed service synchronization, ready to provide service.
Nov 14 14:22:20 pve1 systemd[1]: Started corosync.service - Corosync Cluster Engine.
░░ Subject: A start job for unit corosync.service has finished successfully
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ A start job for unit corosync.service has finished successfully.
░░
░░ The job identifier is 5117.
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] link: Resetting MTU for link 0 because host 8 joined
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 8 (passive) best link: 0 (pri: 1)
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] link: Resetting MTU for link 0 because host 1 joined
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Nov 14 14:22:21 pve1 corosync[550128]: [KNET ] rx: host: 7 link: 0 is up
Nov 14 14:22:21 pve1 corosync[550128]: [KNET ] link: Resetting MTU for link 0 because host 7 joined
Nov 14 14:22:21 pve1 corosync[550128]: [KNET ] host: host: 7 (passive) best link: 0 (pri: 1)
Nov 14 14:22:21 pve1 corosync[550128]: [KNET ] rx: host: 4 link: 0 is up
Nov 14 14:22:21 pve1 corosync[550128]: [KNET ] link: Resetting MTU for link 0 because host 4 joined
Nov 14 14:22:21 pve1 corosync[550128]: [KNET ] host: host: 4 (passive) best link: 0 (pri: 1)
Nov 14 14:22:21 pve1 corosync[550128]: [KNET ] pmtud: PMTUD link change for host: 4 link: 0 from 469 to 1397
Nov 14 14:22:21 pve1 corosync[550128]: [KNET ] rx: host: 2 link: 0 is up
Nov 14 14:22:21 pve1 corosync[550128]: [KNET ] link: Resetting MTU for link 0 because host 2 joined
Nov 14 14:22:21 pve1 corosync[550128]: [KNET ] rx: host: 6 link: 0 is up
Nov 14 14:22:21 pve1 corosync[550128]: [KNET ] link: Resetting MTU for link 0 because host 6 joined
Nov 14 14:22:21 pve1 corosync[550128]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Nov 14 14:22:21 pve1 corosync[550128]: [KNET ] host: host: 6 (passive) best link: 0 (pri: 1)
Nov 14 14:22:22 pve1 corosync[550128]: [KNET ] rx: host: 5 link: 0 is up
Nov 14 14:22:22 pve1 corosync[550128]: [KNET ] link: Resetting MTU for link 0 because host 5 joined
Nov 14 14:22:22 pve1 corosync[550128]: [KNET ] host: host: 5 (passive) best link: 0 (pri: 1)
Nov 14 14:22:22 pve1 corosync[550128]: [KNET ] pmtud: PMTUD link change for host: 2 link: 0 from 469 to 1397
Nov 14 14:22:22 pve1 corosync[550128]: [KNET ] pmtud: PMTUD link change for host: 6 link: 0 from 469 to 1397
Nov 14 14:22:23 pve1 corosync[550128]: [KNET ] pmtud: PMTUD link change for host: 5 link: 0 from 469 to 1397
Nov 14 14:22:24 pve1 corosync[550128]: [KNET ] pmtud: PMTUD link change for host: 1 link: 0 from 469 to 1397
Nov 14 14:22:25 pve1 corosync[550128]: [KNET ] pmtud: PMTUD link change for host: 8 link: 0 from 469 to 1397
Nov 14 14:22:27 pve1 corosync[550128]: [KNET ] pmtud: PMTUD link change for host: 7 link: 0 from 469 to 1397
Nov 14 14:22:27 pve1 corosync[550128]: [KNET ] pmtud: Global data MTU changed to: 1397
Nov 14 14:22:41 pve1 corosync[550128]: [KNET ] link: host: 5 link: 0 is down
Nov 14 14:22:41 pve1 corosync[550128]: [KNET ] host: host: 5 (passive) best link: 0 (pri: 1)
Nov 14 14:22:41 pve1 corosync[550128]: [KNET ] host: host: 5 has no active links
Nov 14 14:22:42 pve1 corosync[550128]: [KNET ] link: Resetting MTU for link 0 because host 5 joined
Nov 14 14:22:42 pve1 corosync[550128]: [KNET ] host: host: 5 (passive) best link: 0 (pri: 1)
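
To make the corosync journal easier to read, here is a rough Python sketch that counts, per knet host ID, how often a host was reported with "has no active links" and how often its link came back up. It is only an illustration: the function name summarize_links and the two regular expressions are made up here and written against the log lines pasted in this post, not against any documented corosync log format.

```python
import re
from collections import Counter

# Patterns modelled on the journal lines pasted in this post (assumption:
# other corosync versions may word these messages differently).
NO_LINKS = re.compile(r"\[KNET\s*\] host: host: (\d+) has no active links")
LINK_UP = re.compile(r"\[KNET\s*\] rx: host: (\d+) link: \d+ is up")


def summarize_links(lines):
    """Return (no_links, link_up) Counters keyed by knet host ID."""
    no_links, link_up = Counter(), Counter()
    for line in lines:
        match = NO_LINKS.search(line)
        if match:
            no_links[int(match.group(1))] += 1
            continue
        match = LINK_UP.search(line)
        if match:
            link_up[int(match.group(1))] += 1
    return no_links, link_up


# A few lines copied from the pve1 journal in this post.
SAMPLE_LINES = [
    "Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 8 has no active links",
    "Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 5 has no active links",
    "Nov 14 14:22:21 pve1 corosync[550128]: [KNET ] rx: host: 7 link: 0 is up",
    "Nov 14 14:22:22 pve1 corosync[550128]: [KNET ] rx: host: 5 link: 0 is up",
    "Nov 14 14:22:41 pve1 corosync[550128]: [KNET ] host: host: 5 has no active links",
]

if __name__ == "__main__":
    no_links, link_up = summarize_links(SAMPLE_LINES)
    for host in sorted(set(no_links) | set(link_up)):
        print(f"host {host}: no active links x{no_links[host]}, link up x{link_up[host]}")
```

The same function could be fed the saved output of journalctl -u corosync from each node (one line per list entry) to see which host IDs flap most often.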