The cluster lost quorum from one moment to the next. I checked the network and could not find any clear problem there.
root@pve04:~# systemctl status pve-cluster.service corosync.service
● pve-cluster.service - The Proxmox VE cluster filesystem
Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; preset: enabled)
Active: active (running) since Mon 2024-07-15 12:08:09 -03; 4 months 0 days ago
Main PID: 2421 (pmxcfs)
Tasks: 9 (limit: 309246)
Memory: 66.6M
CPU: 10h 27min 4.434s
CGroup: /system.slice/pve-cluster.service
└─2421 /usr/bin/pmxcfs
Nov 14 14:32:14 pve04 pmxcfs[2421]: [status] notice: cpg_send_message retry 60
Nov 14 14:32:15 pve04 pmxcfs[2421]: [status] notice: cpg_send_message retry 70
Nov 14 14:32:16 pve04 pmxcfs[2421]: [status] notice: cpg_send_message retry 80
Nov 14 14:32:17 pve04 pmxcfs[2421]: [status] notice: cpg_send_message retry 90
Nov 14 14:32:18 pve04 pmxcfs[2421]: [status] notice: cpg_send_message retry 100
Nov 14 14:32:18 pve04 pmxcfs[2421]: [status] notice: cpg_send_message retried 100 times
Nov 14 14:32:18 pve04 pmxcfs[2421]: [status] crit: cpg_send_message failed: 6
Nov 14 14:32:19 pve04 pmxcfs[2421]: [status] notice: cpg_send_message retry 10
Nov 14 14:32:20 pve04 pmxcfs[2421]: [status] notice: cpg_send_message retry 20
Nov 14 14:32:21 pve04 pmxcfs[2421]: [status] notice: cpg_send_message retry 30
● corosync.service - Corosync Cluster Engine
Loaded: loaded (/lib/systemd/system/corosync.service; enabled; preset: enabled)
Active: active (running) since Mon 2024-07-15 12:08:10 -03; 4 months 0 days ago
Docs: man:corosync
man:corosync.conf
man:corosync_overview
Main PID: 2546 (corosync)
Tasks: 9 (limit: 309246)
Memory: 608.5M
CPU: 4d 15h 54min 18.993s
CGroup: /system.slice/corosync.service
└─2546 /usr/sbin/corosync -f
Nov 14 14:31:54 pve04 corosync[2546]: [KNET ] loopback: send local failed. error=Resource temporarily unavailable
Nov 14 14:31:54 pve04 corosync[2546]: [KNET ] loopback: send local failed. error=Resource temporarily unavailable
Nov 14 14:31:54 pve04 corosync[2546]: [KNET ] loopback: send local failed. error=Resource temporarily unavailable
Nov 14 14:31:54 pve04 corosync[2546]: [KNET ] loopback: send local failed. error=Resource temporarily unavailable
Nov 14 14:31:54 pve04 corosync[2546]: [KNET ] loopback: send local failed. error=Resource temporarily unavailable
Nov 14 14:31:54 pve04 corosync[2546]: [KNET ] loopback: send local failed. error=Resource temporarily unavailable
Nov 14 14:31:54 pve04 corosync[2546]: [KNET ] loopback: send local failed. error=Resource temporarily unavailable
Nov 14 14:31:54 pve04 corosync[2546]: [KNET ] loopback: send local failed. error=Resource temporarily unavailable
Nov 14 14:31:54 pve04 corosync[2546]: [KNET ] loopback: send local failed. error=Resource temporarily unavailable
Nov 14 14:31:54 pve04 corosync[2546]: [KNET ] loopback: send local failed. error=Resource temporarily unavailable
Quorum all nodes:
root@pve23:~# pvecm status
Cluster information
-------------------
Name: DC01
Config Version: 14
Transport: knet
Secure auth: on
Quorum information
------------------
Date: Thu Nov 14 14:18:10 2024
Quorum provider: corosync_votequorum
Nodes: 1
Node ID: 0x00000006
Ring ID: 6.10fdc
Quorate: No
Votequorum information
----------------------
Expected votes: 11
Highest expected: 11
Total votes: 1
Quorum: 6 Activity blocked
Flags:
Membership information
----------------------
Nodeid Votes Name
0x00000006 1 192.168.150.248 (local)
Pveversion
pveversion
pve-manager/8.2.4/faa83925c9641325 (running kernel: 6.8.8-2-pve)
corosync.conf:
root@pve23:~# cat /etc/pve/corosync.conf
logging {
debug: off
to_syslog: yes
}
nodelist {
node {
name: pve04
nodeid: 7
quorum_votes: 1
ring0_addr: 192.168.150.243
}
node {
name: pve05
nodeid: 8
quorum_votes: 1
ring0_addr: 192.168.150.244
}
node {
name: pve1
nodeid: 3
quorum_votes: 2
ring0_addr: 192.168.150.251
}
node {
name: pve2
nodeid: 1
quorum_votes: 2
ring0_addr: 192.168.150.252
}
node {
name: pve21
nodeid: 4
quorum_votes: 1
ring0_addr: 192.168.150.249
}
node {
name: pve22
nodeid: 5
quorum_votes: 1
ring0_addr: 192.168.150.250
}
node {
name: pve23
nodeid: 6
quorum_votes: 1
ring0_addr: 192.168.150.248
}
node {
name: pve3
nodeid: 2
quorum_votes: 2
ring0_addr: 192.168.150.253
}
}
quorum {
provider: corosync_votequorum
}
totem {
cluster_name: DC01
config_version: 14
interface {
linknumber: 0
}
ip_version: ipv4-6
link_mode: passive
secauth: on
version: 2
}
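As a sanity check, the vote numbers reported by `pvecm status` can be reproduced from the nodelist. This is a minimal sketch, assuming votequorum's default majority rule (half of the expected votes, rounded down, plus one) with no special options such as `two_node` set, which matches the quorum block shown here. The `votes` dictionary is transcribed by hand from the config, and the names `votes` and `quorum_threshold` are only illustrative.

```python
# quorum_votes per node, copied by hand from the nodelist in corosync.conf
votes = {
    "pve04": 1, "pve05": 1, "pve1": 2, "pve2": 2,
    "pve21": 1, "pve22": 1, "pve23": 1, "pve3": 2,
}


def quorum_threshold(expected_votes: int) -> int:
    """Strict majority: more than half of all expected votes."""
    return expected_votes // 2 + 1


expected = sum(votes.values())       # total votes if every node is present
needed = quorum_threshold(expected)  # votes required to be quorate
visible = votes["pve23"]             # pve23 currently sees only itself

print(f"expected={expected} needed={needed} visible={visible}")
print("quorate:", visible >= needed)
```

With 11 expected votes the threshold is 6, so a node that only sees its own single vote is not quorate, which is what `pvecm status` reports on pve23.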
root@pve1:~# corosync-cfgtool -s
Local node ID 3, transport knet
LINK ID 0 udp
addr = 192.168.150.251
status:
nodeid: 1: connected
nodeid: 2: connected
nodeid: 3: localhost
nodeid: 4: connected
nodeid: 5: connected
nodeid: 6: connected
nodeid: 7: connected
nodeid: 8: connected
root@pve1:~# journalctl -u corosync -f
Nov 14 14:22:21 pve1 corosync[550128]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Nov 14 14:22:21 pve1 corosync[550128]: [KNET ] host: host: 6 (passive) best link: 0 (pri: 1)
Nov 14 14:22:22 pve1 corosync[550128]: [KNET ] rx: host: 5 link: 0 is up
Nov 14 14:22:22 pve1 corosync[550128]: [KNET ] link: Resetting MTU for link 0 because host 5 joined
Nov 14 14:22:22 pve1 corosync[550128]: [KNET ] host: host: 5 (passive) best link: 0 (pri: 1)
Nov 14 14:22:22 pve1 corosync[550128]: [KNET ] pmtud: PMTUD link change for host: 2 link: 0 from 469 to 1397
Nov 14 14:22:22 pve1 corosync[550128]: [KNET ] pmtud: PMTUD link change for host: 6 link: 0 from 469 to 1397
Nov 14 14:22:23 pve1 corosync[550128]: [KNET ] pmtud: PMTUD link change for host: 5 link: 0 from 469 to 1397
Nov 14 14:22:24 pve1 corosync[550128]: [KNET ] pmtud: PMTUD link change for host: 1 link: 0 from 469 to 1397
Nov 14 14:22:25 pve1 corosync[550128]: [KNET ] pmtud: PMTUD link change for host: 8 link: 0 from 469 to 1397
Nov 14 14:22:27 pve1 corosync[550128]: [KNET ] pmtud: PMTUD link change for host: 7 link: 0 from 469 to 1397
Nov 14 14:22:27 pve1 corosync[550128]: [KNET ] pmtud: Global data MTU changed to: 1397
^C
root@pve1:~# journalctl -xe -u corosync
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 8 (passive) best link: 0 (pri: 1)
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 8 has no active links
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 1 has no active links
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 1 has no active links
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 1 has no active links
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 4 (passive) best link: 0 (pri: 1)
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 4 has no active links
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 4 (passive) best link: 0 (pri: 1)
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 4 has no active links
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 4 (passive) best link: 0 (pri: 1)
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 4 has no active links
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 5 (passive) best link: 0 (pri: 1)
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 5 has no active links
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 5 (passive) best link: 0 (pri: 1)
Nov 14 14:22:20 pve1 corosync[550128]: [QUORUM] Sync members[1]: 3
Nov 14 14:22:20 pve1 corosync[550128]: [QUORUM] Sync joined[1]: 3
Nov 14 14:22:20 pve1 corosync[550128]: [TOTEM ] A new membership (3.fd0f) was formed. Members joined: 3
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 5 has no active links
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 5 (passive) best link: 0 (pri: 1)
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 5 has no active links
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 6 (passive) best link: 0 (pri: 1)
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 6 has no active links
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 6 (passive) best link: 0 (pri: 1)
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 6 has no active links
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 6 (passive) best link: 0 (pri: 1)
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 6 has no active links
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 2 has no active links
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 2 has no active links
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 2 has no active links
Nov 14 14:22:20 pve1 corosync[550128]: [QUORUM] Members[1]: 3
Nov 14 14:22:20 pve1 corosync[550128]: [MAIN ] Completed service synchronization, ready to provide service.
Nov 14 14:22:20 pve1 systemd[1]: Started corosync.service - Corosync Cluster Engine.
░░ Subject: A start job for unit corosync.service has finished successfully
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ A start job for unit corosync.service has finished successfully.
░░
░░ The job identifier is 5117.
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] link: Resetting MTU for link 0 because host 8 joined
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 8 (passive) best link: 0 (pri: 1)
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] link: Resetting MTU for link 0 because host 1 joined
Nov 14 14:22:20 pve1 corosync[550128]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Nov 14 14:22:21 pve1 corosync[550128]: [KNET ] rx: host: 7 link: 0 is up
Nov 14 14:22:21 pve1 corosync[550128]: [KNET ] link: Resetting MTU for link 0 because host 7 joined
Nov 14 14:22:21 pve1 corosync[550128]: [KNET ] host: host: 7 (passive) best link: 0 (pri: 1)
Nov 14 14:22:21 pve1 corosync[550128]: [KNET ] rx: host: 4 link: 0 is up
Nov 14 14:22:21 pve1 corosync[550128]: [KNET ] link: Resetting MTU for link 0 because host 4 joined
Nov 14 14:22:21 pve1 corosync[550128]: [KNET ] host: host: 4 (passive) best link: 0 (pri: 1)
Nov 14 14:22:21 pve1 corosync[550128]: [KNET ] pmtud: PMTUD link change for host: 4 link: 0 from 469 to 1397
Nov 14 14:22:21 pve1 corosync[550128]: [KNET ] rx: host: 2 link: 0 is up
Nov 14 14:22:21 pve1 corosync[550128]: [KNET ] link: Resetting MTU for link 0 because host 2 joined
Nov 14 14:22:21 pve1 corosync[550128]: [KNET ] rx: host: 6 link: 0 is up
Nov 14 14:22:21 pve1 corosync[550128]: [KNET ] link: Resetting MTU for link 0 because host 6 joined
Nov 14 14:22:21 pve1 corosync[550128]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Nov 14 14:22:21 pve1 corosync[550128]: [KNET ] host: host: 6 (passive) best link: 0 (pri: 1)
Nov 14 14:22:22 pve1 corosync[550128]: [KNET ] rx: host: 5 link: 0 is up
Nov 14 14:22:22 pve1 corosync[550128]: [KNET ] link: Resetting MTU for link 0 because host 5 joined
Nov 14 14:22:22 pve1 corosync[550128]: [KNET ] host: host: 5 (passive) best link: 0 (pri: 1)
Nov 14 14:22:22 pve1 corosync[550128]: [KNET ] pmtud: PMTUD link change for host: 2 link: 0 from 469 to 1397
Nov 14 14:22:22 pve1 corosync[550128]: [KNET ] pmtud: PMTUD link change for host: 6 link: 0 from 469 to 1397
Nov 14 14:22:23 pve1 corosync[550128]: [KNET ] pmtud: PMTUD link change for host: 5 link: 0 from 469 to 1397
Nov 14 14:22:24 pve1 corosync[550128]: [KNET ] pmtud: PMTUD link change for host: 1 link: 0 from 469 to 1397
Nov 14 14:22:25 pve1 corosync[550128]: [KNET ] pmtud: PMTUD link change for host: 8 link: 0 from 469 to 1397
Nov 14 14:22:27 pve1 corosync[550128]: [KNET ] pmtud: PMTUD link change for host: 7 link: 0 from 469 to 1397
Nov 14 14:22:27 pve1 corosync[550128]: [KNET ] pmtud: Global data MTU changed to: 1397
Nov 14 14:22:41 pve1 corosync[550128]: [KNET ] link: host: 5 link: 0 is down
Nov 14 14:22:41 pve1 corosync[550128]: [KNET ] host: host: 5 (passive) best link: 0 (pri: 1)
Nov 14 14:22:41 pve1 corosync[550128]: [KNET ] host: host: 5 has no active links
Nov 14 14:22:42 pve1 corosync[550128]: [KNET ] link: Resetting MTU for link 0 because host 5 joined
Nov 14 14:22:42 pve1 corosync[550128]: [KNET ] host: host: 5 (passive) best link: 0 (pri: 1)
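To see which hosts are flapping, the link up/down messages can be counted per host. This is a rough sketch, assuming the KNET message wording stays as it appears in the journal above; the sample lines are copied from that output, and `count_link_events` is a hypothetical helper name, not part of any corosync tooling.

```python
import re
from collections import Counter

# Matches e.g. "[KNET ] link: host: 5 link: 0 is down"
# and         "[KNET ] rx: host: 5 link: 0 is up"
LINK_EVENT = re.compile(
    r"\[KNET\s*\] (?:link|rx): host: (\d+) link: (\d+) is (up|down)"
)


def count_link_events(lines):
    """Count link up/down events per (host id, state)."""
    counts = Counter()
    for line in lines:
        match = LINK_EVENT.search(line)
        if match:
            host, _link, state = match.groups()
            counts[(int(host), state)] += 1
    return counts


# Sample lines copied from the pve1 journal above
sample = [
    "Nov 14 14:22:21 pve1 corosync[550128]: [KNET ] rx: host: 7 link: 0 is up",
    "Nov 14 14:22:22 pve1 corosync[550128]: [KNET ] rx: host: 5 link: 0 is up",
    "Nov 14 14:22:41 pve1 corosync[550128]: [KNET ] link: host: 5 link: 0 is down",
    "Nov 14 14:22:41 pve1 corosync[550128]: [KNET ] host: host: 5 has no active links",
]

events = count_link_events(sample)
for (host, state), n in sorted(events.items()):
    print(f"host {host}: {state} x{n}")
```

Passing `sys.stdin` instead of `sample` would let it read the output of `journalctl -u corosync --no-pager` through a pipe.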