Hi,
I have a cluster of 9 nodes spread across 3 datacenters that looks like this:
Code:
    DC 3                                DC 1
 +--------+        LINK 1->3        +--------------+
 | P-ARB  |-------------------------| P1 P2 P3 P10 |
 +--------+                         +------+-------+
      |                                    |
      |                                    |
      | LINK 2->3                          | LINK 1->2
      |                                    |
      |                                    |
 +----+-----------+                        |
 | P7 P8 P9 P11   +------------------------+
 +----------------+
       DC 2
My network config for each node (except P-ARB) is the following:
- 2x 10Gbps for storage (external Ceph cluster). LACP bond
- 2x 1Gbps for corosync ring0 (and only for corosync, nothing else on those NICs). LACP bond
- 2x 10Gbps for the VM network, the admin network, and corosync ring1. Open vSwitch controlled.
Network config for the P-ARB node:
- 2x 1Gbps for corosync ring0. LACP bond
- 2x 1Gbps for the admin network, corosync ring1, and one Ceph monitor of the external cluster (no VMs on this node). LACP bond
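For reference, this is roughly how the two-ring totem section is laid out (a sketch, not a copy of my actual file — the bindnetaddr values are placeholders and rrp_mode is what I believe a two-ring PVE setup uses by default):

```
totem {
  version: 2
  cluster_name: tera-cluster
  rrp_mode: passive
  interface {
    ringnumber: 0
    bindnetaddr: 10.0.136.0    # placeholder: the ring0 LACP bond network
  }
  interface {
    ringnumber: 1
    bindnetaddr: 192.0.2.0     # placeholder: ring1 on the OVS-controlled NICs
  }
}
```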
The link latencies are the following:
LINK 1->3: 3.76 ms
LINK 1->2: 1.57 ms
LINK 2->3: 1.56 ms
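Those latencies should be well within the totem token timeout. As a sanity check, here is the effective token timeout corosync 2.x computes for a nodelist cluster, assuming the stock defaults from corosync.conf(5) (token = 1000 ms, token_coefficient = 650 ms; my values could differ if something overrides them):

```python
# Effective totem token timeout for a nodelist-based corosync 2.x cluster,
# per corosync.conf(5): token + (nodes - 2) * token_coefficient.
# Assumed defaults: token = 1000 ms, token_coefficient = 650 ms.

def runtime_token_timeout_ms(nodes, token=1000, token_coefficient=650):
    return token + max(0, nodes - 2) * token_coefficient

print(runtime_token_timeout_ms(9))  # 9-node cluster
```

For 9 nodes that comes out to 5550 ms, so a few milliseconds of WAN latency should be far from the point where totem declares a node dead.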
For a maintenance operation I stopped 2 nodes (P8 and P9).
When the P8 node came back online, all the other nodes self-fenced (except P-ARB).
The logs on the fenced nodes didn't show any quorum failure, only a watchdog timeout.
P1:
Code:
Jun 21 11:14:25 proxmox1 pmxcfs[3045]: [status] notice: received log
Jun 21 11:14:25 proxmox1 pmxcfs[3045]: [status] notice: received log
Jun 21 11:14:25 proxmox1 pmxcfs[3045]: [status] notice: received log
Jun 21 11:14:25 proxmox1 pmxcfs[3045]: [status] notice: received log
Jun 21 11:14:26 proxmox1 pmxcfs[3045]: [status] notice: received log
Jun 21 11:14:55 proxmox1 pveproxy[18977]: worker exit
Jun 21 11:14:55 proxmox1 pveproxy[3195]: worker 18977 finished
Jun 21 11:14:55 proxmox1 pveproxy[3195]: starting 1 worker(s)
Jun 21 11:14:55 proxmox1 pveproxy[3195]: worker 2279 started
Jun 21 11:15:00 proxmox1 systemd[1]: Starting Proxmox VE replication runner...
Jun 21 11:15:01 proxmox1 CRON[2395]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Jun 21 11:15:18 proxmox1 pvedaemon[35087]: <root@pam> successful auth for user 'brachere@e-tera.com'
Jun 21 11:15:24 proxmox1 watchdog-mux[1548]: client watchdog expired - disable watchdog updates
^@^@^@… (long run of NUL bytes — the journal was cut off by the hard reset)
Jun 21 11:18:20 proxmox1 systemd-modules-load[659]: Inserted module 'iscsi_tcp'
Jun 21 11:18:20 proxmox1 systemd-modules-load[659]: Inserted module 'ib_iser'
P2:
Code:
Jun 21 11:14:24 proxmox2 pmxcfs[3047]: [status] notice: received log
Jun 21 11:14:24 proxmox2 pmxcfs[3047]: [status] notice: received log
Jun 21 11:14:24 proxmox2 pmxcfs[3047]: [status] notice: received log
Jun 21 11:14:24 proxmox2 pmxcfs[3047]: [status] notice: received log
Jun 21 11:14:24 proxmox2 pmxcfs[3047]: [status] notice: received log
Jun 21 11:14:24 proxmox2 pmxcfs[3047]: [status] notice: received log
Jun 21 11:14:25 proxmox2 pmxcfs[3047]: [status] notice: received log
Jun 21 11:14:25 proxmox2 pmxcfs[3047]: [status] notice: received log
Jun 21 11:14:25 proxmox2 pmxcfs[3047]: [status] notice: received log
Jun 21 11:14:25 proxmox2 pmxcfs[3047]: [status] notice: received log
Jun 21 11:14:26 proxmox2 pmxcfs[3047]: [status] notice: received log
^@^@^@… (long run of NUL bytes — the journal was cut off by the hard reset)
Jun 21 11:17:33 proxmox2 systemd-modules-load[670]: Inserted module 'iscsi_tcp'
Jun 21 11:17:33 proxmox2 systemd-modules-load[670]: Inserted module 'ib_iser'
Jun 21 11:17:33 proxmox2 systemd-modules-load[670]: Inserted module 'vhost_net'
Jun 21 11:17:33 proxmox2 keyboard-setup.sh[665]: cannot open file /tmp/tmpkbd.0yra79
P3:
Code:
Jun 21 11:14:24 proxmox3 pmxcfs[3011]: [status] notice: received log
Jun 21 11:14:24 proxmox3 pmxcfs[3011]: [status] notice: received log
Jun 21 11:14:24 proxmox3 pmxcfs[3011]: [status] notice: received log
Jun 21 11:14:24 proxmox3 pmxcfs[3011]: [status] notice: received log
Jun 21 11:14:24 proxmox3 pmxcfs[3011]: [status] notice: received log
Jun 21 11:14:24 proxmox3 pmxcfs[3011]: [status] notice: received log
Jun 21 11:14:25 proxmox3 pmxcfs[3011]: [status] notice: received log
Jun 21 11:14:25 proxmox3 pmxcfs[3011]: [status] notice: received log
Jun 21 11:14:25 proxmox3 pmxcfs[3011]: [status] notice: received log
Jun 21 11:14:25 proxmox3 pmxcfs[3011]: [status] notice: received log
Jun 21 11:14:26 proxmox3 pmxcfs[3011]: [status] notice: received log
Jun 21 11:15:00 proxmox3 systemd[1]: Starting Proxmox VE replication runner...
Jun 21 11:15:01 proxmox3 CRON[100686]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Jun 21 11:15:23 proxmox3 watchdog-mux[1561]: client watchdog expired - disable watchdog updates
Jun 21 11:18:18 proxmox3 systemd-modules-load[679]: Inserted module 'iscsi_tcp'
Jun 21 11:18:18 proxmox3 kernel: [ 0.000000] Linux version 4.15.17-2-pve (tlamprecht@evita) (gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1)) #1 SMP PVE 4.15.17-10 (Tue, 22 May 2018 11:15:44 +0200) ()
Jun 21 11:18:18 proxmox3 kernel: [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.15.17-2-pve root=/dev/mapper/pve-root ro quiet
Same type of logs for P7 and P10.
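One thing I can extract from the timestamps above: assuming watchdog-mux uses its default 60 s client timeout (an assumption — I haven't changed it on these nodes), subtracting 60 s from each "client watchdog expired" line gives the moment the HA stack stopped updating the watchdog:

```python
# Back-compute when each node's HA stack last updated the watchdog,
# assuming the default 60 s watchdog-mux client timeout.
from datetime import datetime, timedelta

WATCHDOG_TIMEOUT = timedelta(seconds=60)  # assumed default

expiries = {
    "proxmox1": "11:15:24",  # from the P1 log above
    "proxmox3": "11:15:23",  # from the P3 log above
}
for node, ts in expiries.items():
    expired = datetime.strptime(ts, "%H:%M:%S")
    last_update = (expired - WATCHDOG_TIMEOUT).time()
    print(f"{node}: last watchdog update ~ {last_update}")
```

That puts the stall at about 11:14:23-24, i.e. within a couple of seconds of P8's corosync starting up (11:14:26 in the P8 log), which seems more than a coincidence.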
P8:
Code:
Jun 21 11:07:53 proxmox8 pmxcfs[2673]: [status] notice: received log
Jun 21 11:07:53 proxmox8 pmxcfs[2673]: [status] notice: received log
Jun 21 11:07:54 proxmox8 pmxcfs[2673]: [status] notice: received log
Jun 21 11:07:54 proxmox8 pmxcfs[2673]: [status] notice: received log
Jun 21 11:07:54 proxmox8 systemd[1]: Stopped PVE Cluster Ressource Manager Daemon.
Jun 21 11:14:18 proxmox8 systemd-modules-load[641]: Inserted module 'iscsi_tcp'
Jun 21 11:14:18 proxmox8 kernel: [ 0.000000] Linux version 4.15.17-2-pve (tlamprecht@evita) (gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1)) #1 SMP PVE 4.15.17-10 (Tue, 22 May 2018 11:15:44 +0200) ()
Jun 21 11:14:18 proxmox8 kernel: [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.15.17-2-pve root=/dev/mapper/pve-root ro quiet
Jun 21 11:14:18 proxmox8 kernel: [ 0.000000] KERNEL supported cpus:
Jun 21 11:14:18 proxmox8 kernel: [ 0.000000] Intel GenuineIntel
.
.
.
Jun 21 11:14:26 proxmox8 systemd[1]: Started The Proxmox VE cluster filesystem.
Jun 21 11:14:26 proxmox8 systemd[1]: Starting PVE Status Daemon...
Jun 21 11:14:26 proxmox8 systemd[1]: Starting Corosync Cluster Engine...
Jun 21 11:14:26 proxmox8 systemd[1]: Started Regular background program processing daemon.
Jun 21 11:14:26 proxmox8 systemd[1]: Starting Proxmox VE firewall...
Jun 21 11:14:26 proxmox8 cron[2707]: (CRON) INFO (pidfile fd = 3)
Jun 21 11:14:26 proxmox8 cron[2707]: (CRON) INFO (Running @reboot jobs)
Jun 21 11:14:26 proxmox8 corosync[2703]: [MAIN ] Corosync Cluster Engine ('2.4.2-dirty'): started and ready to provide service.
Jun 21 11:14:26 proxmox8 corosync[2703]: notice [MAIN ] Corosync Cluster Engine ('2.4.2-dirty'): started and ready to provide service.
Jun 21 11:14:26 proxmox8 corosync[2703]: info [MAIN ] Corosync built-in features: dbus rdma monitoring watchdog augeas systemd upstart xmlconf qdevices qnetd snmp pie relro bindnow
Jun 21 11:14:26 proxmox8 corosync[2703]: [MAIN ] Corosync built-in features: dbus rdma monitoring watchdog augeas systemd upstart xmlconf qdevices qnetd snmp pie relro bindnow
Jun 21 11:14:26 proxmox8 corosync[2703]: notice [TOTEM ] Initializing transport (UDP/IP Multicast).
Jun 21 11:14:26 proxmox8 corosync[2703]: notice [TOTEM ] Initializing transmit/receive security (NSS) crypto: aes256 hash: sha1
Jun 21 11:14:26 proxmox8 corosync[2703]: [TOTEM ] Initializing transport (UDP/IP Multicast).
Jun 21 11:14:26 proxmox8 corosync[2703]: [TOTEM ] Initializing transmit/receive security (NSS) crypto: aes256 hash: sha1
Jun 21 11:14:26 proxmox8 corosync[2703]: notice [TOTEM ] Initializing transport (UDP/IP Multicast).
Jun 21 11:14:26 proxmox8 corosync[2703]: notice [TOTEM ] Initializing transmit/receive security (NSS) crypto: aes256 hash: sha1
Jun 21 11:14:26 proxmox8 corosync[2703]: [TOTEM ] Initializing transport (UDP/IP Multicast).
Jun 21 11:14:26 proxmox8 corosync[2703]: [TOTEM ] Initializing transmit/receive security (NSS) crypto: aes256 hash: sha1
Jun 21 11:14:26 proxmox8 corosync[2703]: notice [TOTEM ] The network interface [10.0.136.52] is now up.
Jun 21 11:14:26 proxmox8 corosync[2703]: notice [SERV ] Service engine loaded: corosync configuration map access [0]
Jun 21 11:14:26 proxmox8 corosync[2703]: info [QB ] server name: cmap
Jun 21 11:14:26 proxmox8 corosync[2703]: [TOTEM ] The network interface [10.0.136.52] is now up.
Jun 21 11:14:26 proxmox8 corosync[2703]: notice [SERV ] Service engine loaded: corosync configuration service [1]
Jun 21 11:14:26 proxmox8 corosync[2703]: info [QB ] server name: cfg
Jun 21 11:14:26 proxmox8 corosync[2703]: notice [SERV ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Jun 21 11:14:26 proxmox8 corosync[2703]: info [QB ] server name: cpg
Jun 21 11:14:26 proxmox8 corosync[2703]: notice [SERV ] Service engine loaded: corosync profile loading service [4]
Jun 21 11:14:26 proxmox8 corosync[2703]: notice [SERV ] Service engine loaded: corosync resource monitoring service [6]
Jun 21 11:14:26 proxmox8 corosync[2703]: warning [WD ] Watchdog /dev/watchdog exists but couldn't be opened.
Jun 21 11:14:26 proxmox8 corosync[2703]: warning [WD ] resource load_15min missing a recovery key.
Jun 21 11:14:26 proxmox8 corosync[2703]: warning [WD ] resource memory_used missing a recovery key.
Jun 21 11:14:26 proxmox8 corosync[2703]: [SERV ] Service engine loaded: corosync configuration map access [0]
Jun 21 11:14:26 proxmox8 corosync[2703]: info [WD ] no resources configured.
Jun 21 11:14:26 proxmox8 corosync[2703]: notice [SERV ] Service engine loaded: corosync watchdog service [7]
Jun 21 11:14:26 proxmox8 corosync[2703]: notice [QUORUM] Using quorum provider corosync_votequorum
Jun 21 11:14:26 proxmox8 corosync[2703]: notice [SERV ] Service engine loaded: corosync vote quorum service v1.0 [5]
Jun 21 11:14:26 proxmox8 corosync[2703]: info [QB ] server name: votequorum
Jun 21 11:14:26 proxmox8 corosync[2703]: notice [SERV ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Jun 21 11:14:26 proxmox8 corosync[2703]: info [QB ] server name: quorum
Jun 21 11:14:26 proxmox8 corosync[2703]: notice [TOTEM ] The network interface [169.254.5.5] is now up.
Jun 21 11:14:26 proxmox8 corosync[2703]: warning [TOTEM ] JOIN or LEAVE message was thrown away during flush operation.
Jun 21 11:14:26 proxmox8 corosync[2703]: [QB ] server name: cmap
Jun 21 11:14:26 proxmox8 systemd[1]: Started Corosync Cluster Engine.
Jun 21 11:14:26 proxmox8 corosync[2703]: notice [TOTEM ] A new membership (10.0.136.52:74932) was formed. Members joined: 9
Jun 21 11:14:26 proxmox8 corosync[2703]: warning [CPG ] downlist left_list: 0 received
Jun 21 11:14:26 proxmox8 corosync[2703]: notice [QUORUM] Members[1]: 9
Jun 21 11:14:26 proxmox8 corosync[2703]: notice [MAIN ] Completed service synchronization, ready to provide service.
Jun 21 11:14:26 proxmox8 corosync[2703]: [SERV ] Service engine loaded: corosync configuration service [1]
Jun 21 11:14:26 proxmox8 systemd[1]: Starting PVE API Daemon...
Jun 21 11:14:26 proxmox8 corosync[2703]: [QB ] server name: cfg
Jun 21 11:14:26 proxmox8 corosync[2703]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Jun 21 11:14:26 proxmox8 corosync[2703]: [QB ] server name: cpg
Jun 21 11:14:26 proxmox8 corosync[2703]: [SERV ] Service engine loaded: corosync profile loading service [4]
Jun 21 11:14:26 proxmox8 corosync[2703]: [SERV ] Service engine loaded: corosync resource monitoring service [6]
Jun 21 11:14:26 proxmox8 corosync[2703]: [WD ] Watchdog /dev/watchdog exists but couldn't be opened.
Jun 21 11:14:26 proxmox8 corosync[2703]: [WD ] resource load_15min missing a recovery key.
Jun 21 11:14:26 proxmox8 corosync[2703]: [WD ] resource memory_used missing a recovery key.
Jun 21 11:14:26 proxmox8 corosync[2703]: [WD ] no resources configured.
Jun 21 11:14:26 proxmox8 corosync[2703]: [SERV ] Service engine loaded: corosync watchdog service [7]
Jun 21 11:14:26 proxmox8 corosync[2703]: [QUORUM] Using quorum provider corosync_votequorum
Jun 21 11:14:26 proxmox8 corosync[2703]: [SERV ] Service engine loaded: corosync vote quorum service v1.0 [5]
Jun 21 11:14:26 proxmox8 corosync[2703]: [QB ] server name: votequorum
Jun 21 11:14:26 proxmox8 corosync[2703]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Jun 21 11:14:26 proxmox8 corosync[2703]: [QB ] server name: quorum
Jun 21 11:14:26 proxmox8 corosync[2703]: [TOTEM ] The network interface [169.254.5.5] is now up.
Jun 21 11:14:26 proxmox8 corosync[2703]: [TOTEM ] JOIN or LEAVE message was thrown away during flush operation.
Jun 21 11:14:26 proxmox8 corosync[2703]: [TOTEM ] A new membership (10.0.136.52:74932) was formed. Members joined: 9
Jun 21 11:14:26 proxmox8 corosync[2703]: [CPG ] downlist left_list: 0 received
Jun 21 11:14:26 proxmox8 corosync[2703]: [QUORUM] Members[1]: 9
Jun 21 11:14:26 proxmox8 corosync[2703]: [MAIN ] Completed service synchronization, ready to provide service.
Jun 21 11:14:26 proxmox8 pve-firewall[2754]: starting server
Jun 21 11:14:26 proxmox8 pvestatd[2755]: starting server
Jun 21 11:14:26 proxmox8 systemd[1]: Started Proxmox VE firewall.
Jun 21 11:14:26 proxmox8 systemd[1]: Started PVE Status Daemon.
Jun 21 11:14:26 proxmox8 kernel: [ 13.729489] ip6_tables: (C) 2000-2006 Netfilter Core Team
Jun 21 11:14:27 proxmox8 kernel: [ 13.823759] ip_set: protocol 6
Jun 21 11:14:27 proxmox8 pvedaemon[2777]: starting server
Jun 21 11:14:27 proxmox8 pvedaemon[2777]: starting 3 worker(s)
Jun 21 11:14:27 proxmox8 pvedaemon[2777]: worker 2780 started
Jun 21 11:14:27 proxmox8 pvedaemon[2777]: worker 2781 started
Jun 21 11:14:27 proxmox8 pvedaemon[2777]: worker 2782 started
Jun 21 11:14:27 proxmox8 systemd[1]: Started PVE API Daemon.
Jun 21 11:14:27 proxmox8 systemd[1]: Starting PVE API Proxy Server...
Jun 21 11:14:27 proxmox8 systemd[1]: Starting PVE Cluster Ressource Manager Daemon...
Jun 21 11:14:27 proxmox8 pve-ha-crm[2803]: starting server
Jun 21 11:14:27 proxmox8 pve-ha-crm[2803]: status change startup => wait_for_quorum
Jun 21 11:14:27 proxmox8 systemd[1]: Started PVE Cluster Ressource Manager Daemon.
Jun 21 11:14:27 proxmox8 systemd[1]: Starting PVE Local HA Ressource Manager Daemon...
Jun 21 11:14:28 proxmox8 pveproxy[2821]: starting server
Jun 21 11:14:28 proxmox8 pveproxy[2821]: starting 3 worker(s)
Jun 21 11:14:28 proxmox8 pveproxy[2821]: worker 2824 started
Jun 21 11:14:28 proxmox8 pveproxy[2821]: worker 2825 started
Jun 21 11:14:28 proxmox8 pveproxy[2821]: worker 2826 started
Jun 21 11:14:28 proxmox8 systemd[1]: Started PVE API Proxy Server.
Jun 21 11:14:28 proxmox8 systemd[1]: Starting PVE SPICE Proxy Server...
Jun 21 11:14:28 proxmox8 pve-ha-lrm[2845]: starting server
Jun 21 11:14:28 proxmox8 pve-ha-lrm[2845]: status change startup => wait_for_agent_lock
Jun 21 11:14:28 proxmox8 systemd[1]: Started PVE Local HA Ressource Manager Daemon.
Jun 21 11:14:28 proxmox8 spiceproxy[2850]: starting server
Jun 21 11:14:28 proxmox8 spiceproxy[2850]: starting 1 worker(s)
Jun 21 11:14:28 proxmox8 spiceproxy[2850]: worker 2853 started
Jun 21 11:14:28 proxmox8 systemd[1]: Started PVE SPICE Proxy Server.
Jun 21 11:14:28 proxmox8 systemd[1]: Starting PVE guests...
Jun 21 11:14:29 proxmox8 pve-guests[2856]: <root@pam> starting task UPID:proxmox8:00000B38:00000645:5B2B6C75:startall::root@pam:
Jun 21 11:14:29 proxmox8 pvesh[2856]: waiting for quorum ...
Jun 21 11:14:31 proxmox8 pmxcfs[2678]: [status] notice: update cluster info (cluster name tera-cluster, version = 25)
Jun 21 11:14:48 proxmox8 systemd[1]: Created slice User Slice of root.
Jun 21 11:14:48 proxmox8 systemd[1]: Starting User Manager for UID 0...
Jun 21 11:14:48 proxmox8 systemd[1]: Started Session 1 of user root.
Jun 21 11:14:48 proxmox8 systemd[3089]: Listening on GnuPG cryptographic agent and passphrase cache (restricted).
Jun 21 11:14:48 proxmox8 systemd[3089]: Listening on GnuPG cryptographic agent and passphrase cache (restricted).
Jun 21 11:14:48 proxmox8 systemd[3089]: Reached target Paths.
Jun 21 11:14:48 proxmox8 systemd[3089]: Listening on GnuPG cryptographic agent (access for web browsers).
Jun 21 11:14:48 proxmox8 systemd[3089]: Listening on GnuPG network certificate management daemon.
Jun 21 11:14:48 proxmox8 systemd[3089]: Reached target Timers.
Jun 21 11:14:48 proxmox8 systemd[3089]: Listening on GnuPG cryptographic agent and passphrase cache.
Jun 21 11:14:48 proxmox8 systemd[3089]: Reached target Sockets.
Jun 21 11:14:48 proxmox8 systemd[3089]: Reached target Basic System.
Jun 21 11:14:48 proxmox8 systemd[3089]: Reached target Default.
Jun 21 11:14:48 proxmox8 systemd[3089]: Startup finished in 14ms.
Jun 21 11:14:48 proxmox8 systemd[1]: Started User Manager for UID 0.
Jun 21 11:14:49 proxmox8 systemd-timesyncd[1105]: Synchronized to time server 195.154.189.15:123 (2.debian.pool.ntp.org).
Jun 21 11:15:00 proxmox8 systemd[1]: Starting Proxmox VE replication runner...
Jun 21 11:15:00 proxmox8 pvesr[3234]: trying to aquire cfs lock 'file-replication_cfg' ...
Jun 21 11:15:01 proxmox8 CRON[3253]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Jun 21 11:15:01 proxmox8 pvesr[3234]: trying to aquire cfs lock 'file-replication_cfg' ...
Jun 21 11:15:02 proxmox8 pvesr[3234]: trying to aquire cfs lock 'file-replication_cfg' ...
Jun 21 11:15:03 proxmox8 pvesr[3234]: trying to aquire cfs lock 'file-replication_cfg' ...
Jun 21 11:15:04 proxmox8 pvesr[3234]: trying to aquire cfs lock 'file-replication_cfg' ...
Jun 21 11:15:05 proxmox8 pvesr[3234]: trying to aquire cfs lock 'file-replication_cfg' ...
Jun 21 11:15:06 proxmox8 pvesr[3234]: trying to aquire cfs lock 'file-replication_cfg' ...
Jun 21 11:15:07 proxmox8 pvesr[3234]: trying to aquire cfs lock 'file-replication_cfg' ...
Jun 21 11:15:08 proxmox8 pvesr[3234]: trying to aquire cfs lock 'file-replication_cfg' ...
Jun 21 11:15:09 proxmox8 systemd[1]: Stopping User Manager for UID 0...
Jun 21 11:15:09 proxmox8 systemd[3089]: Stopped target Default.
Jun 21 11:15:09 proxmox8 systemd[3089]: Stopped target Basic System.
Jun 21 11:15:09 proxmox8 systemd[3089]: Stopped target Sockets.
Jun 21 11:15:09 proxmox8 systemd[3089]: Closed GnuPG network certificate management daemon.
Jun 21 11:15:09 proxmox8 systemd[3089]: Closed GnuPG cryptographic agent (ssh-agent emulation).
Jun 21 11:15:09 proxmox8 systemd[3089]: Closed GnuPG cryptographic agent (access for web browsers).
Jun 21 11:15:09 proxmox8 systemd[3089]: Closed GnuPG cryptographic agent and passphrase cache (restricted).
Jun 21 11:15:09 proxmox8 systemd[3089]: Stopped target Timers.
Jun 21 11:15:09 proxmox8 systemd[3089]: Stopped target Paths.
Jun 21 11:15:09 proxmox8 systemd[3089]: Closed GnuPG cryptographic agent and passphrase cache.
Jun 21 11:15:09 proxmox8 systemd[3089]: Reached target Shutdown.
Jun 21 11:15:09 proxmox8 systemd[3089]: Starting Exit the Session...
Jun 21 11:15:09 proxmox8 systemd[3089]: Received SIGRTMIN+24 from PID 3354 (kill).
Jun 21 11:15:09 proxmox8 systemd[1]: Stopped User Manager for UID 0.
Jun 21 11:15:09 proxmox8 systemd[1]: Removed slice User Slice of root.
Jun 21 11:15:09 proxmox8 pvesr[3234]: error with cfs lock 'file-replication_cfg': no quorum!
Jun 21 11:15:09 proxmox8 systemd[1]: pvesr.service: Main process exited, code=exited, status=13/n/a
Jun 21 11:15:09 proxmox8 systemd[1]: Failed to start Proxmox VE replication runner.
Jun 21 11:15:09 proxmox8 systemd[1]: pvesr.service: Unit entered failed state.
Jun 21 11:15:09 proxmox8 systemd[1]: pvesr.service: Failed with result 'exit-code'.
Jun 21 11:15:10 proxmox8 systemd[1]: Created slice User Slice of root.
Jun 21 11:15:10 proxmox8 systemd[1]: Starting User Manager for UID 0...
Jun 21 11:15:10 proxmox8 systemd[1]: Started Session 4 of user root.
Jun 21 11:15:10 proxmox8 systemd[3379]: Listening on GnuPG cryptographic agent and passphrase cache.
Jun 21 11:15:10 proxmox8 systemd[3379]: Listening on GnuPG cryptographic agent (ssh-agent emulation).
Jun 21 11:15:10 proxmox8 systemd[3379]: Reached target Timers.
Jun 21 11:15:10 proxmox8 systemd[3379]: Listening on GnuPG cryptographic agent and passphrase cache (restricted).
Jun 21 11:15:10 proxmox8 systemd[3379]: Reached target Paths.
Jun 21 11:15:10 proxmox8 systemd[3379]: Listening on GnuPG network certificate management daemon.
Jun 21 11:15:10 proxmox8 systemd[3379]: Listening on GnuPG cryptographic agent (access for web browsers).
Jun 21 11:15:10 proxmox8 systemd[3379]: Reached target Sockets.
Jun 21 11:15:10 proxmox8 systemd[3379]: Reached target Basic System.
Jun 21 11:15:10 proxmox8 systemd[3379]: Reached target Default.
Jun 21 11:15:10 proxmox8 systemd[3379]: Startup finished in 15ms.
Jun 21 11:15:10 proxmox8 systemd[1]: Started User Manager for UID 0.
Jun 21 11:15:44 proxmox8 corosync[2703]: notice [TOTEM ] A new membership (10.0.136.20:75036) was formed. Members joined: 8
Jun 21 11:15:44 proxmox8 corosync[2703]: [TOTEM ] A new membership (10.0.136.20:75036) was formed. Members joined: 8
Jun 21 11:15:44 proxmox8 corosync[2703]: warning [CPG ] downlist left_list: 0 received
Jun 21 11:15:44 proxmox8 corosync[2703]: [CPG ] downlist left_list: 0 received
Jun 21 11:15:44 proxmox8 corosync[2703]: warning [CPG ] downlist left_list: 6 received
Jun 21 11:15:44 proxmox8 corosync[2703]: [CPG ] downlist left_list: 6 received
Jun 21 11:15:44 proxmox8 corosync[2703]: notice [QUORUM] Members[2]: 8 9
Jun 21 11:15:44 proxmox8 corosync[2703]: notice [MAIN ] Completed service synchronization, ready to provide service.
Jun 21 11:15:44 proxmox8 corosync[2703]: [QUORUM] Members[2]: 8 9
Jun 21 11:15:44 proxmox8 corosync[2703]: [MAIN ] Completed service synchronization, ready to provide service.
Jun 21 11:15:44 proxmox8 pmxcfs[2678]: [dcdb] notice: members: 8/525829, 9/2678
Jun 21 11:15:44 proxmox8 pmxcfs[2678]: [dcdb] notice: starting data syncronisation
Jun 21 11:15:44 proxmox8 pmxcfs[2678]: [status] notice: members: 8/525829, 9/2678
Jun 21 11:15:44 proxmox8 pmxcfs[2678]: [status] notice: starting data syncronisation
Jun 21 11:15:44 proxmox8 pmxcfs[2678]: [dcdb] notice: members: 9/2678
Jun 21 11:15:44 proxmox8 pmxcfs[2678]: [dcdb] notice: all data is up to date
Jun 21 11:15:44 proxmox8 pmxcfs[2678]: [status] notice: received sync request (epoch 8/525829/00000067)
Jun 21 11:15:44 proxmox8 pmxcfs[2678]: [status] notice: received all states
Jun 21 11:15:44 proxmox8 pmxcfs[2678]: [status] notice: all data is up to date
Jun 21 11:15:44 proxmox8 pmxcfs[2678]: [status] notice: dfsm_deliver_queue: queue length 482
Jun 21 11:15:44 proxmox8 pmxcfs[2678]: [status] notice: received log
Jun 21 11:15:44 proxmox8 pmxcfs[2678]: [main] notice: ignore duplicate
Jun 21 11:15:44 proxmox8 pmxcfs[2678]: [status] notice: received log
Jun 21 11:15:44 proxmox8 pmxcfs[2678]: [main] notice: ignore duplicate
Jun 21 11:15:44 proxmox8 pmxcfs[2678]: [status] notice: received log
Jun 21 11:15:44 proxmox8 pmxcfs[2678]: [main] notice: ignore duplicate
P9:
no logs, offline.
P11:
Code:
Jun 21 11:14:24 proxmox11 pmxcfs[2521]: [status] notice: received log
Jun 21 11:14:25 proxmox11 pmxcfs[2521]: [status] notice: received log
Jun 21 11:14:25 proxmox11 pmxcfs[2521]: [status] notice: received log
Jun 21 11:14:25 proxmox11 pmxcfs[2521]: [status] notice: received log
Jun 21 11:14:25 proxmox11 pmxcfs[2521]: [status] notice: received log
Jun 21 11:14:26 proxmox11 pmxcfs[2521]: [status] notice: received log
^@^@^@… (long run of NUL bytes — the journal was cut off by the hard reset)
Jun 21 11:17:13 proxmox11 systemd-modules-load[509]: Inserted module 'iscsi_tcp'
Jun 21 11:17:13 proxmox11 systemd-modules-load[509]: Inserted module 'ib_iser'
Jun 21 11:17:13 proxmox11 systemd-modules-load[509]: Inserted module 'vhost_net'
Jun 21 11:17:13 proxmox11 keyboard-setup.sh[519]: cannot open file /tmp/tmpkbd.fljNd0
Jun 21 11:17:13 proxmox11 systemd[1]: Starting Flush Journal to Persistent Storage...
Jun 21 11:17:13 proxmox11 systemd[1]: Starting udev Wait for Complete Device Initialization...
Jun 21 11:17:13 proxmox11 systemd[1]: Listening on Load/Save RF Kill Switch Status /dev/rfkill Watch
P-ARB:
Code:
Jun 21 11:15:29 proxmox-arb ceph-mgr[3780771]: ::ffff:195.49.132.87 - - [21/Jun/2018:11:15:29] "OPTIONS / HTTP/1.0" 302 131 "" ""
Jun 21 11:15:29 proxmox-arb pveproxy[2274070]: proxy detected vanished client connection
Jun 21 11:15:29 proxmox-arb pveproxy[2276100]: proxy detected vanished client connection
Jun 21 11:15:31 proxmox-arb pvedaemon[2246776]: worker exit
Jun 21 11:15:31 proxmox-arb pvedaemon[2095]: worker 2246776 finished
Jun 21 11:15:31 proxmox-arb pvedaemon[2095]: starting 1 worker(s)
Jun 21 11:15:31 proxmox-arb pvedaemon[2095]: worker 2290483 started
Jun 21 11:15:31 proxmox-arb ceph-mgr[3780771]: ::ffff:195.49.132.87 - - [21/Jun/2018:11:15:31] "OPTIONS / HTTP/1.0" 302 131 "" ""
Jun 21 11:15:33 proxmox-arb ceph-mgr[3780771]: ::ffff:195.49.132.87 - - [21/Jun/2018:11:15:33] "OPTIONS / HTTP/1.0" 302 131 "" ""
Jun 21 11:15:39 proxmox-arb snmpd[1539]: error on subcontainer 'ia_addr' insert (-1)
Jun 21 11:15:39 proxmox-arb snmpd[1539]: error on subcontainer 'ia_addr' insert (-1)
Jun 21 11:15:44 proxmox-arb corosync[3975603]: notice [TOTEM ] A new membership (10.0.136.20:75036) was formed. Members joined: 9 left: 6 5 2 7 1 3
Jun 21 11:15:44 proxmox-arb corosync[3975603]: notice [TOTEM ] Failed to receive the leave message. failed: 6 5 2 7 1 3
Jun 21 11:15:44 proxmox-arb corosync[3975603]: [TOTEM ] A new membership (10.0.136.20:75036) was formed. Members joined: 9 left: 6 5 2 7 1 3
Jun 21 11:15:44 proxmox-arb corosync[3975603]: [TOTEM ] Failed to receive the leave message. failed: 6 5 2 7 1 3
Jun 21 11:15:44 proxmox-arb corosync[3975603]: warning [CPG ] downlist left_list: 0 received
Jun 21 11:15:44 proxmox-arb corosync[3975603]: warning [CPG ] downlist left_list: 6 received
Jun 21 11:15:44 proxmox-arb corosync[3975603]: [CPG ] downlist left_list: 0 received
Jun 21 11:15:44 proxmox-arb corosync[3975603]: [CPG ] downlist left_list: 6 received
Jun 21 11:15:44 proxmox-arb pmxcfs[525829]: [dcdb] notice: members: 8/525829
Jun 21 11:15:44 proxmox-arb pmxcfs[525829]: [status] notice: members: 8/525829
Jun 21 11:15:44 proxmox-arb corosync[3975603]: notice [QUORUM] This node is within the non-primary component and will NOT provide any services.
Jun 21 11:15:44 proxmox-arb corosync[3975603]: notice [QUORUM] Members[2]: 8 9
Jun 21 11:15:44 proxmox-arb corosync[3975603]: notice [MAIN ] Completed service synchronization, ready to provide service.
Jun 21 11:15:44 proxmox-arb corosync[3975603]: [QUORUM] This node is within the non-primary component and will NOT provide any services.
Jun 21 11:15:44 proxmox-arb corosync[3975603]: [QUORUM] Members[2]: 8 9
Jun 21 11:15:44 proxmox-arb corosync[3975603]: [MAIN ] Completed service synchronization, ready to provide service.
Jun 21 11:15:44 proxmox-arb pmxcfs[525829]: [status] notice: node lost quorum
Jun 21 11:15:44 proxmox-arb pmxcfs[525829]: [dcdb] crit: received write while not quorate - trigger resync
Jun 21 11:15:44 proxmox-arb pmxcfs[525829]: [dcdb] crit: leaving CPG group
Jun 21 11:15:44 proxmox-arb pve-ha-lrm[536042]: unable to write lrm status file - unable to open file '/etc/pve/nodes/proxmox-arb/lrm_status.tmp.536042' - Permission denied
Jun 21 11:15:44 proxmox-arb pmxcfs[525829]: [status] notice: members: 8/525829, 9/2678
Jun 21 11:15:44 proxmox-arb pmxcfs[525829]: [status] notice: starting data syncronisation
Jun 21 11:15:44 proxmox-arb pmxcfs[525829]: [status] notice: received sync request (epoch 8/525829/00000067)
Jun 21 11:15:44 proxmox-arb pmxcfs[525829]: [status] notice: received all states
Jun 21 11:15:44 proxmox-arb pmxcfs[525829]: [status] notice: all data is up to date
Jun 21 11:15:44 proxmox-arb pmxcfs[525829]: [status] notice: dfsm_deliver_queue: queue length 482
Jun 21 11:15:44 proxmox-arb pvesr[2286104]: trying to aquire cfs lock 'file-replication_cfg' ...
Jun 21 11:15:44 proxmox-arb pmxcfs[525829]: [dcdb] notice: start cluster connection
Jun 21 11:15:44 proxmox-arb pve-ha-crm[536352]: loop take too long (79 seconds)
Jun 21 11:15:44 proxmox-arb pve-ha-crm[536352]: status change slave => wait_for_quorum
Jun 21 11:15:44 proxmox-arb pmxcfs[525829]: [dcdb] notice: members: 8/525829, 9/2678
Jun 21 11:15:44 proxmox-arb pmxcfs[525829]: [dcdb] notice: starting data syncronisation
Jun 21 11:15:44 proxmox-arb pmxcfs[525829]: [dcdb] notice: received sync request (epoch 8/525829/00000070)
Jun 21 11:15:44 proxmox-arb pmxcfs[525829]: [dcdb] notice: received all states
Jun 21 11:15:44 proxmox-arb pmxcfs[525829]: [dcdb] notice: leader is 8/525829
Jun 21 11:15:44 proxmox-arb pmxcfs[525829]: [dcdb] notice: synced members: 8/525829
Jun 21 11:15:44 proxmox-arb pmxcfs[525829]: [dcdb] notice: start sending inode updates
Jun 21 11:15:44 proxmox-arb pmxcfs[525829]: [dcdb] notice: sent all (17) updates
Jun 21 11:15:44 proxmox-arb pmxcfs[525829]: [dcdb] notice: all data is up to date
Jun 21 11:15:45 proxmox-arb pvesr[2286104]: trying to aquire cfs lock 'file-replication_cfg' ...
Jun 21 11:15:46 proxmox-arb pvesr[2286104]: trying to aquire cfs lock 'file-replication_cfg' ...
Jun 21 11:15:47 proxmox-arb pvesr[2286104]: trying to aquire cfs lock 'file-replication_cfg' ...
Jun 21 11:15:48 proxmox-arb pvesr[2286104]: trying to aquire cfs lock 'file-replication_cfg' ...
Jun 21 11:15:49 proxmox-arb pve-ha-lrm[536042]: loop take too long (83 seconds)
Jun 21 11:15:49 proxmox-arb pvesr[2286104]: trying to aquire cfs lock 'file-replication_cfg' ...
Jun 21 11:15:50 proxmox-arb pvesr[2286104]: trying to aquire cfs lock 'file-replication_cfg' ...
Jun 21 11:15:51 proxmox-arb pvesr[2286104]: trying to aquire cfs lock 'file-replication_cfg' ...
Jun 21 11:15:52 proxmox-arb pvesr[2286104]: trying to aquire cfs lock 'file-replication_cfg' ...
Jun 21 11:15:53 proxmox-arb pvesr[2286104]: error with cfs lock 'file-replication_cfg': no quorum!
Jun 21 11:15:53 proxmox-arb systemd[1]: pvesr.service: Main process exited, code=exited, status=13/n/a
Jun 21 11:15:53 proxmox-arb systemd[1]: Failed to start Proxmox VE replication runner.
Jun 21 11:15:53 proxmox-arb systemd[1]: pvesr.service: Unit entered failed state.
Jun 21 11:15:53 proxmox-arb systemd[1]: pvesr.service: Failed with result 'exit-code'.
Jun 21 11:16:00 proxmox-arb systemd[1]: Starting Proxmox VE replication runner...
Jun 21 11:16:00 proxmox-arb pvesr[2291730]: trying to aquire cfs lock 'file-replication_cfg' ...
Jun 21 11:16:01 proxmox-arb pveproxy[2274070]: proxy detected vanished client connection
Jun 21 11:16:01 proxmox-arb pveproxy[2274070]: proxy detected vanished client connection
Jun 21 11:16:01 proxmox-arb pvesr[2291730]: trying to aquire cfs lock 'file-replication_cfg' ...
Jun 21 11:16:02 proxmox-arb pveproxy[2276100]: proxy detected vanished client connection
Jun 21 11:16:02 proxmox-arb pvesr[2291730]: trying to aquire cfs lock 'file-replication_cfg' ...
Jun 21 11:16:03 proxmox-arb pvesr[2291730]: trying to aquire cfs lock 'file-replication_cfg' ...
Jun 21 11:16:04 proxmox-arb pveproxy[2288635]: proxy detected vanished client connection
Jun 21 11:16:04 proxmox-arb pvesr[2291730]: trying to aquire cfs lock 'file-replication_cfg' ...
Jun 21 11:16:05 proxmox-arb pveproxy[2274070]: proxy detected vanished client connection
Jun 21 11:16:05 proxmox-arb pveproxy[2276100]: proxy detected vanished client connection
Jun 21 11:16:05 proxmox-arb pvesr[2291730]: trying to aquire cfs lock 'file-replication_cfg' ...
Jun 21 11:16:06 proxmox-arb pvesr[2291730]: trying to aquire cfs lock 'file-replication_cfg' ...
Jun 21 11:16:07 proxmox-arb pveproxy[2276100]: proxy detected vanished client connection
Jun 21 11:16:07 proxmox-arb pveproxy[2274070]: proxy detected vanished client connection
Jun 21 11:16:07 proxmox-arb pveproxy[2274070]: proxy detected vanished client connection
Jun 21 11:16:07 proxmox-arb pvesr[2291730]: trying to aquire cfs lock 'file-replication_cfg' ...
Jun 21 11:16:08 proxmox-arb pveproxy[2288635]: proxy detected vanished client connection
Jun 21 11:16:08 proxmox-arb pveproxy[2288635]: proxy detected vanished client connection
Jun 21 11:16:08 proxmox-arb pveproxy[2288635]: proxy detected vanished client connection
Jun 21 11:16:08 proxmox-arb pveproxy[2274070]: proxy detected vanished client connection
Jun 21 11:16:08 proxmox-arb pveproxy[2274070]: proxy detected vanished client connection
Jun 21 11:16:08 proxmox-arb pveproxy[2274070]: proxy detected vanished client connection
Jun 21 11:16:08 proxmox-arb pvesr[2291730]: trying to aquire cfs lock 'file-replication_cfg' ...
Jun 21 11:16:09 proxmox-arb snmpd[1539]: error on subcontainer 'ia_addr' insert (-1)
Jun 21 11:16:09 proxmox-arb snmpd[1539]: error on subcontainer 'ia_addr' insert (-1)
Jun 21 11:16:09 proxmox-arb pvesr[2291730]: error with cfs lock 'file-replication_cfg': no quorum!
Jun 21 11:16:09 proxmox-arb systemd[1]: pvesr.service: Main process exited, code=exited, status=13/n/a
Jun 21 11:16:09 proxmox-arb systemd[1]: Failed to start Proxmox VE replication runner.
Jun 21 11:16:09 proxmox-arb systemd[1]: pvesr.service: Unit entered failed state.
Jun 21 11:16:09 proxmox-arb systemd[1]: pvesr.service: Failed with result 'exit-code'.
Jun 21 11:16:10 proxmox-arb pveproxy[2274070]: proxy detected vanished client connection
Jun 21 11:16:10 proxmox-arb pveproxy[2274070]: proxy detected vanished client connection
I don't understand why all nodes but one rebooted without any loss of quorum and without any corosync logs.
Any idea?
The PVE version is:
pve-manager/5.2-1/0fcd7879 (running kernel: 4.15.17-2-pve)
Thank you.