Hi,
I've just encountered a problem tonight. I have a cluster of 4 nodes running Proxmox 4, up to date. An HA group has been created with 2 of the nodes, and these two nodes are located in different datacenters. Tonight the nodes restarted themselves, about 3 seconds apart; my whole HA strategy went down and the VMs were stopped. The syslog from each node just before the restart is below.
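For context, the HA group is defined along these lines in /etc/pve/ha/groups.cfg (the group name here is just a placeholder, not my exact config):

group: ha-dc-group
        nodes hostA1,hostB1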
Node A1:
Mar 20 23:17:24 hostA1 kernel: [874541.408572] mport: dropped over-mtu packet: 1572 > 1500
Mar 20 23:17:24 hostA1 kernel: [874541.408620] mport: dropped over-mtu packet: 1572 > 1500
Mar 20 23:17:24 hostA1 kernel: [874541.408632] mport: dropped over-mtu packet: 1572 > 1500
Mar 20 23:17:24 hostA1 kernel: [874541.410499] mport: dropped over-mtu packet: 1572 > 1500
Mar 20 23:17:24 hostA1 kernel: [874541.410547] mport: dropped over-mtu packet: 1572 > 1500
Mar 20 23:17:24 hostA1 pvestatd[1912]: storage 'BACKUP_POC1' is not online
Mar 20 23:17:24 hostA1 corosync[1908]: [TOTEM ] FAILED TO RECEIVE
Mar 20 23:17:24 hostA1 corosync[1908]: [TOTEM ] A new membership (10.65.1.99:732) was formed. Members left: 3 2 1
Mar 20 23:17:24 hostA1 corosync[1908]: [TOTEM ] Failed to receive the leave message. failed: 3 2 1
Mar 20 23:17:24 hostA1 pmxcfs[1814]: [dcdb] notice: members: 4/1814
Mar 20 23:17:24 hostA1 pmxcfs[1814]: [status] notice: members: 4/1814
Mar 20 23:17:24 hostA1 corosync[1908]: [QUORUM] This node is within the non-primary component and will NOT provide any services.
Mar 20 23:17:24 hostA1 corosync[1908]: [QUORUM] Members[1]: 4
Mar 20 23:17:24 hostA1 corosync[1908]: [MAIN ] Completed service synchronization, ready to provide service.
Mar 20 23:17:24 hostA1 pmxcfs[1814]: [status] notice: node lost quorum
Mar 20 23:17:24 hostA1 pmxcfs[1814]: [dcdb] crit: received write while not quorate - trigger resync
Mar 20 23:17:24 hostA1 pmxcfs[1814]: [dcdb] crit: leaving CPG group
Mar 20 23:17:24 hostA1 pmxcfs[1814]: [dcdb] notice: start cluster connection
Mar 20 23:17:24 hostA1 pve-ha-lrm[2058]: lost lock 'ha_agent_hostA1_lock - cfs lock update failed - Device or resource busy
Mar 20 23:17:24 hostA1 pve-ha-crm[1952]: status change slave => wait_for_quorum
Mar 20 23:17:26 hostA1 pve-ha-lrm[2058]: status change active => lost_agent_lock
Mar 20 23:17:34 hostA1 pvestatd[1912]: storage 'BACKUP_POC1' is not online
Mar 20 23:17:44 hostA1 pvestatd[1912]: storage 'BACKUP_POC1' is not online
Mar 20 23:17:44 hostA1 corosync[1908]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Mar 20 23:17:45 hostA1 corosync[1908]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Mar 20 23:17:47 hostA1 corosync[1908]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Mar 20 23:17:48 hostA1 corosync[1908]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Mar 20 23:17:50 hostA1 corosync[1908]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Mar 20 23:17:51 hostA1 corosync[1908]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Mar 20 23:17:53 hostA1 corosync[1908]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Mar 20 23:17:54 hostA1 pvestatd[1912]: storage 'BACKUP_POC1' is not online
Mar 20 23:17:54 hostA1 corosync[1908]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Mar 20 23:17:56 hostA1 corosync[1908]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Mar 20 23:17:57 hostA1 corosync[1908]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Mar 20 23:17:59 hostA1 corosync[1908]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Mar 20 23:18:00 hostA1 corosync[1908]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Mar 20 23:18:01 hostA1 CRON[6908]: (root) CMD (/usr/local/rtm/bin/rtm 42 > /dev/null 2> /dev/null)
Mar 20 23:18:02 hostA1 corosync[1908]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Mar 20 23:18:03 hostA1 corosync[1908]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Mar 20 23:18:04 hostA1 pvestatd[1912]: storage 'BACKUP_POC1' is not online
Mar 20 23:18:05 hostA1 corosync[1908]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Mar 20 23:18:06 hostA1 corosync[1908]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Mar 20 23:18:08 hostA1 corosync[1908]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Mar 20 23:18:09 hostA1 corosync[1908]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Mar 20 23:18:11 hostA1 watchdog-mux[1694]: client watchdog expired - disable watchdog updates
Mar 20 23:18:11 hostA1 corosync[1908]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Mar 20 23:18:12 hostA1 corosync[1908]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Mar 20 23:18:14 hostA1 corosync[1908]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Mar 20 23:18:14 hostA1 pvestatd[1912]: storage 'BACKUP_POC1' is not online
Mar 20 23:18:15 hostA1 corosync[1908]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Mar 20 23:18:17 hostA1 corosync[1908]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Mar 20 23:20:27 hostA1 rsyslogd: [origin software="rsyslogd" swVersion="8.4.2" x-pid="1807" x-info="http://www.rsyslog.com"] start
Node B1:
Mar 20 23:17:24 hostB1 corosync[1965]: [TOTEM ] Retransmit List: 6 7 8 9
Mar 20 23:17:24 hostB1 corosync[1965]: [TOTEM ] Retransmit List: 6 7 8 9
Mar 20 23:17:24 hostB1 corosync[1965]: [TOTEM ] Retransmit List: 6 7 8 9
Mar 20 23:17:24 hostB1 corosync[1965]: [TOTEM ] Retransmit List: 6 7 8 9
Mar 20 23:17:24 hostB1 corosync[1965]: [TOTEM ] Retransmit List: 6 7 8 9
Mar 20 23:17:24 hostB1 corosync[1965]: [TOTEM ] Retransmit List: 6 7 8 9
Mar 20 23:17:24 hostB1 corosync[1965]: [TOTEM ] Retransmit List: 6 7 8 9
Mar 20 23:17:24 hostB1 corosync[1965]: [TOTEM ] Retransmit List: 6 7 8 9
Mar 20 23:17:24 hostB1 corosync[1965]: [TOTEM ] Retransmit List: 6 7 8 9
Mar 20 23:17:24 hostB1 corosync[1965]: [TOTEM ] Retransmit List: 6 7 8 9
Mar 20 23:17:24 hostB1 corosync[1965]: [TOTEM ] Retransmit List: 6 7 8 9
Mar 20 23:17:24 hostB1 corosync[1965]: [TOTEM ] Retransmit List: 6 7 8 9
Mar 20 23:17:24 hostB1 corosync[1965]: [TOTEM ] Retransmit List: 6 7 8 9
Mar 20 23:17:24 hostB1 corosync[1965]: [TOTEM ] Retransmit List: 6 7 8 9
Mar 20 23:17:24 hostB1 corosync[1965]: [TOTEM ] A new membership (10.65.1.1:736) was formed. Members joined: 3 left: 3 2 4
Mar 20 23:17:24 hostB1 corosync[1965]: [TOTEM ] Failed to receive the leave message. failed: 3 2 4
Mar 20 23:17:24 hostB1 corosync[1965]: [TOTEM ] Retransmit List: 1
Mar 20 23:17:24 hostB1 pmxcfs[1885]: [dcdb] notice: members: 1/1885
Mar 20 23:17:24 hostB1 pmxcfs[1885]: [dcdb] notice: members: 1/1885, 3/1902
Mar 20 23:17:24 hostB1 pmxcfs[1885]: [dcdb] notice: starting data syncronisation
Mar 20 23:17:24 hostB1 corosync[1965]: [QUORUM] This node is within the non-primary component and will NOT provide any services.
Mar 20 23:17:24 hostB1 corosync[1965]: [QUORUM] Members[2]: 3 1
Mar 20 23:17:24 hostB1 corosync[1965]: [MAIN ] Completed service synchronization, ready to provide service.
Mar 20 23:17:24 hostB1 pmxcfs[1885]: [dcdb] notice: cpg_send_message retried 1 times
Mar 20 23:17:24 hostB1 pmxcfs[1885]: [status] notice: members: 1/1885
Mar 20 23:17:24 hostB1 pmxcfs[1885]: [status] notice: node lost quorum
Mar 20 23:17:24 hostB1 pmxcfs[1885]: [status] notice: members: 1/1885, 3/1902
Mar 20 23:17:24 hostB1 pmxcfs[1885]: [status] notice: starting data syncronisation
Mar 20 23:17:24 hostB1 pmxcfs[1885]: [dcdb] notice: received sync request (epoch 1/1885/00000007)
Mar 20 23:17:24 hostB1 pmxcfs[1885]: [status] notice: received sync request (epoch 1/1885/00000007)
Mar 20 23:17:24 hostB1 pmxcfs[1885]: [dcdb] notice: received all states
Mar 20 23:17:24 hostB1 pmxcfs[1885]: [dcdb] notice: leader is 1/1885
Mar 20 23:17:24 hostB1 pmxcfs[1885]: [dcdb] notice: synced members: 1/1885, 3/1902
Mar 20 23:17:24 hostB1 pmxcfs[1885]: [dcdb] notice: start sending inode updates
Mar 20 23:17:24 hostB1 pmxcfs[1885]: [dcdb] notice: sent all (0) updates
Mar 20 23:17:24 hostB1 pmxcfs[1885]: [dcdb] notice: all data is up to date
Mar 20 23:17:24 hostB1 pmxcfs[1885]: [dcdb] notice: dfsm_deliver_queue: queue length 5
Mar 20 23:17:24 hostB1 pmxcfs[1885]: [dcdb] crit: received write while not quorate - trigger resync
Mar 20 23:17:24 hostB1 pmxcfs[1885]: [dcdb] crit: leaving CPG group
Mar 20 23:17:24 hostB1 pmxcfs[1885]: [status] notice: received all states
Mar 20 23:17:24 hostB1 pmxcfs[1885]: [status] notice: all data is up to date
Mar 20 23:17:24 hostB1 pmxcfs[1885]: [status] notice: dfsm_deliver_queue: queue length 27
Mar 20 23:17:24 hostB1 pmxcfs[1885]: [dcdb] notice: start cluster connection
Mar 20 23:17:24 hostB1 pve-ha-lrm[2022]: lost lock 'ha_agent_hostB1_lock - cfs lock update failed - Device or resource busy
Mar 20 23:17:24 hostB1 pve-ha-crm[2010]: status change slave => wait_for_quorum
Mar 20 23:17:24 hostB1 pmxcfs[1885]: [dcdb] notice: members: 1/1885
Mar 20 23:17:24 hostB1 pmxcfs[1885]: [dcdb] notice: all data is up to date
Mar 20 23:17:25 hostB1 pmxcfs[1885]: [dcdb] notice: members: 1/1885, 3/1902
Mar 20 23:17:25 hostB1 pmxcfs[1885]: [dcdb] notice: starting data syncronisation
Mar 20 23:17:25 hostB1 pmxcfs[1885]: [dcdb] notice: received sync request (epoch 1/1885/0000000A)
Mar 20 23:17:25 hostB1 pmxcfs[1885]: [dcdb] notice: received all states
Mar 20 23:17:25 hostB1 pmxcfs[1885]: [dcdb] notice: leader is 1/1885
Mar 20 23:17:25 hostB1 pmxcfs[1885]: [dcdb] notice: synced members: 1/1885, 3/1902
Mar 20 23:17:25 hostB1 pmxcfs[1885]: [dcdb] notice: start sending inode updates
Mar 20 23:17:25 hostB1 pmxcfs[1885]: [dcdb] notice: sent all (0) updates
Mar 20 23:17:25 hostB1 pmxcfs[1885]: [dcdb] notice: all data is up to date
Mar 20 23:17:27 hostB1 pve-ha-lrm[2022]: status change active => lost_agent_lock
Mar 20 23:18:01 hostB1 CRON[27196]: (root) CMD (/usr/local/rtm/bin/rtm 58 > /dev/null 2> /dev/null)
Mar 20 23:18:12 hostB1 watchdog-mux[1752]: client watchdog expired - disable watchdog updates
Mar 20 23:20:24 hostB1 rsyslogd: [origin software="rsyslogd" swVersion="8.4.2" x-pid="1778" x-info="http://www.rsyslog.com"] start
Please help.
Best regards