Have a question about “corosync[3173]: [CPG ] downlist left_list: 0 received”

Discussion in 'Proxmox VE (Deutsch)' started by Jason123, Jun 13, 2019.

  1. Jason123

    Jason123 New Member

    Joined:
    Sep 11, 2018
    Messages:
    14
    Likes Received:
    0
    Hi guys,

    Recently I built a cluster of 6 nodes with Ceph. Unfortunately, the cluster ran into a problem and restarted on Jun 12, which made all my VMs restart. What happened? How can I keep my cluster running stably?

    Here is the log from Jun 12:
    Code:
    Jun 12 09:09:04 NODE02 corosync[3326]:  [TOTEM ] A new membership (10.71.177.201:1416) was formed. Members joined: 1 3 4 5 6
    Jun 12 09:09:04 NODE02 corosync[3326]: warning [CPG   ] downlist left_list: 0 received
    Jun 12 09:09:04 NODE02 corosync[3326]: warning [CPG   ] downlist left_list: 0 received
    Jun 12 09:09:04 NODE02 corosync[3326]:  [CPG   ] downlist left_list: 0 received
    Jun 12 09:09:04 NODE02 corosync[3326]: warning [CPG   ] downlist left_list: 0 received
    Jun 12 09:09:04 NODE02 corosync[3326]:  [CPG   ] downlist left_list: 0 received
    Jun 12 09:09:04 NODE02 corosync[3326]: warning [CPG   ] downlist left_list: 0 received
    Jun 12 09:09:04 NODE02 corosync[3326]: warning [CPG   ] downlist left_list: 0 received
    Jun 12 09:09:04 NODE02 corosync[3326]:  [CPG   ] downlist left_list: 0 received
    Jun 12 09:09:04 NODE02 corosync[3326]: warning [CPG   ] downlist left_list: 0 received
    Jun 12 09:09:04 NODE02 corosync[3326]:  [CPG   ] downlist left_list: 0 received
    Jun 12 09:09:04 NODE02 corosync[3326]:  [CPG   ] downlist left_list: 0 received
    Jun 12 09:09:04 NODE02 corosync[3326]:  [CPG   ] downlist left_list: 0 received
    Jun 12 09:09:04 NODE02 pmxcfs[3066]: [dcdb] notice: members: 2/3066, 5/3308
    Jun 12 09:09:04 NODE02 corosync[3326]: notice  [QUORUM] This node is within the primary component and will provide service.
    Jun 12 09:09:04 NODE02 corosync[3326]: notice  [QUORUM] Members[6]: 1 2 3 4 5 6
    Jun 12 09:09:04 NODE02 corosync[3326]: notice  [MAIN  ] Completed service synchronization, ready to provide service.
    Jun 12 09:09:04 NODE02 pmxcfs[3066]: [dcdb] notice: starting data syncronisation
    Jun 12 09:09:04 NODE02 corosync[3326]:  [QUORUM] This node is within the primary component and will provide service.
    Jun 12 09:09:04 NODE02 corosync[3326]:  [QUORUM] Members[6]: 1 2 3 4 5 6
    Jun 12 09:09:04 NODE02 corosync[3326]:  [MAIN  ] Completed service synchronization, ready to provide service.
    
     
  2. Alwin

    Alwin Proxmox Staff Member
    Staff Member

    Joined:
    Aug 1, 2017
    Messages:
    2,309
    Likes Received:
    206
    I suppose the previous log lines show that the cluster membership fell apart.

    Can you please post your network configuration (/etc/network/interfaces)?
    Do you have HA activated?
     
  3. Jason123

    Jason123 New Member

    Joined:
    Sep 11, 2018
    Messages:
    14
    Likes Received:
    0
    How can I check whether HA is activated or not? I mean, what command can I use?

    Here is my network config. I use 5 NICs (eth0 and eth2 as my Ceph path, eth1 and eth3 in a "balance-alb" bond for the VLAN-aware vmbr1, and eth4 for vmbr0):
    Code:
    auto bond0
    iface bond0 inet static
        address  110.110.110.6
        netmask  255.255.255.0
        bond-slaves eth0 eth2
        bond-miimon 100
        bond-mode broadcast
    
    auto bond1
    iface bond1 inet manual
        bond-slaves eth1 eth3
        bond-miimon 100
        bond-mode balance-alb
    
    auto vmbr0
    iface vmbr0 inet static
        address  192.168.1.82
        netmask  255.255.255.0
        gateway  192.168.1.254
        bridge-ports eth4
        bridge-stp off
        bridge-fd 0
    
    auto vmbr1
    iface vmbr1 inet manual
        bridge-ports bond1
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094
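
    (A quick way to verify the bond members and their link state is the standard Linux bonding driver interface, nothing Proxmox-specific; bond0/bond1 are the names from the config above.)
    Code:
    cat /proc/net/bonding/bond0
    cat /proc/net/bonding/bond1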
     
  4. Alwin

    Alwin Proxmox Staff Member
    Staff Member

    Joined:
    Aug 1, 2017
    Messages:
    2,309
    Likes Received:
    206
    On which interface is corosync running (pvecm status)?

    You can check with the following command.
    Code:
    ha-manager status
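
    To see which address corosync is actually bound to, the ring status can also be printed (standard corosync 2.x tooling, not Proxmox-specific):
    Code:
    corosync-cfgtool -s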
     
  5. Jason123

    Jason123 New Member

    Joined:
    Sep 11, 2018
    Messages:
    14
    Likes Received:
    0
    Here is the command output. I think the nodes are healthy now?
    Code:
    root@NODE01:~# ha-manager status
    quorum OK
    master NODE03 (active, Fri Jun 14 16:56:39 2019)
    lrm NODE01 (active, Fri Jun 14 16:56:39 2019)
    lrm NODE02 (active, Fri Jun 14 16:56:39 2019)
    lrm NODE03 (active, Fri Jun 14 16:56:39 2019)
    lrm NODE04 (active, Fri Jun 14 16:56:48 2019)
    lrm NODE05 (active, Fri Jun 14 16:56:45 2019)
    lrm NODE06 (active, Fri Jun 14 16:56:39 2019)
    

    Code:
    root@NODE01:~# pvecm status
    Quorum information
    ------------------
    Date:             Fri Jun 14 16:58:50 2019
    Quorum provider:  corosync_votequorum
    Nodes:            6
    Node ID:          0x00000001
    Ring ID:          1/1616
    Quorate:          Yes
    
    Votequorum information
    ----------------------
    Expected votes:   6
    Highest expected: 6
    Total votes:      6
    Quorum:           4
    Flags:            Quorate
    
    Membership information
    ----------------------
        Nodeid      Votes Name
    0x00000001          1 192.168.1.82 (local)
    0x00000002          1 192.168.1.83
    0x00000003          1 192.168.1.84
    0x00000004          1 192.168.1.85
    0x00000005          1 192.168.1.86
    0x00000006          1 192.168.1.87
    
     
  6. Alwin

    Alwin Proxmox Staff Member
    Staff Member

    Joined:
    Aug 1, 2017
    Messages:
    2,309
    Likes Received:
    206
    I suppose your stack runs on one switch? If so, how is the load on that switch?

    And what else can be found in the log files?
     
  7. Jason123

    Jason123 New Member

    Joined:
    Sep 11, 2018
    Messages:
    14
    Likes Received:
    0
    OK, I will check the logs to see whether there are other problems. If there are, I will report back here.

    Thank you.
    Have a good weekend.
     
  8. Jason123

    Jason123 New Member

    Joined:
    Sep 11, 2018
    Messages:
    14
    Likes Received:
    0
    Hi Alwin,

    I found some new messages in my cluster. Maybe these messages can help you find the reason why my node auto-restarted.

    Here is the syslog from NODE05:

    Code:
    Jun 14 11:51:53 NODE05 corosync[3614]: notice  [TOTEM ] Retransmit List: 2f3bb7 2f3bb8 2f3bb9 2f3bbb 2f3bbc 2f3bbd 2f3bbe 2f3bc0 2f3bc1
    Jun 14 11:51:53 NODE05 corosync[3614]:  [TOTEM ] Retransmit List: 2f3bb7 2f3bb8 2f3bb9 2f3bbb 2f3bbc 2f3bbd 2f3bbe 2f3bc0 2f3bc1
    Jun 14 11:51:53 NODE05 corosync[3614]: notice  [TOTEM ] Retransmit List: 2f3bb7 2f3bb8 2f3bb9 2f3bbb 2f3bbc 2f3bbd 2f3bbe 2f3bc0 2f3bc1
    Jun 14 11:51:53 NODE05 corosync[3614]:  [TOTEM ] Retransmit List: 2f3bb7 2f3bb8 2f3bb9 2f3bbb 2f3bbc 2f3bbd 2f3bbe 2f3bc0 2f3bc1
    Jun 14 11:51:53 NODE05 corosync[3614]: notice  [TOTEM ] Retransmit List: 2f3bb7 2f3bb8 2f3bb9 2f3bbb 2f3bbc 2f3bbd 2f3bbe 2f3bc0 2f3bc1
    Jun 14 11:51:53 NODE05 corosync[3614]:  [TOTEM ] Retransmit List: 2f3bb7 2f3bb8 2f3bb9 2f3bbb 2f3bbc 2f3bbd 2f3bbe 2f3bc0 2f3bc1
    Jun 14 11:51:53 NODE05 corosync[3614]: notice  [TOTEM ] Retransmit List: 2f3bb7 2f3bb8 2f3bb9 2f3bbb 2f3bbc 2f3bbd 2f3bbe 2f3bc0 2f3bc1
    Jun 14 11:51:53 NODE05 corosync[3614]:  [TOTEM ] Retransmit List: 2f3bb7 2f3bb8 2f3bb9 2f3bbb 2f3bbc 2f3bbd 2f3bbe 2f3bc0 2f3bc1
    Jun 14 11:51:53 NODE05 corosync[3614]: notice  [TOTEM ] Retransmit List: 2f3bb7 2f3bb8 2f3bb9 2f3bbb 2f3bbc 2f3bbd 2f3bbe 2f3bc0 2f3bc1
    Jun 14 11:51:53 NODE05 corosync[3614]:  [TOTEM ] Retransmit List: 2f3bb7 2f3bb8 2f3bb9 2f3bbb 2f3bbc 2f3bbd 2f3bbe 2f3bc0 2f3bc1
    Jun 14 11:51:53 NODE05 corosync[3614]: notice  [TOTEM ] Retransmit List: 2f3bb7 2f3bb8 2f3bb9 2f3bbb 2f3bbc 2f3bbd 2f3bbe 2f3bc0 2f3bc1
    Jun 14 11:51:53 NODE05 corosync[3614]:  [TOTEM ] Retransmit List: 2f3bb7 2f3bb8 2f3bb9 2f3bbb 2f3bbc 2f3bbd 2f3bbe 2f3bc0 2f3bc1
    Jun 14 11:51:53 NODE05 corosync[3614]: notice  [TOTEM ] Retransmit List: 2f3bb7 2f3bb8 2f3bb9 2f3bbb 2f3bbc 2f3bbd 2f3bbe 2f3bc0 2f3bc1
    Jun 14 11:51:53 NODE05 corosync[3614]:  [TOTEM ] Retransmit List: 2f3bb7 2f3bb8 2f3bb9 2f3bbb 2f3bbc 2f3bbd 2f3bbe 2f3bc0 2f3bc1
    Jun 14 11:51:53 NODE05 corosync[3614]: notice  [TOTEM ] Retransmit List: 2f3bb7 2f3bb8 2f3bb9 2f3bbb 2f3bbc 2f3bbd 2f3bbe 2f3bc0 2f3bc1
    Jun 14 11:51:53 NODE05 corosync[3614]:  [TOTEM ] Retransmit List: 2f3bb7 2f3bb8 2f3bb9 2f3bbb 2f3bbc 2f3bbd 2f3bbe 2f3bc0 2f3bc1
    Jun 14 11:51:53 NODE05 corosync[3614]: notice  [TOTEM ] Retransmit List: 2f3bb7 2f3bb8 2f3bb9 2f3bbb 2f3bbc 2f3bbd 2f3bbe 2f3bc0 2f3bc1
    Jun 14 11:51:53 NODE05 corosync[3614]:  [TOTEM ] Retransmit List: 2f3bb7 2f3bb8 2f3bb9 2f3bbb 2f3bbc 2f3bbd 2f3bbe 2f3bc0 2f3bc1
    Jun 14 11:51:53 NODE05 corosync[3614]: notice  [TOTEM ] Retransmit List: 2f3bb7 2f3bb8 2f3bb9 2f3bbb 2f3bbc 2f3bbd 2f3bbe 2f3bc0 2f3bc1
    Jun 14 11:51:53 NODE05 corosync[3614]:  [TOTEM ] Retransmit List: 2f3bb7 2f3bb8 2f3bb9 2f3bbb 2f3bbc 2f3bbd 2f3bbe 2f3bc0 2f3bc1
    Jun 14 11:51:53 NODE05 corosync[3614]: notice  [TOTEM ] Retransmit List: 2f3bb7 2f3bb8 2f3bb9 2f3bbb 2f3bbc 2f3bbd 2f3bbe 2f3bc0 2f3bc1
    Jun 14 11:51:53 NODE05 corosync[3614]:  [TOTEM ] Retransmit List: 2f3bb7 2f3bb8 2f3bb9 2f3bbb 2f3bbc 2f3bbd 2f3bbe 2f3bc0 2f3bc1
    Jun 14 11:51:53 NODE05 corosync[3614]: notice  [TOTEM ] Retransmit List: 2f3bb7 2f3bb8 2f3bb9 2f3bbb 2f3bbc 2f3bbd 2f3bbe 2f3bc0 2f3bc1
    Jun 14 11:51:53 NODE05 corosync[3614]:  [TOTEM ] Retransmit List: 2f3bb7 2f3bb8 2f3bb9 2f3bbb 2f3bbc 2f3bbd 2f3bbe 2f3bc0 2f3bc1
    Jun 14 11:51:53 NODE05 corosync[3614]: notice  [TOTEM ] Retransmit List: 2f3bb7 2f3bb8 2f3bb9 2f3bbb 2f3bbc 2f3bbd 2f3bbe 2f3bc0 2f3bc1
    Jun 14 11:51:53 NODE05 corosync[3614]:  [TOTEM ] Retransmit List: 2f3bb7 2f3bb8 2f3bb9 2f3bbb 2f3bbc 2f3bbd 2f3bbe 2f3bc0 2f3bc1
    Jun 14 11:51:53 NODE05 corosync[3614]: notice  [TOTEM ] Retransmit List: 2f3bb7 2f3bb8 2f3bb9 2f3bbb 2f3bbc 2f3bbd 2f3bbe 2f3bc0 2f3bc1
    Jun 14 11:51:53 NODE05 corosync[3614]:  [TOTEM ] Retransmit List: 2f3bb7 2f3bb8 2f3bb9 2f3bbb 2f3bbc 2f3bbd 2f3bbe 2f3bc0 2f3bc1
    Jun 14 11:51:53 NODE05 corosync[3614]: notice  [TOTEM ] Retransmit List: 2f3bb7 2f3bb8 2f3bb9 2f3bbb 2f3bbc 2f3bbd 2f3bbe 2f3bc0 2f3bc1
    Jun 14 11:51:53 NODE05 corosync[3614]:  [TOTEM ] Retransmit List: 2f3bb7 2f3bb8 2f3bb9 2f3bbb 2f3bbc 2f3bbd 2f3bbe 2f3bc0 2f3bc1
    Jun 14 11:51:53 NODE05 corosync[3614]: notice  [TOTEM ] Retransmit List: 2f3bb7 2f3bb8 2f3bb9 2f3bbb 2f3bbc 2f3bbd 2f3bbe 2f3bc0 2f3bc1
    Jun 14 11:51:53 NODE05 corosync[3614]:  [TOTEM ] Retransmit List: 2f3bb7 2f3bb8 2f3bb9 2f3bbb 2f3bbc 2f3bbd 2f3bbe 2f3bc0 2f3bc1
    Jun 14 11:51:53 NODE05 corosync[3614]: notice  [TOTEM ] Retransmit List: 2f3bb7 2f3bb8 2f3bb9 2f3bbb 2f3bbc 2f3bbd 2f3bbe 2f3bc0 2f3bc1
    Jun 14 11:51:53 NODE05 corosync[3614]:  [TOTEM ] Retransmit List: 2f3bb7 2f3bb8 2f3bb9 2f3bbb 2f3bbc 2f3bbd 2f3bbe 2f3bc0 2f3bc1
    Jun 14 11:51:53 NODE05 corosync[3614]: notice  [TOTEM ] Retransmit List: 2f3bb7 2f3bb8 2f3bb9 2f3bbb 2f3bbc 2f3bbd 2f3bbe 2f3bc0 2f3bc1
    Jun 14 11:51:53 NODE05 corosync[3614]:  [TOTEM ] Retransmit List: 2f3bb7 2f3bb8 2f3bb9 2f3bbb 2f3bbc 2f3bbd 2f3bbe 2f3bc0 2f3bc1
    Jun 14 11:51:53 NODE05 corosync[3614]: notice  [TOTEM ] Retransmit List: 2f3bb7 2f3bb8 2f3bb9 2f3bbb 2f3bbc 2f3bbd 2f3bbe 2f3bc0 2f3bc1
    Jun 14 11:51:53 NODE05 corosync[3614]:  [TOTEM ] Retransmit List: 2f3bb7 2f3bb8 2f3bb9 2f3bbb 2f3bbc 2f3bbd 2f3bbe 2f3bc0 2f3bc1
    Jun 14 11:51:53 NODE05 corosync[3614]: notice  [TOTEM ] Retransmit List: 2f3bb7 2f3bb8 2f3bb9 2f3bbb 2f3bbc 2f3bbd 2f3bbe 2f3bc0 2f3bc1
    Jun 14 11:51:53 NODE05 corosync[3614]:  [TOTEM ] Retransmit List: 2f3bb7 2f3bb8 2f3bb9 2f3bbb 2f3bbc 2f3bbd 2f3bbe 2f3bc0 2f3bc1
    Jun 14 11:51:53 NODE05 corosync[3614]: notice  [TOTEM ] Retransmit List: 2f3bb7 2f3bb8 2f3bb9 2f3bbb 2f3bbc 2f3bbd 2f3bbe 2f3bc0 2f3bc1
    Jun 14 11:51:53 NODE05 corosync[3614]:  [TOTEM ] Retransmit List: 2f3bb7 2f3bb8 2f3bb9 2f3bbb 2f3bbc 2f3bbd 2f3bbe 2f3bc0 2f3bc1
    Jun 14 11:51:57 NODE05 corosync[3614]: notice  [TOTEM ] A processor failed, forming new configuration.
    Jun 14 11:51:57 NODE05 corosync[3614]:  [TOTEM ] A processor failed, forming new configuration.
    Jun 14 11:51:57 NODE05 corosync[3614]: notice  [TOTEM ] A new membership (192.168.1.82:1500) was formed. Members left: 4
    Jun 14 11:51:57 NODE05 corosync[3614]: notice  [TOTEM ] Failed to receive the leave message. failed: 4
    Jun 14 11:51:57 NODE05 corosync[3614]:  [TOTEM ] A new membership (192.168.1.82:1500) was formed. Members left: 4
    Jun 14 11:51:57 NODE05 corosync[3614]:  [TOTEM ] Failed to receive the leave message. failed: 4
    Jun 14 11:51:57 NODE05 corosync[3614]: warning [CPG   ] downlist left_list: 1 received
    Jun 14 11:51:57 NODE05 corosync[3614]:  [CPG   ] downlist left_list: 1 received
    Jun 14 11:51:57 NODE05 corosync[3614]:  [CPG   ] downlist left_list: 1 received
    Jun 14 11:51:57 NODE05 corosync[3614]: warning [CPG   ] downlist left_list: 1 received
    Jun 14 11:51:57 NODE05 corosync[3614]: warning [CPG   ] downlist left_list: 1 received
    Jun 14 11:51:57 NODE05 corosync[3614]: warning [CPG   ] downlist left_list: 1 received
    Jun 14 11:51:57 NODE05 corosync[3614]: warning [CPG   ] downlist left_list: 1 received
    Jun 14 11:51:57 NODE05 corosync[3614]:  [CPG   ] downlist left_list: 1 received
    Jun 14 11:51:57 NODE05 corosync[3614]:  [CPG   ] downlist left_list: 1 received
    Jun 14 11:51:57 NODE05 corosync[3614]:  [CPG   ] downlist left_list: 1 received
    Jun 14 11:51:57 NODE05 pmxcfs[3362]: [dcdb] notice: members: 1/3299, 2/2928, 3/3415, 5/3362, 6/3093
    Jun 14 11:51:57 NODE05 pmxcfs[3362]: [dcdb] notice: starting data syncronisation
    Jun 14 11:51:57 NODE05 corosync[3614]: notice  [QUORUM] Members[5]: 1 2 3 5 6
    Jun 14 11:51:57 NODE05 corosync[3614]: notice  [MAIN  ] Completed service synchronization, ready to provide service.
    Jun 14 11:51:57 NODE05 pmxcfs[3362]: [status] notice: members: 1/3299, 2/2928, 3/3415, 5/3362, 6/3093
    Jun 14 11:51:57 NODE05 pmxcfs[3362]: [status] notice: starting data syncronisation
    Jun 14 11:51:57 NODE05 corosync[3614]:  [QUORUM] Members[5]: 1 2 3 5 6
    Jun 14 11:51:57 NODE05 corosync[3614]:  [MAIN  ] Completed service synchronization, ready to provide service.
    Jun 14 11:51:57 NODE05 corosync[3614]: notice  [TOTEM ] A new membership (192.168.1.82:1504) was formed. Members joined: 4
    Jun 14 11:51:57 NODE05 corosync[3614]:  [TOTEM ] A new membership (192.168.1.82:1504) was formed. Members joined: 4
    Jun 14 11:51:57 NODE05 corosync[3614]: warning [CPG   ] downlist left_list: 0 received
    Jun 14 11:51:57 NODE05 corosync[3614]:  [CPG   ] downlist left_list: 0 received
    Jun 14 11:51:57 NODE05 corosync[3614]:  [CPG   ] downlist left_list: 0 received
    Jun 14 11:51:57 NODE05 corosync[3614]: warning [CPG   ] downlist left_list: 0 received
    Jun 14 11:51:57 NODE05 corosync[3614]: warning [CPG   ] downlist left_list: 0 received
    Jun 14 11:51:57 NODE05 corosync[3614]: warning [CPG   ] downlist left_list: 0 received
    Jun 14 11:51:57 NODE05 corosync[3614]:  [CPG   ] downlist left_list: 0 received
    Jun 14 11:51:57 NODE05 corosync[3614]:  [CPG   ] downlist left_list: 0 received
    Jun 14 11:51:57 NODE05 corosync[3614]: warning [CPG   ] downlist left_list: 0 received
    Jun 14 11:51:57 NODE05 corosync[3614]: warning [CPG   ] downlist left_list: 0 received
    Jun 14 11:51:57 NODE05 corosync[3614]:  [CPG   ] downlist left_list: 0 received
    Jun 14 11:51:57 NODE05 corosync[3614]:  [CPG   ] downlist left_list: 0 received
    Jun 14 11:51:57 NODE05 pmxcfs[3362]: [dcdb] notice: members: 1/3299, 2/2928, 3/3415, 4/3094, 5/3362, 6/3093
    Jun 14 11:51:57 NODE05 pmxcfs[3362]: [dcdb] notice: queue not emtpy - resening 6 messages
    Jun 14 11:51:57 NODE05 pmxcfs[3362]: [status] notice: members: 1/3299, 2/2928, 3/3415, 4/3094, 5/3362, 6/3093
    Jun 14 11:51:57 NODE05 pmxcfs[3362]: [status] notice: queue not emtpy - resening 75 messages
    Jun 14 11:51:57 NODE05 corosync[3614]: notice  [QUORUM] Members[6]: 1 2 3 4 5 6
    Jun 14 11:51:57 NODE05 corosync[3614]: notice  [MAIN  ] Completed service synchronization, ready to provide service.
    Jun 14 11:51:57 NODE05 corosync[3614]:  [QUORUM] Members[6]: 1 2 3 4 5 6
    Jun 14 11:51:57 NODE05 corosync[3614]:  [MAIN  ] Completed service synchronization, ready to provide service.
    Jun 14 11:51:57 NODE05 pmxcfs[3362]: [dcdb] notice: cpg_send_message retried 2 times
    Jun 14 11:51:57 NODE05 pmxcfs[3362]: [status] notice: cpg_send_message retried 1 times
    Jun 14 11:51:57 NODE05 pmxcfs[3362]: [status] notice: received sync request (epoch 1/3299/00000019)
    Jun 14 11:51:57 NODE05 pmxcfs[3362]: [dcdb] notice: received sync request (epoch 1/3299/00000021)
    Jun 14 11:51:57 NODE05 pmxcfs[3362]: [dcdb] notice: received sync request (epoch 1/3299/00000022)
    Jun 14 11:51:57 NODE05 pmxcfs[3362]: [dcdb] crit: ignore sync request from wrong member 4/3094
    Jun 14 11:51:57 NODE05 pmxcfs[3362]: [dcdb] notice: received sync request (epoch 4/3094/00000021)
    Jun 14 11:51:57 NODE05 pmxcfs[3362]: [status] crit: ignore sync request from wrong member 4/3094
    Jun 14 11:51:57 NODE05 pmxcfs[3362]: [status] notice: received sync request (epoch 4/3094/0000001B)
    Jun 14 11:51:57 NODE05 pmxcfs[3362]: [status] notice: received sync request (epoch 1/3299/0000001A)
    Jun 14 11:51:57 NODE05 pmxcfs[3362]: [dcdb] notice: received all states
    Jun 14 11:51:57 NODE05 pmxcfs[3362]: [dcdb] notice: leader is 1/3299
    Jun 14 11:51:57 NODE05 pmxcfs[3362]: [dcdb] notice: synced members: 1/3299, 2/2928, 3/3415, 4/3094, 5/3362, 6/3093
    Jun 14 11:51:57 NODE05 pmxcfs[3362]: [dcdb] notice: all data is up to date
    Jun 14 11:51:57 NODE05 pmxcfs[3362]: [dcdb] notice: dfsm_deliver_queue: queue length 13
    Jun 14 11:51:57 NODE05 pmxcfs[3362]: [status] notice: received all states
    Jun 14 11:51:57 NODE05 pmxcfs[3362]: [status] notice: all data is up to date
    Jun 14 11:51:57 NODE05 pmxcfs[3362]: [status] notice: dfsm_deliver_queue: queue length 75
    Jun 14 was the most recent "auto restart", which happened at 12:34.

    Code:
    Jun 14 12:33:05 NODE05 kernel: [184531.181097] igb 0000:18:00.3 eth5: igb: eth5 NIC Link is Down
    Jun 14 12:33:05 NODE05 kernel: [184531.181207] vmbr0: port 1(eth5) entered disabled state
    Jun 14 12:33:07 NODE05 pvestatd[3962]: storage 'backupdata' is not online
    Jun 14 12:33:08 NODE05 corosync[3614]: notice  [TOTEM ] A processor failed, forming new configuration.
    Jun 14 12:33:08 NODE05 corosync[3614]:  [TOTEM ] A processor failed, forming new configuration.
    Jun 14 12:33:08 NODE05 kernel: [184534.081451] igb 0000:18:00.3 eth5: igb: eth5 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
    Jun 14 12:33:08 NODE05 kernel: [184534.081597] vmbr0: port 1(eth5) entered blocking state
    Jun 14 12:33:08 NODE05 kernel: [184534.081601] vmbr0: port 1(eth5) entered forwarding state
    Jun 14 12:33:12 NODE05 corosync[3614]: notice  [TOTEM ] A new membership (192.168.1.86:1508) was formed. Members left: 1 2 3 4 6
    Jun 14 12:33:12 NODE05 corosync[3614]: notice  [TOTEM ] Failed to receive the leave message. failed: 1 2 3 4 6
    Jun 14 12:33:12 NODE05 corosync[3614]: warning [CPG   ] downlist left_list: 5 received
    Jun 14 12:33:12 NODE05 corosync[3614]: notice  [QUORUM] This node is within the non-primary component and will NOT provide any services.
    Jun 14 12:33:12 NODE05 corosync[3614]: notice  [QUORUM] Members[1]: 5
    Jun 14 12:33:12 NODE05 corosync[3614]: notice  [MAIN  ] Completed service synchronization, ready to provide service.
    Jun 14 12:33:12 NODE05 corosync[3614]:  [TOTEM ] A new membership (192.168.1.86:1508) was formed. Members left: 1 2 3 4 6
    Jun 14 12:33:12 NODE05 corosync[3614]:  [TOTEM ] Failed to receive the leave message. failed: 1 2 3 4 6
    Jun 14 12:33:12 NODE05 corosync[3614]:  [CPG   ] downlist left_list: 5 received
    Jun 14 12:33:12 NODE05 pmxcfs[3362]: [dcdb] notice: members: 5/3362
    Jun 14 12:33:12 NODE05 corosync[3614]:  [QUORUM] This node is within the non-primary component and will NOT provide any services.
    Jun 14 12:33:12 NODE05 corosync[3614]:  [QUORUM] Members[1]: 5
    Jun 14 12:33:12 NODE05 pmxcfs[3362]: [status] notice: node lost quorum
    Jun 14 12:33:12 NODE05 corosync[3614]:  [MAIN  ] Completed service synchronization, ready to provide service.
    Jun 14 12:33:12 NODE05 pmxcfs[3362]: [status] notice: members: 5/3362
    Jun 14 12:33:12 NODE05 pmxcfs[3362]: [dcdb] crit: received write while not quorate - trigger resync
    Jun 14 12:33:12 NODE05 pmxcfs[3362]: [dcdb] crit: leaving CPG group
    Jun 14 12:33:12 NODE05 pmxcfs[3362]: [dcdb] notice: start cluster connection
    Jun 14 12:33:12 NODE05 pmxcfs[3362]: [dcdb] notice: members: 5/3362
    Jun 14 12:33:12 NODE05 pmxcfs[3362]: [dcdb] notice: all data is up to date
    Jun 14 12:33:12 NODE05 pve-ha-lrm[4189]: lost lock 'ha_agent_NODE05_lock - cfs lock update failed - Device or resource busy
    Jun 14 12:33:12 NODE05 pve-ha-crm[4121]: status change slave => wait_for_quorum
    Jun 14 12:33:14 NODE05 pve-ha-lrm[4189]: status change active => lost_agent_lock
    Jun 14 12:33:17 NODE05 pvestatd[3962]: storage 'backupdata' is not online
    Jun 14 12:33:27 NODE05 pvestatd[3962]: storage 'backupdata' is not online
    Jun 14 12:37:23 NODE05 systemd-modules-load[787]: Inserted module 'iscsi_tcp'
    Jun 14 12:37:23 NODE05 systemd-modules-load[787]: Inserted module 'ib_iser'
    Jun 14 12:37:23 NODE05 systemd-udevd[820]: Network interface NamePolicy= disabled on kernel command line, ignoring.
    Jun 14 12:37:23 NODE05 keyboard-setup.sh[780]: cannot open file /tmp/tmpkbd.TNVtfE
    Jun 14 12:37:23 NODE05 systemd-modules-load[787]: Inserted module 'vhost_net'
    Jun 14 12:37:23 NODE05 systemd[1]: Starting Flush Journal to Persistent Storage...
    Jun 14 12:37:23 NODE05 systemd[1]: Started Flush Journal to Persistent Storage.
    Jun 14 12:37:23 NODE05 systemd[1]: Started udev Coldplug all Devices.
    Jun 14 12:37:23 NODE05 systemd[1]: Starting udev Wait for Complete Device Initialization...
    Jun 14 12:37:23 NODE05 systemd[1]: Listening on Load/Save RF Kill Switch Status /dev/rfkill Watch.
    Jun 14 12:37:23 NODE05 systemd[1]: Found device /dev/mapper/pve-swap.
    Jun 14 12:37:23 NODE05 systemd[1]: Activating swap /dev/mapper/pve-swap...
    Jun 14 12:37:23 NODE05 systemd[1]: Found device /dev/mapper/pve-data.
    Jun 14 12:37:23 NODE05 systemd[1]: Activated swap /dev/mapper/pve-swap.
    Jun 14 12:37:23 NODE05 systemd[1]: Reached target Swap.
    Jun 14 12:37:23 NODE05 systemd[1]: Started Set the console keyboard layout.
    Jun 14 12:37:23 NODE05 systemd[1]: Found device 1 1.
    Jun 14 12:37:23 NODE05 systemd[1]: Created slice system-ceph\x2ddisk.slice.
    Jun 14 12:37:23 NODE05 systemd[1]: Started udev Wait for Complete Device Initialization.
    Jun 14 12:37:23 NODE05 systemd[1]: Starting Activation of LVM2 logical volumes...
    Jun 14 12:37:23 NODE05 lvm[1660]:   3 logical volume(s) in volume group "pve" now active
    Jun 14 12:37:23 NODE05 systemd[1]: Started Activation of LVM2 logical volumes.
    Jun 14 12:37:23 NODE05 systemd[1]: Reached target Encrypted Volumes.
    Jun 14 12:37:23 NODE05 systemd[1]: Reached target ZFS pool import target.
    Jun 14 12:37:23 NODE05 systemd[1]: Starting Mount ZFS filesystems...
    Jun 14 12:37:23 NODE05 systemd[1]: Starting Activation of LVM2 logical volumes...
    Jun 14 12:37:23 NODE05 lvm[1672]:   3 logical volume(s) in volume group "pve" now active
    Jun 14 12:37:23 NODE05 systemd[1]: Started Activation of LVM2 logical volumes.
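
    (Side note: everything from 12:37:23 onward is the node booting up again. If the journal is persistent, the cluster-related messages from the boot before the restart can be pulled with something like the command below; the unit names are the standard Proxmox/corosync services.)
    Code:
    journalctl -b -1 -u corosync -u pve-cluster -u pve-ha-lrm -u pve-ha-crm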
     
  9. Alwin

    Alwin Proxmox Staff Member
    Staff Member

    Joined:
    Aug 1, 2017
    Messages:
    2,309
    Likes Received:
    206
    It seems your cluster network is not stable. Please run the omping command (see the link below) for at least 5 min to see if multicast is working correctly.
    https://pve.proxmox.com/pve-docs/chapter-pvecm.html#_cluster_network
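
    The linked chapter suggests commands along these lines (replace the node names with your own; the first is a rapid flood test, the second runs for roughly 10 minutes):
    Code:
    omping -c 10000 -i 0.001 -F -q node1 node2 node3
    omping -c 600 -i 1 -q node1 node2 node3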

    Also, eth5 does not appear in the network config you posted.
     