One node is out of sync with the cluster

PiotrDev

Here are some details from the logs: https://pastebin.com/dhUUKSGm
Basically, one node dropped out of the cluster. In the logs I see a problem with a replica file (that file exists on the other, working nodes; on the failing node the /etc/pve/priv directory is empty, and the directory doesn't even have the "w" write flag).
Can you suggest where I should look?

pvecm status on the "broken" node:

Code:
root@anna4 /etc/pve # pvecm status
ipcc_send_rec[1] failed: Connection refused
ipcc_send_rec[2] failed: Connection refused
ipcc_send_rec[3] failed: Connection refused
Unable to load access control list: Connection refused

and on one of the 2 working nodes:
Code:
root@anna3 /etc/pve # pvecm status
Quorum information
------------------
Date:             Tue Feb 18 18:16:22 2020
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x00000001
Ring ID:          1.41668
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      2
Quorum:           2 
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 10.10.1.3 (local)
0x00000003          1 10.10.1.5
 
After restarting pvestatd on the disconnected node, pvecm status and pvecm nodes started showing output, but the node seems to think the other machines are unreachable:

Code:
root@anna4 /etc/pve # pvecm nodes

Membership information
----------------------
    Nodeid      Votes Name
         2          1 anna4 (local)
        
root@anna4 /etc/pve # pvecm status
Quorum information
------------------
Date:             Tue Feb 18 18:29:26 2020
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          0x00000002
Ring ID:          1/267956
Quorate:          No

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      1
Quorum:           2 Activity blocked
Flags:           

Membership information
----------------------
    Nodeid      Votes Name
0x00000002          1 10.10.1.4 (local)
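
For anyone reading the two outputs side by side, the vote arithmetic can be sketched as below. This is only an illustration and assumes the plain majority rule (quorum = floor(expected votes / 2) + 1) with no special votequorum options such as two_node in effect; the function names are hypothetical and not part of any Proxmox or corosync tool. The numbers are copied from the pvecm status outputs in this thread.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Votes needed for quorum under a simple majority rule (assumed)."""
    return expected_votes // 2 + 1


def is_quorate(total_votes: int, expected_votes: int) -> bool:
    """True when the partition holds at least the majority threshold."""
    return total_votes >= quorum_threshold(expected_votes)


if __name__ == "__main__":
    expected = 3  # "Expected votes" reported by both nodes
    # "Total votes" as seen from each side of the split
    for node, total in (("anna3", 2), ("anna4", 1)):
        state = "quorate" if is_quorate(total, expected) else "activity blocked"
        print(f"{node}: {total}/{expected} votes, "
              f"needs {quorum_threshold(expected)}: {state}")
```

With 3 expected votes the threshold is 2, which matches the "Quorum: 2" line in both outputs: the partition with anna3 and 10.10.1.5 holds 2 votes and stays quorate, while anna4 alone holds 1 vote and is blocked.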
 
How is your network configured? Do you have a dedicated physical NIC for corosync?