Cluster split brain!

gerami

Dec 27, 2020
Hi,

I had a 5-node Proxmox cluster working just fine. Then we replaced one of the nodes with new hardware but kept the same IP. According to the documentation, we had to remove the node from the cluster, reinstall Proxmox, and add it back to the cluster (quite tricky).

I tried to delete the node, but because it was offline, this wasn't easy! I deleted the node's directory from /etc/pve/nodes/, but then all the nodes disappeared, so I moved the directory back!

Then I realized that "Target node" is empty! This started after deleting the node. So I ran pvecm updatecerts -f to fix it, but that didn't help.

Then I tried to add the new node (with a freshly installed Proxmox) to the cluster. I logged in to one of the nodes in the cluster and noticed that the join node option was not enabled. So I ran pvecm delnode my-node to delete it. After that, join node was enabled, and I used it to add the new node to the cluster. But then I realized that the node I had used to add the new node is not in the cluster anymore!!!
I investigated further, and this is the current state of the cluster:

Code:
root@h01c06:~# pvecm status
Cluster information
-------------------
Name:             DevZone
Config Version:   9
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Sun Dec 27 15:14:30 2020
Quorum provider:  corosync_votequorum
Nodes:            4
Node ID:          0x00000001
Ring ID:          1.5b0f
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   5
Highest expected: 5
Total votes:      4
Quorum:           3 
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.168.200.7 (local)
0x00000003          1 192.168.200.3
0x00000004          1 192.168.200.12
0x00000006          1 192.168.200.30

Code:
root@proxmox03:~# pvecm status
Cluster information
-------------------
Name:             DevZone
Config Version:   9
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Sun Dec 27 15:09:21 2020
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000003
Ring ID:          3.603f
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   5
Highest expected: 5
Total votes:      3
Quorum:           3 
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000003          1 192.168.200.3 (local)
0x00000004          1 192.168.200.12
0x00000006          1 192.168.200.30

Code:
root@proxmox02:~# pvecm status
Cluster information
-------------------
Name:             DevZone
Config Version:   9
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Sun Dec 27 15:09:46 2020
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000004
Ring ID:          3.605f
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   5
Highest expected: 5
Total votes:      3
Quorum:           3 
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000003          1 192.168.200.3
0x00000004          1 192.168.200.12 (local)
0x00000006          1 192.168.200.30

Code:
root@proxmox05:~# pvecm status
Cluster information
-------------------
Name:             DevZone
Config Version:   9
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Sun Dec 27 15:15:08 2020
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000006
Ring ID:          3.6213
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   5
Highest expected: 5
Total votes:      3
Quorum:           3 
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000003          1 192.168.200.3
0x00000004          1 192.168.200.12
0x00000006          1 192.168.200.30 (local)

Code:
root@proxmox04:~# pvecm status
Cluster information
-------------------
Name:             DevZone
Config Version:   9
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Sun Dec 27 15:17:39 2020
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          0x00000002
Ring ID:          2.5b0f
Quorate:          No

Votequorum information
----------------------
Expected votes:   5
Highest expected: 5
Total votes:      1
Quorum:           3 Activity blocked
Flags:           

Membership information
----------------------
    Nodeid      Votes Name
0x00000002          1 192.168.200.8 (local)


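For reference, the "Quorum: 3" value in every output is consistent with corosync votequorum's majority rule, where the quorum is half the expected votes (integer division) plus one. The following is a minimal sketch, assuming one vote per node (as the Votes column shows) and no special votequorum options such as two_node; it applies that rule to the "Total votes" each node reported above. Note that 192.168.200.3, .12 and .30 appear both in h01c06's view and in their own, so the quorate views overlap rather than forming two disjoint partitions.

```python
"""Minimal sketch: apply a simple-majority quorum rule to the pvecm outputs above.

Assumptions: every node carries one vote (as in the Votes column) and no
special votequorum options (e.g. two_node, last_man_standing) are configured.
"""

EXPECTED_VOTES = 5

# "Total votes" each node reported in its own pvecm status output.
TOTAL_VOTES = {
    "h01c06": 4,
    "proxmox03": 3,
    "proxmox02": 3,
    "proxmox05": 3,
    "proxmox04": 1,
}


def majority_quorum(expected_votes: int) -> int:
    """Smallest vote count that is strictly more than half of expected_votes."""
    return expected_votes // 2 + 1


def quorate_views(total_votes: dict[str, int], expected: int) -> dict[str, bool]:
    """Map each node to whether its own view reaches the majority quorum."""
    quorum = majority_quorum(expected)
    return {node: votes >= quorum for node, votes in total_votes.items()}


if __name__ == "__main__":
    print("quorum =", majority_quorum(EXPECTED_VOTES))
    for node, ok in quorate_views(TOTAL_VOTES, EXPECTED_VOTES).items():
        print(f"{node}: {'Quorate' if ok else 'Activity blocked'}")
```

With the totals above, only proxmox04 falls below the quorum of 3, which matches its "Activity blocked" line.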
Now the new node is not in any cluster. The node I used to add the new node to the cluster considers itself to be in the cluster, but the other 3 nodes don't consider it to be in the cluster!!!
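The disagreement described above can also be checked by comparing the "Membership information" tables side by side. The helper below is hypothetical (it is not part of Proxmox or pvecm): it assumes each node's pvecm status output has been saved as text, extracts the member IP addresses, and reports which nodes disagree with the most common view. Fed the membership tables above, it flags h01c06 and proxmox04.

```python
"""Hypothetical helper: compare the membership tables from several
pvecm status outputs and flag nodes whose view differs from the rest.

Assumption: rows look like "0x00000003          1 192.168.200.3 (local)".
"""
import re
from collections import Counter

# A membership row: hex node ID, vote count, then an IPv4 address.
ROW = re.compile(r"^\s*0x[0-9a-fA-F]+\s+\d+\s+(\d{1,3}(?:\.\d{1,3}){3})")

# Abbreviated membership tables copied from the outputs above.
_THREE = ("Membership information\n"
          "0x00000003          1 192.168.200.3\n"
          "0x00000004          1 192.168.200.12\n"
          "0x00000006          1 192.168.200.30\n")
SAMPLE_OUTPUTS = {
    "h01c06": ("Membership information\n"
               "0x00000001          1 192.168.200.7 (local)\n"
               "0x00000003          1 192.168.200.3\n"
               "0x00000004          1 192.168.200.12\n"
               "0x00000006          1 192.168.200.30\n"),
    "proxmox03": _THREE,
    "proxmox02": _THREE,
    "proxmox05": _THREE,
    "proxmox04": ("Membership information\n"
                  "0x00000002          1 192.168.200.8 (local)\n"),
}


def members(status_text: str) -> frozenset[str]:
    """Return the member addresses listed after 'Membership information'."""
    _, _, tail = status_text.partition("Membership information")
    return frozenset(m.group(1) for line in tail.splitlines() if (m := ROW.match(line)))


def divergent_views(outputs: dict[str, str]) -> dict[str, frozenset[str]]:
    """Return the nodes whose membership view differs from the most common one."""
    views = {node: members(text) for node, text in outputs.items()}
    majority, _ = Counter(views.values()).most_common(1)[0]
    return {node: view for node, view in views.items() if view != majority}


if __name__ == "__main__":
    for node, view in divergent_views(SAMPLE_OUTPUTS).items():
        print(node, sorted(view))
```

Comparing sets of addresses rather than raw lines keeps the "(local)" marker from making otherwise identical views look different.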

Any suggestion?
