Replacing PVE node in cluster, new node loses quorum

mlanner

Member
Apr 1, 2009
Berkeley, CA
Hi,

We're having some problems replacing a PVE node in a cluster. These are the steps we've taken so far:
  1. Turn off pve03 (out of 3).
  2. From pve01, remove pve03 from the cluster with: pvecm delnode pve03
  3. Unrack the hardware.
  4. Mount a new server to become the new pve03.
  5. Perform a clean install of PVE on the new server with the same IP and hostname as the old node, pve03.
  6. Upgrade new pve03 node.
  7. From pve03, add it to cluster with pvecm add pve01 (or with IP address of pve01).
  8. Done; pve03 is now in the cluster.
The problem we're seeing is that pve03 keeps losing quorum with pve01 and pve02. When we run pvecm status on pve03, we see the following output:

Code:
root@pve03:~# pvecm s
Quorum information
------------------
Date:             Wed Mar 27 12:46:05 2019
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          0x00000001
Ring ID:          1/843000
Quorate:          No

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      1
Quorum:           2 Activity blocked
Flags:         

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.168.20.13 (local)
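For readers following along, here is a minimal sketch of the arithmetic behind that output. It assumes the default votequorum behaviour (simple majority of the expected votes, no two_node or other special options); `check_quorum` is a hypothetical helper name, not part of pvecm.

```shell
# Hypothetical helper: reads `pvecm status` output on stdin and reports
# whether the votes present reach a simple majority of the expected votes.
check_quorum() {
    awk -F': *' '
        /^Expected votes:/ { expected = $2 + 0 }
        /^Total votes:/    { total = $2 + 0 }
        END {
            needed = int(expected / 2) + 1      # assumed: simple majority
            q = (total >= needed) ? "yes" : "no"
            printf "expected=%d total=%d needed=%d quorate=%s\n", expected, total, needed, q
        }'
}

# Usage on a node:
#   pvecm status | check_quorum
```

With 3 expected votes the majority is 2, so a node that only sees its own single vote stays blocked, which matches the "Quorum: 2 Activity blocked" line above.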
Reading the forums, I see it commonly mentioned that this is likely a unicast vs. multicast issue. Given that the cluster worked flawlessly for years before the hardware replacement, I just don't see how multicast would be the problem, especially since we're reusing the exact same switch and switch ports.

The only explanation I can come up with at this point is that re-using the same IP and hostname is somehow creating issues somewhere in the cluster configuration. As far as I can tell, /etc/pve/corosync.conf looks good. I've compared it to that of another cluster we have in a different data center and can't find any meaningful differences.
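One comparison that may be worth adding to the above is between the cluster-wide /etc/pve/corosync.conf and the local copy in /etc/corosync/corosync.conf on each node. The sketch below only compares the config_version field; the helper names `config_version` and `same_config_version` are made up for illustration, and a matching version does not prove the rest of the file is identical.

```shell
# Hypothetical helper: print the config_version value from a corosync.conf file.
config_version() {
    awk '$1 == "config_version:" { print $2; exit }' "$1"
}

# Hypothetical helper: print both versions and succeed only if they are
# present and equal.
same_config_version() {
    a=$(config_version "$1")
    b=$(config_version "$2")
    echo "$1: ${a:-missing}"
    echo "$2: ${b:-missing}"
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Usage on each node:
#   same_config_version /etc/pve/corosync.conf /etc/corosync/corosync.conf \
#       || echo "config_version differs on this node"
```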

Does anyone have any ideas? Thanks in advance!

Andrei Bogatsky

New Member
Mar 28, 2019
Piggybacking on the above message...

I've tried installing with a different hostname (pve04) and IP (192.168.20.14), and the same thing occurs. `omping` shows that both unicast and multicast are working correctly; however, we still lose quorum after several minutes, presumably because of some timeout. Restarting the corosync service brings quorum back for another ~5 minutes, and then we lose it again. Rinse and repeat. Just as `mlanner` stated in the post above, there are no discernible differences between the "new" node's configuration and the existing ones.
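Since a short `omping` run can pass while multicast still drops out a few minutes later, a longer run is one further check, sketched here under assumptions. The `-c 600 -i 1 -q` invocation is the roughly ten-minute test described in the Proxmox cluster documentation; `omping_long_cmd` is a hypothetical helper that only prints the command, because omping has to be started on all nodes at about the same time. The hostnames are the ones from this thread.

```shell
# Hypothetical helper: print a ~10 minute omping invocation (600 probes,
# one per second, quiet output) for the given list of nodes.
omping_long_cmd() {
    echo "omping -c 600 -i 1 -q $*"
}

# Usage: print the command once, then paste it on every node together:
#   omping_long_cmd pve01 pve02 pve04
# While it runs, corosync messages can be followed on the new node with:
#   journalctl -u corosync -f
```

If multicast loss climbs after the first few minutes of the run, that would point at the network rather than at the node configuration; if it stays near zero, the cause is more likely elsewhere.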
 

Stoiko Ivanov

Proxmox Staff Member
Staff member
May 2, 2018
Glad you found and resolved your issue! Please mark the thread as 'SOLVED' so that others know what to expect. Thanks!