New install - authentication key already exists

Paulo Maligaya

New Member
I have 2 newly installed proxmox hosts, and I'm trying to create a cluster. I have proxmox01 as my primary node and proxmox02 as my second node.

I was able to create the cluster on proxmox01.

root@proxmox01:~# pvecm create CADRE-PVE
Corosync Cluster Engine Authentication key generator.
Gathering 1024 bits for key from /dev/urandom.
Writing corosync key to /etc/corosync/authkey.

root@proxmox01:~# pvecm status
Quorum information
------------------
Date: Wed Sep 14 05:16:48 2016
Quorum provider: corosync_votequorum
Nodes: 1
Node ID: 0x00000001
Ring ID: 1/4
Quorate: Yes

Votequorum information
----------------------
Expected votes: 1
Highest expected: 1
Total votes: 1
Quorum: 1
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000001 1 192.168.0.1 (local)


Now, I tried to add the second node to the cluster:

root@proxmox02:~# pvecm add 192.168.0.1
copy corosync auth key
stopping pve-cluster service
backup old database
waiting for quorum...

It stayed stuck like that for 20+ minutes. I don't know if it was the right move, but I cancelled the operation, since these messages kept piling up in the log:

Sep 14 06:12:26 proxmox02 corosync[11541]: [MAIN ] Completed service synchronization, ready to provide service.
Sep 14 06:12:28 proxmox02 corosync[11541]: [TOTEM ] A new membership (192.168.0.2:2528) was formed. Members
Sep 14 06:12:28 proxmox02 corosync[11541]: [QUORUM] Members[1]: 2
Sep 14 06:12:28 proxmox02 corosync[11541]: [MAIN ] Completed service synchronization, ready to provide service.
Sep 14 06:12:29 proxmox02 corosync[11541]: [TOTEM ] A new membership (192.168.0.2:2532) was formed. Members
Sep 14 06:12:29 proxmox02 corosync[11541]: [QUORUM] Members[1]: 2

I restarted pve-cluster, pvestatd, and pveproxy, and those log messages stopped accumulating.
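For what it's worth, corosync on PVE 4.x uses multicast by default, so a hang at "waiting for quorum..." together with constantly re-forming single-node memberships usually points at multicast traffic not passing between the nodes. A quick way to verify that is corosync's omping tool — a sketch, run on both nodes at the same time, with the addresses from this setup:

apt-get install omping
omping -c 600 -i 1 -q 192.168.0.1 192.168.0.2
# near-100% loss on the multicast line means corosync's traffic is being dropped somewhere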

So I attempted to add proxmox02 again:

root@proxmox02:~# pvecm add 192.168.0.1
authentication key already exists

I then appended the "--force" option to try to make proxmox02 join the cluster:

root@proxmox02:~# pvecm add 192.168.0.1 --force
node proxmox02 already defined
copy corosync auth key
stopping pve-cluster service
backup old database
generating node certificates
merge known_hosts file
restart services
successfully added node 'proxmox02' to cluster.

It looks fine, but proxmox02 still failed to join the cluster:

root@proxmox01:~# pvecm status
Quorum information
------------------
Date: Wed Sep 14 06:35:03 2016
Quorum provider: corosync_votequorum
Nodes: 1
Node ID: 0x00000001
Ring ID: 1/3336
Quorate: Yes

Votequorum information
----------------------
Expected votes: 1
Highest expected: 1
Total votes: 1
Quorum: 1
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000001 1 192.168.0.1 (local)

It even shows as offline (red mark) in proxmox01's web UI.

Is there any way I can re-join proxmox02 and clean up proxmox01? I can't simply remove the proxmox02 node, since it never actually became a member of the cluster.
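For reference, the usual recipe for resetting a half-joined node back to standalone without reinstalling looks roughly like the sketch below (this follows the documented "separate a node" steps; back up /etc/pve before trying it):

# on proxmox02: stop the cluster stack and wipe the corosync config
systemctl stop pve-cluster corosync
pmxcfs -l                        # restart the cluster filesystem in local mode
rm /etc/pve/corosync.conf
rm -r /etc/corosync/*
killall pmxcfs
systemctl start pve-cluster

# on proxmox01: drop the stale node entry, if pvecm still lists it
pvecm delnode proxmox02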

Some additional information:

Previously, I was able to set up the cluster with no issues when the MTU of my bridge interface (weave) was set to 1410; now that I've increased it to 8900 (jumbo frames), I'm hitting this problem.

Also, I used the 4.2.8-1-pve kernel instead of the latest one (4.4); the reason has to do with the weave bridge interface.

Any help is appreciated. TIA!
 
Paulo Maligaya said:
"Previously, I was able to set up the cluster with no issues when the MTU of my bridge interface (weave) was set to 1410; now that I've increased it to 8900 (jumbo frames), I'm hitting this problem."

Maybe a switch with a wrong MTU configuration in between?
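You can check that end-to-end with a don't-fragment ping between the nodes (a sketch; 8872 = 8900 minus 28 bytes of IP+ICMP headers):

ping -M do -s 8872 -c 3 192.168.0.2
# errors or 100% loss here mean some hop in between doesn't pass jumbo frames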
 
@dietmar Thank you for your prompt response! You mean the network switch itself, and not the interfaces on the hosts?

If so, I may need to coordinate this with our network engineering team. In the meantime, I can probably use the default MTU size given by the (virtual) bridge.
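Reverting the interface is a one-liner anyway (weave being my bridge interface here; a matching mtu line in /etc/network/interfaces would keep it across reboots):

ip link set dev weave mtu 1410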

But the real question is: is there any way I can remove/rejoin proxmox02, and clean up any entries for proxmox02 on node1, without reinstalling the machines?
 
@dietmar Thanks for your inputs! I really appreciate it!

Anyway, I was able to join proxmox02 to the cluster after we reinstalled the server.

It's quite odd, because the only changes in this setup were: a) I used an older kernel (4.2.8-1-pve), and b) I increased the MTU to 8900 (from 1410) on the weave interface prior to cluster creation.

weave:0 Link encap:Ethernet HWaddr 0e:d9:49:5e:55:65
inet addr:192.168.0.2 Bcast:192.168.0.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1410 Metric:1

You'll notice that the MTU is 1410, and I was able to get proxmox02 to join the cluster without any hang/timeout.

Though I'm still not sure what causes the issue beyond those two changes. I may need to raise the MTU again to a much higher value, since my problem with the lower MTU was that node2 would go offline whenever I deployed a guest VM from node1.
 
@dietmar I think I now know what caused the issue in this thread: it was indeed the MTU size. Quite odd that with the MTU set to 8900, proxmox02 wasn't able to join the cluster, while reverting it to the default MTU (1410) made it work.

I guess my question now is: what is the ideal MTU size for Proxmox? 1500? 8900 (jumbo frames)?
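For what it's worth, corosync 2.x also lets you pin its own packet size via the netmtu option in the totem section (per corosync.conf(5); 1500 is its default), so the cluster traffic could stay at a safe size even if the interface runs jumbo frames. A trimmed /etc/pve/corosync.conf sketch, assuming this thread's cluster name (the real file has more keys):

totem {
  version: 2
  cluster_name: CADRE-PVE
  netmtu: 1500
}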
 
