adding new node failed, now can't log into the web interface

kingl

New Member
Jul 13, 2015
I have a 2-node cluster that was working fine.

I tried to add a third node to the cluster using "pvecm add 132.197.63.112" (132.197.63.112 is the IP of one of the two existing cluster nodes).

it took a long time and then seemed to hang at "waiting for quorum..."

The authenticity of host '132.197.63.112 (132.197.63.112)' can't be established.
ECDSA key fingerprint is 6a:10:eb:87:dd:41:90:52:2f:f2:53:3d:9f:b2:a3:c2.
Are you sure you want to continue connecting (yes/no)? yes
root@132.197.63.112's password:
copy corosync auth key
stopping pve-cluster service
Stopping pve cluster filesystem: pve-cluster.
backup old database
Starting pve cluster filesystem : pve-cluster.
Starting cluster:
Checking if cluster has been disabled at boot... [ OK ]
Checking Network Manager... [ OK ]
Global setup... [ OK ]
Loading kernel modules... [ OK ]
Mounting configfs... [ OK ]
Starting cman... [ OK ]
Waiting for quorum... [ OK ]
Starting fenced... [ OK ]
Starting dlm_controld... [ OK ]
Tuning DLM kernel config... [ OK ]
Unfencing self... [ OK ]
waiting for quorum...

I stopped it.

now I can't log into the cluster web interface.

pveversion -v
proxmox-ve-2.6.32: 3.4-156 (running kernel: 2.6.32-39-pve)
pve-manager: 3.4-6 (running version: 3.4-6/102d4547)
pve-kernel-2.6.32-39-pve: 2.6.32-156
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-2
pve-cluster: 3.0-17
qemu-server: 3.4-6
pve-firmware: 1.1-4
libpve-common-perl: 3.0-24
libpve-access-control: 3.0-16
libpve-storage-perl: 3.0-33
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.2-10
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1

the previous working cluster had 2 nodes: atlassiantools2, atlassiantools3

the new node is called : atlassiantools1

the new cluster.conf:

atlassiantools2:/etc/pve# cat /etc/pve/cluster.conf
<?xml version="1.0"?>
<cluster name="pmaatlassian01" config_version="4">
<cman keyfile="/var/lib/pve-cluster/corosync.authkey">
</cman>
<clusternodes>
<clusternode name="atlassiantools3" votes="1" nodeid="1"/>
<clusternode name="atlassiantools1" votes="1" nodeid="2"/></clusternodes>
</cluster>

the old working cluster.conf:

atlassiantools2:/etc/pve# cat /etc/pve/cluster.conf.old
<?xml version="1.0"?>
<cluster name="pmaatlassian01" config_version="4">
<cman keyfile="/var/lib/pve-cluster/corosync.authkey">
</cman>
<clusternodes>
<clusternode name="atlassiantools3" votes="1" nodeid="1"/>
</clusternodes>
</cluster>

questions :

1. how do I enable web interface log in?

2. how do I remove the new node, or re-add it?

thank you very much!
 
the log in realm has been the default "Linux PAM standard authentication"

on the first 2 nodes :

atlassiantools3:/etc/pve/nodes# ls -lrt
total 0
drwxr-x--- 2 root www-data 0 May 19 21:55 atlassiantools3
drwxr-x--- 2 root www-data 0 Jun 30 16:10 atlassiantools2
drwxr-x--- 2 root www-data 0 Jul 13 14:03 atlassiantools1

however there is no pve-ssl.key or pve-ssl.pem in the new node atlassiantools1 dir.
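For anyone comparing node directories the same way, here is a minimal sketch that lists which node directories lack the two certificate files named above. The function name `missing_ssl` is made up for illustration, and it only checks that the files exist, nothing about their contents.

```shell
# missing_ssl NODES_DIR: print the name of every node directory under
# NODES_DIR that lacks pve-ssl.key or pve-ssl.pem (the two files named
# in the post). Hypothetical helper, not part of any Proxmox tool.
missing_ssl() {
    for dir in "$1"/*/; do
        [ -d "$dir" ] || continue          # glob matched nothing
        node=$(basename "$dir")
        if [ ! -e "$dir/pve-ssl.key" ] || [ ! -e "$dir/pve-ssl.pem" ]; then
            echo "$node"
        fi
    done
}
```

Usage would be something like `missing_ssl /etc/pve/nodes`; with the layout described in this post it should print the new node's directory name.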
 
I don't know why adding the new node (atlassiantools1) replaced the existing node (atlassiantools2).

root@atlassiantools3:/etc/pve# pvecm nodes
Node Sts Inc Joined Name
1 M 4 2015-06-30 16:07:17 atlassiantools3
2 M 16 2015-07-13 15:41:29 atlassiantools1


root@atlassiantools3:/etc/pve# clustat
Cluster Status for pmaatlassian01 @ Mon Jul 13 16:28:41 2015
Member Status: Quorate
Member Name ID Status
------ ---- ---- ------
atlassiantools3 1 Online, Local
atlassiantools1 2 Online


root@atlassiantools3:/etc/pve# pvecm status
Version: 6.2.0
Config Version: 4
Cluster Name: pmaatlassian01
Cluster Id: 6017
Cluster Member: Yes
Cluster Generation: 20
Membership state: Cluster-Member
Nodes: 2
Expected votes: 1
Total votes: 2
Node votes: 1
Quorum: 2
Active subsystems: 5
Flags:
Ports Bound: 0
Node name: atlassiantools3
Node ID: 1
Multicast addresses: 239.192.23.152
Node addresses: 132.197.63.112


now everything under /etc/pve is read-only

pvecm delnode atlassiantools1
cluster not ready - no quorum?

i tried restarting the pve-cluster service and running "pvecm expected 1", but that didn't help.
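A small sketch for pulling the vote numbers out of "pvecm status" text, assuming the "Total votes" and "Quorum" labels look like the output pasted above. The function name `quorum_summary` is hypothetical. It only does the arithmetic comparison; it says nothing about whether /etc/pve is actually writable.

```shell
# quorum_summary: read `pvecm status` text on stdin and print the total
# votes, the quorum threshold, and whether the votes reach the threshold.
# Field labels are assumed to match the output quoted in this thread.
quorum_summary() {
    awk -F': *' '
        $1 == "Total votes" { total = $2 + 0; have_total = 1 }
        $1 == "Quorum"      { quorum = $2 + 0; have_quorum = 1 }
        END {
            if (!have_total || !have_quorum) { print "unknown"; exit 1 }
            verdict = (total >= quorum) ? "votes-ok" : "votes-short"
            printf "total=%d quorum=%d %s\n", total, quorum, verdict
        }'
}
```

Usage: `pvecm status | quorum_summary`. With the numbers pasted above (Total votes: 2, Quorum: 2) the votes do reach the threshold, yet "pvecm delnode" still reports no quorum, so the vote count alone does not explain the read-only /etc/pve here.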
 
This sounds awfully familiar. Check your hosts file, it must have the correct ip for your node and pvelocalhost (check a working node to see what I'm talking about). Also check your DNS. All similar problems I've encountered were caused by misconfigured DNS or hosts file.
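The hosts file check described above can be sketched roughly like this. It assumes, based on the advice in this reply, that the node's own name and the pvelocalhost alias should point at the same non-loopback address; the function names `hosts_lookup` and `hosts_ok` are made up for illustration.

```shell
# hosts_lookup FILE NAME: print the IP of the first line in FILE that
# lists NAME as a hostname or alias; return 1 if NAME is not found.
hosts_lookup() {
    awk -v name="$2" '
        /^[[:space:]]*#/ { next }          # skip comment lines
        {
            sub(/#.*/, "")                 # drop trailing comments
            for (i = 2; i <= NF; i++)
                if ($i == name) { print $1; found = 1; exit }
        }
        END { exit found ? 0 : 1 }' "$1"
}

# hosts_ok FILE NODE: succeed only if NODE and pvelocalhost map to the
# same address in FILE and that address is not in 127.0.0.0/8.
# (Assumption: this is what "correct ip for your node and pvelocalhost" means.)
hosts_ok() {
    node_ip=$(hosts_lookup "$1" "$2") || return 1
    pve_ip=$(hosts_lookup "$1" pvelocalhost) || return 1
    case "$node_ip" in 127.*) return 1 ;; esac
    [ "$node_ip" = "$pve_ip" ]
}
```

Run on each node as `hosts_ok /etc/hosts "$(hostname)" && echo ok`. A misspelled alias makes the pvelocalhost lookup fail, so the check fails too.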
 
thank you for your reply, i did check the /etc/hosts file

on node 1
132.197.63.110 atlassiantools1 pvlocalhost
132.197.63.111 atlassiantools2
132.197.63.112 atlassiantools3

on node 2
132.197.63.110 atlassiantools1
132.197.63.111 atlassiantools2 pvlocalhost
132.197.63.112 atlassiantools3

on node 3
132.197.63.110 atlassiantools1
132.197.63.111 atlassiantools2
132.197.63.112 atlassiantools3 pvlocalhost

the only thing i am not sure about is pvlocalhost. is pvlocalhost necessary?

the dns looks ok.

btw i can log into the individual proxmox hosts now, but not the cluster. i still can't remove the node, even after restarting pve-cluster and running "pvecm expected 1"

i can't edit cluster.conf because it is read-only.
 
In that case I have no idea. Something's funky with your setup. You didn't copy the 3rd node, did you? Like cloning an existing PVE HN. I'm suspecting a colliding ID somewhere, seeing one of the old servers got replaced. I'd recommend a full reinstall of your 3rd node if you haven't started VMs on it yet.

BTW, I don't know if it has any significance, but 'pvlocalhost' should be 'pvelocalhost' in your hosts file.
 
i plan to reinstall the 3rd node, there is no vm on it yet.

however i need to restore the previous 2 nodes cluster. now i am stuck at not being able to remove node from the cluster.

the hosts file does say pvelocalhost; "pvlocalhost" was a typo in my post.

thank you for your help.
 
Hi,
take a backup of /etc/pve (important), then try to rejoin the second node to the cluster so that it reaches quorum again.


Udo
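The backup step suggested above could look something like the sketch below. The function name `backup_dir` and the destination path in the usage line are placeholders, not anything Proxmox ships; it simply writes a timestamped tar archive of a directory.

```shell
# backup_dir SRC DEST_DIR: archive SRC into a timestamped .tar.gz under
# DEST_DIR and print the archive path. Hypothetical helper for illustration.
backup_dir() {
    src=$1
    dest=$2
    [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
    mkdir -p "$dest" || return 1
    archive="$dest/$(basename "$src")-$(date +%Y%m%d-%H%M%S).tar.gz"
    # -C keeps the paths inside the archive relative to the parent of SRC
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    echo "$archive"
}
```

For example `backup_dir /etc/pve /root/pve-backup`, where /root/pve-backup is only an example destination. Keep the archive somewhere outside /etc/pve, since that directory is currently read-only.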
 
