Delete and re-add a node

bread-baker

I had two nodes, fbc158 and fbc10; fbc158 was added first.

I was having trouble with fbc10, so I deleted it and rebooted that node. I figured I could then try to add it again.

Code:
ssh fbc158
pvecm delnode fbc10



When I try to re-add the node, running the command on the node I want to re-add:
Code:
ssh fbc10

pvecm add 10.100.100.158
authentication key already exists


On the one remaining node, fbc10 still shows up:
Code:
ssh fbc158

pvecm n
Node  Sts   Inc   Joined               Name
   1   M     16   2011-10-02 10:21:07  fbc158
   2   X     24                        fbc10

root@homenet-fbc158 ~ # pvecm status
Version: 6.2.0
Config Version: 3
Cluster Name: fbcandover
Cluster Id: 37452
Cluster Member: Yes
Cluster Generation: 28
Membership state: Cluster-Member
Nodes: 1
Expected votes: 1
Total votes: 1
Node votes: 1
Quorum: 2 Activity blocked
Active subsystems: 6
Flags: 
Ports Bound: 0 11  
Node name: fbc158
Node ID: 1
Multicast addresses: 239.192.146.222 
Node addresses: 10.100.100.158
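A quick way to spot stale entries like fbc10 in the node list above is to filter on the `Sts` column. This is a minimal sketch, assuming the column layout shown above and that `M` marks a joined member (fbc10's `X` being the non-member case here); `stale_nodes` is a hypothetical helper, not part of pvecm:

```bash
# stale_nodes: read `pvecm nodes` output on stdin and print the names of
# nodes whose status (2nd column) is not "M".
# Assumes the layout shown above. The name is always the last field, even
# when the "Joined" column is empty, so $NF is used rather than a fixed index.
stale_nodes() {
    awk 'NR > 1 && NF >= 4 && $2 != "M" { print $NF }'
}

# Usage on a live node:
#   pvecm nodes | stale_nodes
```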


Could someone tell me which key to delete, or what to do to re-add fbc10?

PS: I tried to follow http://pve.proxmox.com/wiki/Proxmox_VE_2.0_Cluster, but my issues are probably due to operator error.
 
To the devs: the tests I am doing are on non-production systems.

Most of my cluster issues are probably due to operator error, but maybe not.

In any case, if it helps Proxmox, I can give you SSH access to the systems so that you can run tests or try fixes.
 
Also, the permissions in /etc/pve and /etc/pve/priv differ between the two systems (nothing is writable on fbc10):

Code:
root@homenet-fbc158 /etc/pve # ll
total 4
-rw-r----- 1 root www-data  451 Sep 30 17:50 authkey.pub
-rw-r----- 1 root www-data  237 Oct  2 10:44 cluster.conf
-rw-r----- 1 root www-data  285 Oct  2 10:44 cluster.conf.old
-rw-r----- 1 root www-data   16 Sep 30 17:55 datacenter.cfg
lrwxr-x--- 1 root www-data    0 Dec 31  1969 local -> nodes/fbc158
drwxr-x--- 2 root www-data    0 Sep 30 17:50 nodes
lrwxr-x--- 1 root www-data    0 Dec 31  1969 openvz -> nodes/fbc158/openvz
drwx------ 2 root www-data    0 Sep 30 17:50 priv
-rw-r----- 1 root www-data 1533 Sep 30 17:50 pve-root-ca.pem
-rw-r----- 1 root www-data 1679 Sep 30 17:50 pve-www.key
lrwxr-x--- 1 root www-data    0 Dec 31  1969 qemu-server -> nodes/fbc158/qemu-server
-rw-r----- 1 root www-data  160 Oct  2 14:01 storage.cfg
root@homenet-fbc158 /etc/pve # ll priv
total 3
-rw------- 1 root www-data 1679 Sep 30 17:50 authkey.key
-rw------- 1 root www-data 1190 Oct  2 13:50 authorized_keys
-rw------- 1 root www-data 1768 Oct  2 10:01 known_hosts
drwx------ 2 root www-data    0 Sep 30 17:55 lock
-rw------- 1 root www-data 1679 Sep 30 17:50 pve-root-ca.key
-rw------- 1 root www-data    3 Oct  2 10:01 pve-root-ca.srl

Code:
root@homenet-fbc10 /etc/pve # ll
total 4
-r--r----- 1 root www-data  451 Sep 30 17:50 authkey.pub
-r--r----- 1 root www-data  237 Oct  2 10:44 cluster.conf
-r--r----- 1 root www-data  285 Oct  2 10:44 cluster.conf.old
-r--r----- 1 root www-data   16 Sep 30 17:55 datacenter.cfg
lr-xr-x--- 1 root www-data    0 Dec 31  1969 local -> nodes/fbc10
dr-xr-x--- 2 root www-data    0 Sep 30 17:50 nodes
lr-xr-x--- 1 root www-data    0 Dec 31  1969 openvz -> nodes/fbc10/openvz
dr-x------ 2 root www-data    0 Sep 30 17:50 priv
-r--r----- 1 root www-data 1533 Sep 30 17:50 pve-root-ca.pem
-r--r----- 1 root www-data 1679 Sep 30 17:50 pve-www.key
lr-xr-x--- 1 root www-data    0 Dec 31  1969 qemu-server -> nodes/fbc10/qemu-server
-r--r----- 1 root www-data  104 Oct  1 22:54 storage.cfg
root@homenet-fbc10 /etc/pve # ll priv
total 3
-r-------- 1 root www-data 1679 Sep 30 17:50 authkey.key
-r-------- 1 root www-data 1583 Oct  2 10:18 authorized_keys
-r-------- 1 root www-data 1768 Oct  2 10:01 known_hosts
dr-x------ 2 root www-data    0 Sep 30 17:55 lock
-r-------- 1 root www-data 1679 Sep 30 17:50 pve-root-ca.key
-r-------- 1 root www-data    3 Oct  2 10:01 pve-root-ca.srl
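Comparing the two listings, every entry on fbc10 lacks the owner write bit. That is consistent with /etc/pve being read-only on a node without quorum, though this explanation is an assumption and is not confirmed in the thread. A minimal sketch for listing such entries from `ll`/`ls -l` output; `readonly_entries` is a hypothetical helper, and it assumes file names without spaces:

```bash
# readonly_entries: read `ls -l` output on stdin and print the names of
# entries whose owner-write bit (3rd character of the mode string) is unset.
# Lines with fewer than 9 fields (e.g. "total 4") are skipped. For symlinks
# ("local -> nodes/fbc10") only the link name is printed.
readonly_entries() {
    awk 'NF >= 9 && substr($1, 3, 1) != "w" { print $9 }'
}

# Usage:
#   ls -l /etc/pve | readonly_entries
```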


I'm now able to add storage on fbc158.

However, I did a few different things, so I am not certain what fixed it. Editing /etc/pve/priv/authorized_keys and removing fbc10's public key may have done it.
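Removing one node's public key from that file can be sketched as follows. This assumes each key line ends with a comment of the form `root@<hostname>`, which is common for OpenSSH keys but is not shown in this thread; the comment may also use the full host name (the prompt above shows `homenet-fbc10`) rather than the node name, so check the file first. `remove_host_key` is a hypothetical helper:

```bash
# remove_host_key FILE HOST
# Deletes every line of FILE whose last whitespace-separated field is
# exactly "root@HOST", after saving the original as FILE.bak.
# Assumption: each key carries a "root@<hostname>" comment as its last field.
remove_host_key() {
    local file=$1 host=$2
    cp -- "$file" "$file.bak" || return 1
    # Keep all lines whose last field differs from the target comment.
    awk -v c="root@$host" '$NF != c' "$file.bak" > "$file"
}

# Usage (on the remaining node):
#   remove_host_key /etc/pve/priv/authorized_keys homenet-fbc10
```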
 
You need quorum to do anything useful on a cluster. I guess we need to document all those cluster basics first.
 
Should I just reinstall Proxmox on the deleted cluster node, or is there another way to reinitialize it?

Yes, that is the recommended way.

On the other node, set expected votes to 1 (to gain quorum):

# pvecm expected 1

Then remove the stale node:

# pvecm delnode <oldnode>

Does that work for you?
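Why `pvecm expected 1` helps can be shown with the usual majority rule: a cluster is quorate when the votes present reach floor(expected / 2) + 1. The sketch below assumes that rule and one vote per node (as in the cluster.conf later in this thread); the real cman/corosync calculation has additional special cases, and `quorum`/`is_quorate` are hypothetical helpers. With 2 expected votes the threshold is 2, matching the `Quorum: 2 Activity blocked` line in the status output above.

```bash
# quorum EXPECTED: simple-majority threshold, floor(EXPECTED / 2) + 1.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# is_quorate VOTES EXPECTED: succeed if VOTES reaches the threshold.
is_quorate() {
    [ "$1" -ge "$(quorum "$2")" ]
}

# Two-node cluster with one node gone: 1 vote present, 2 expected.
is_quorate 1 2 && echo "quorate" || echo "activity blocked"   # activity blocked
# After "pvecm expected 1" on the surviving node: 1 vote, 1 expected.
is_quorate 1 1 && echo "quorate" || echo "activity blocked"   # quorate
```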
 


Code:
root@homenet-fbc158 /fbc/etc/ssh # pvecm expected 1
root@homenet-fbc158 /fbc/etc/ssh # pvecm delnode  fbc10
node fbc10 does not exist in /etc/pve/cluster.conf
root@homenet-fbc158 /fbc/etc/ssh # cat /etc/pve/cluster.conf
<?xml version="1.0"?>
<cluster name="fbcandover" config_version="3">

  <cman keyfile="/var/lib/pve-cluster/corosync.authkey">
  </cman>

  <clusternodes>
  <clusternode name="fbc158" votes="1" nodeid="1"/>
  </clusternodes>

</cluster>
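The `delnode` message says fbc10 is no longer listed in cluster.conf, which the file contents confirm. A minimal sketch for checking that directly, assuming the `<clusternode name="..."` format shown above; `node_in_conf` is a hypothetical helper and does a plain text match, not real XML parsing:

```bash
# node_in_conf NODE [CONF]
# Succeeds if CONF (default /etc/pve/cluster.conf) contains a
# <clusternode name="NODE" ...> entry. Fixed-string match; assumes the
# attribute is written exactly as name="NODE", as in the file above.
node_in_conf() {
    local node=$1 conf=${2:-/etc/pve/cluster.conf}
    grep -qF "<clusternode name=\"$node\"" "$conf"
}

# Usage:
#   node_in_conf fbc10 && echo listed || echo "not listed"
```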


Now, is the above supposed to allow the old node to be added back?
 
We currently have many checks to prevent that. It's a dangerous action in production systems.

You wrote '... so i deleted it and rebooted that node.', but it is unclear to me what you deleted.
 
Hello, is there anything new on re-adding a node? I get the error "authentication key already exists".
How should I manage the SSH keys for the cluster?

Thanks,
Korosec
 
If you really know what you are doing, you can use the force flag; see "man pvecm".
 
I tried that too, but then I get this:

# pvecm add XXX.XXX.XXX.XXX -force
The authenticity of host 'XXX.XXX.XXX.XXX (XXX.XXX.XXX.XXX)' can't be established.
ECDSA key fingerprint is bf:8e:5a:dd:0e:ae:9e:26:5d:10:ef:d6:3b:1b:f9:61.
Are you sure you want to continue connecting (yes/no)? yes
root@XXX.XXX.XXX.XXX's password:
I/O warning : failed to load external entity "/etc/pve/cluster.conf"
ccs_tool: Error: unable to parse requested configuration file


command 'ccs_tool lsnode -c /etc/pve/cluster.conf' failed: exit code 1
unable to add node: command failed (ssh XXX.XXX.XXX.XXX -o BatchMode=yes pvecm addnode virt-10 --force 1)

Do you have any solution for this?

Best regards
 
Please open a new thread for this new problem and provide as many details as possible.
 
Hello, I solved the problem by syncing two files:
/var/lib/pve-cluster/corosync.authkey
/etc/pve/cluster.conf

Is this OK?

Best regards,
Borut
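Syncing those two files amounts to making them identical on both nodes. A minimal sketch for checking whether they already match, assuming the other node's copies have first been fetched into a local directory tree with the same relative paths (for example with scp); `compare_cluster_files` and that directory layout are hypothetical, and the helper only compares, it never copies or modifies anything:

```bash
# compare_cluster_files LOCAL_ROOT OTHER_ROOT
# Compares the two files from the post above between two directory trees.
# Pass "" as LOCAL_ROOT for the live system, and a directory holding the
# other node's copies (e.g. OTHER_ROOT/etc/pve/cluster.conf) as OTHER_ROOT.
# Prints one line per file; returns non-zero if any file differs or is missing.
compare_cluster_files() {
    local local_root=$1 other_root=$2 rc=0 f
    for f in /var/lib/pve-cluster/corosync.authkey /etc/pve/cluster.conf; do
        if cmp -s -- "$local_root$f" "$other_root$f"; then
            echo "same: $f"
        else
            echo "differs or missing: $f"
            rc=1
        fi
    done
    return "$rc"
}

# Usage, with the other node's copies under a hypothetical /root/other-node:
#   compare_cluster_files "" /root/other-node
```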
 
