Is it possible to delete a cluster and reuse its nodes?

Pablo Alcaraz

Member
Jul 6, 2017
I have a cluster of 1 node and I cannot add a 2nd node.

I described my case here: https://forum.proxmox.com/threads/migration-failed-because-ssh-connection.37808/

Is it possible to delete the whole cluster without reinstalling Proxmox VE? I want to do that and recreate the cluster from scratch, because I have 10+ virtual machines running on the first node and none on the 2nd node, and I do not want to recreate those machines.

Or, if possible, I would like to find a solution so that my current and only node in the cluster accepts the 2nd node, as described here: https://forum.proxmox.com/threads/migration-failed-because-ssh-connection.37808/
 
I found the problem. I think there is a bug in the pvecm command.

I deleted the cluster, but the cluster was not the problem. The problem happens when you use pvecm add or pvecm create in Proxmox 5.1. For example, if you execute:

pvecm add IP-ADDRESS-CLUSTER

This command invokes ssh-copy-id without the -i parameter (pvecm create does the same), or it invokes something that in turn invokes ssh-copy-id without -i.

As you know, ssh-copy-id by default (without -i) invokes ssh-add -L, which (from the man page) "... Lists public key parameters of all identities currently represented by the agent...".

So, IF you open a console in your GUI AND log in to the node using ssh AND use the ssh agent AND the command "ssh-add -L" lists some public keys of yours, THEN the pvecm command will copy all YOUR public keys cached in the agent into the Proxmox VE cluster configuration, making them available urbi et orbi. This happens when you create the cluster too.
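
A quick way to check whether your session meets those conditions before running pvecm is to ask the agent directly. This is only a minimal sketch: the function names agent_has_keys and check_before_join are made up for illustration, and it relies on ssh-add -L exiting non-zero when the agent is empty or unreachable.

```shell
#!/bin/sh
# Return 0 only if an SSH agent is reachable and holds at least one identity.
# ssh-add -L exits non-zero when the agent is empty or cannot be contacted.
agent_has_keys() {
    command -v ssh-add >/dev/null 2>&1 || return 1
    ssh-add -L >/dev/null 2>&1
}

# Hypothetical helper: warn if agent keys are visible in this session.
check_before_join() {
    if agent_has_keys; then
        echo "WARNING: ssh-agent exposes keys in this session:"
        ssh-add -l    # list fingerprints of the keys that could be copied
        return 1
    fi
    echo "OK: no agent keys visible"
}
```

Running check_before_join on the node, in the same ssh session where you intend to run pvecm, shows whether any of your own keys would be picked up.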

That is several ANDs, but it will happen to every Linux user fancy enough to use a GUI, and perhaps to Mac users too.

That is a horrible leak of your/my public keys, but up to this point there is no error.

There is no error until you add a 2nd node to the cluster. In that case, the keys leaked by the pvecm command are listed again, so the command tries to store them in the cluster again, but they are already there, so you get a nice:

unable to copy ssh ID: exit code 1

and after that you will never be able to add a node to the cluster again.


The workaround is to use the browser console to execute commands on the PVE hosts, or to type commands in a text console opened physically on the host, or to remove your cached keys from the ssh agent (unacceptable, because if you have them there, it is for a good reason).
Still, those workarounds are not good enough, because the browser console may or may not be available and the host may be physically out of reach.
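
Another option that might avoid clearing the agent is to hide it from a single command only. The sketch below is untested against pvecm and assumes the diagnosis in this post is right; without_agent is a made-up helper name. It unsets SSH_AUTH_SOCK inside a subshell, so ssh-add -L finds no agent there while the rest of the session keeps its keys.

```shell
#!/bin/sh
# Run one command with the SSH agent hidden from it.
# The subshell keeps the unset from affecting the calling session.
without_agent() {
    ( unset SSH_AUTH_SOCK; "$@" )
}

# Hypothetical usage, not verified on a real cluster
# (IP-ADDRESS-CLUSTER is the placeholder used earlier in this post):
# without_agent pvecm add IP-ADDRESS-CLUSTER
```

With the agent hidden, ssh inside that command has to authenticate with a password or with key files on the node itself, so expect a password prompt.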

So this is a bug in the pvecm command, and it leaks public ssh keys.

PS: Please add:

pvecm remove node
pvecm destroy cluster
 
I kept getting the message: "unable to copy ssh ID"

so if the -i option is present, it is something else. But I could not get past that point until I deleted the cluster and recreated it.
 
