Trouble joining a cluster in a production environment!

supervache

Member
Dec 6, 2019
Hello.
I have a problem with a cluster. I'll give you the context first, then explain the problem.

Context:
I have two multi-node clusters:
- one on Proxmox V5 / Debian 9
- one on Proxmox V6 / Debian 10

My goal is to migrate all my Proxmox V5 CTs to the new cluster (Proxmox V6). Each time I have migrated all the CTs off a Proxmox V5 node, I reinstall it with a fresh Proxmox V6 and join it to the new cluster.

My current state is:
proxmox-0001 -> OLD cluster
proxmox-0002 -> Migrating to NEW cluster
proxmox-0003 -> OLD cluster
proxmox-0004 -> NEW cluster
proxmox-0005 -> NEW cluster (first node; I created the cluster on it)
proxmox-0006 -> NEW cluster

My problem:
The join procedure seems to have failed for proxmox-0002: I never got a message confirming that the join succeeded. proxmox-0002 appears in the server view on every node of the cluster, but it is shown as "broken".

Every node is really slow now (web interface included), and commands like pct list do not return anything...

Every member of the cluster has this problem. On any node, when I navigate to Datacenter/Cluster, I see a list with my nodes, but the form only lets me create a cluster (not join one), which makes no sense.

cluster.png

I tried to reinstall proxmox-0002, because I can't access its web interface and there are no CTs on it. But my cluster still seems broken. I can't create a new CT on the cluster because the node list is empty (see the other attachment).

empty.png


How can I fix this?

I tried rebooting the proxmox-0005 and proxmox-0002 nodes, but it is still broken :(

Note that I have production servers running on this cluster.

Thank you very much for your urgent help.
 
I think I found a solution to restore my cluster (by removing proxmox-0002):

First I rebooted proxmox-0002 in rescue mode (an OVH live OS).

Then on proxmox-0005 I ran:
Code:
pvecm delnode proxmox-0002
ls -l /etc/pve/nodes/                     # Still shows a folder for proxmox-0002, so I "remove" it
mv /etc/pve/nodes/proxmox-0002/ /root/    # Actually I keep a copy in my /root directory
cat /etc/corosync/corosync.conf           # No more information about proxmox-0002

Now I can create CTs again, and the cluster join information is back.
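
The last check above (reading corosync.conf by eye) could also be scripted. This is only a minimal sketch: `node_in_conf` is a hypothetical helper name, and it assumes the node entries in the file use the `name: <hostname>` form. The default path is the one used in the commands above.

```shell
#!/bin/sh
# Hypothetical helper: report whether a node name still appears
# as a "name:" entry in a corosync.conf file.
# Usage: node_in_conf NODE [CONF]
node_in_conf() {
    node="$1"
    conf="${2:-/etc/corosync/corosync.conf}"   # same path as in the commands above
    # Only lines that START with "name:" (after indentation) are matched,
    # so "cluster_name: ..." in the totem section is ignored.
    if grep -Eq "^[[:space:]]*name:[[:space:]]*${node}[[:space:]]*\$" "$conf"; then
        echo "present"
    else
        echo "absent"
    fi
}
```

Calling `node_in_conf proxmox-0002` on proxmox-0005 should print `absent` once the node has really been removed. Note that the name is used inside a regular expression, so a hostname containing dots would match a little more loosely than intended.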

BUT:
After two reinstallations, I still can't add proxmox-0002 to my cluster, which is a problem for me.
Can you help me, please? If I don't find a solution, I will have to create a new cluster and migrate my production servers to it all over again...

Maybe there is persistent data in a database somewhere? I have no idea how to deal with this problem, but it is a real handicap.
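
One way to look for such leftover data on a cluster member is to search the shared configuration tree for the old hostname. This is a rough sketch, not a confirmed fix: `find_leftovers` is a hypothetical helper, and it assumes that any stale reference would live under /etc/pve. That is only a guess; nothing in this thread confirms where (or whether) stale data exists.

```shell
#!/bin/sh
# Hypothetical helper: list files under a directory that still mention
# a node name, plus a leftover per-node directory if one exists.
# Usage: find_leftovers NODE [DIR]
find_leftovers() {
    node="$1"
    dir="${2:-/etc/pve}"   # assumption: stale references would be under /etc/pve
    # -r: recurse, -l: print file names only, -F: treat the name as a fixed string
    grep -rlF -- "$node" "$dir" 2>/dev/null || true
    # A directory named after the node would not be caught by grep, so check it too.
    if [ -d "$dir/nodes/$node" ]; then
        echo "$dir/nodes/$node/"
    fi
}
```

Running `find_leftovers proxmox-0002` on proxmox-0005 prints one path per line, and prints nothing if no reference is found. The function only reads; it never deletes or modifies anything.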
 