Cannot create cluster at all

Bisser

New Member
May 13, 2019
I go to the web GUI and push the Create Cluster button. I type the name of the cluster and push Create.
I get an error: cluster config '/etc/pve/corosync.conf' already exists (500).

What to do next?

As far as I remember, I got into this situation the following way:
In the beginning I successfully created a cluster, then tried to add a node. I was shown some error which I don't remember, and there was a spinning wheel. I waited for some time, then closed the browser and opened it again. The second node I was adding got messed up, and I reinstalled it because there was nothing on it. On the node where I created the cluster, the failed node still shows in the web UI as not accessible. Then I searched the web and found some commands to run to delete the cluster. I don't remember everything I tried. So now I am able to push the Create Cluster button in the web UI, but I get the above error. Please advise.
 
Check the following:

- on the node where you are trying to create the cluster, run pvecm nodes from a command-line/console session.
If this returns information, the node you are trying to create the cluster on is already acting like a member of a previous attempt to create a cluster.

To be quick about it: don't try to recreate the cluster, as it seems to have been created already. Focus on the error from the node add; your problem is there.
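
If it helps, a few read-only checks to see what state the node is in (these are standard PVE commands and paths; nothing here modifies anything):

Code:
# does the node consider itself a cluster member?
pvecm status
pvecm nodes
# the presence of this file is what triggers the 500 error on "Create Cluster"
ls -l /etc/pve/corosync.conf /etc/corosync/corosync.conf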
 
This is what I get from pvecm nodes:
Code:
Nodeid  Votes  Name
1       1      51.89.X.X (local)

I cannot add a new node because I cannot access the Join Information button - it is Disabled.
 
I cannot add a new node because I cannot access the Join Information button - it is Disabled.

So you could neither click "Create Cluster" nor "Join Information"??

Can you post the output of:

Code:
pvesh get /cluster/config/join --output-format=yaml
 
Create Cluster is enabled, but I cannot create a cluster because I get: cluster config '/etc/pve/corosync.conf' already exists (500).
Join Information is disabled. Join Cluster is enabled.

The result from pvesh get /cluster/config/join --output-format=yaml is:
Code:
unable to read '/etc/pve/nodes/PX-XXX1/pve-ssl.pem' - No such file or directory

Output of cat /etc/pve/corosync.conf:

Code:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: GE-XXX3
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 51.89.X.XXX
  }
  node {
    name: PX-XXX1
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 139.99.XXX.X
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: Bxxxxxx
  config_version: 2
  interface {
    bindnetaddr: 51.89.X.XXX
    ringnumber: 0
  }
  ip_version: ipv4
  secauth: on
  version: 2
}
 
unable to read '/etc/pve/nodes/PX-XXX1/pve-ssl.pem' - No such file or directory

Huh, did you fiddle with your SSL certificates? The above seems to be the underlying issue of your situation..

Can you run a
Code:
pvecm updatecerts
and try again?
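
If the certificate does get regenerated but the web UI still acts up, restarting the service that serves it is harmless (pveproxy is the standard PVE web proxy; this is a suggestion, not part of the official procedure):

Code:
pvecm updatecerts
# pick up the regenerated certificates in the web UI
systemctl restart pveproxy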
 
I haven't touched anything related to certificates. But I remember that on the second server, when I was trying to add it to the cluster, the error shown was something about SSL.
It could be because I tried to change the node name of the server I was adding, but that created an even bigger mess, so I reverted the node name. It still didn't work, so I reinstalled the second server. But now I have a problem with the main node.

pvecm updatecerts returns:
Code:
(re)generate node files
merge authorized SSH keys and known hosts

pvesh get /cluster/config/join --output-format=yaml again returns the same:
Code:
unable to read '/etc/pve/nodes/PX-XXX1/pve-ssl.pem' - No such file or directory


Isn't there a button to just get rid of everything cluster related and start from scratch?
 
Isn't there a button to just get rid of everything cluster related and start from scratch?

That alone won't help you; the missing SSL certificate file needs to be fixed too....

But it seems that in your corosync configuration both nodes are already added, so you are probably not quorate (see pvecm status), and thus the updatecerts command could not re-generate the missing SSL file.
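
To verify that theory, both of these are read-only (corosync-quorumtool ships with corosync and reports the same information at the corosync level):

Code:
pvecm status            # look for the Quorate line
corosync-quorumtool -s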

Maybe it's really best to kill the cluster. Here are the steps required; there is no simple button, as separating a cluster in a general way is not really possible, because the nodes share resources depending on the specific setup.

Note, this is something I'd only do in your situation, and it is not recommended in general (for others reading this):
Code:
# DANGEROUS, only do for single node cluster or those where all other nodes got re-installed/purged

# below two steps are only required if not quorate
systemctl stop pve-cluster
# restart in local mode
pmxcfs -l

# remove all corosync cluster configuration traces
rm -f /etc/pve/corosync.conf
rm -rf /etc/corosync/*
systemctl stop corosync pve-cluster
systemctl start pve-cluster
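
# suggested follow-up (not part of the steps above): pvecm status is read-only
# and should now complain that no corosync configuration exists on this node
pvecm status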
 
I saw plenty of articles about messed-up clusters. This whole process seems quite fragile. In my case, I think the time I spent reading about what to do is much more than it would take to simply reinstall and move the VMs. This time I will install the servers, and the first thing I'll do is set up the cluster. Once it is running, I will install the VMs. Hopefully it doesn't fall apart at some point.
 
While I was searching for solutions, it seemed there are so many people who have cluster problems, clusters getting stuck, and nothing would work but a reinstall. I will try the steps you suggested, but first I will copy everything to another server just in case. I don't want to lose anything.
 
While I was searching for solutions, it seemed there are so many people who have cluster problems, clusters getting stuck, and nothing would work but a reinstall. I will try the steps you suggested, but first I will copy everything to another server just in case. I don't want to lose anything.
I've been running a 4-node cluster without huge problems for over 1.5 years; the only perils are the ones I usually create myself.
And until now, with the help of the docs/forums, I've always been able to get it all back to a 'nice and tidy' state.
 
OK, this cluster thing doesn't work. I did a clean install on 2 servers hosted at OVH: VPS, Proxmox VE 5 (ZFS). Both servers have full access to each other; no firewall restrictions. What I did:
1. Opened the web interface on SVR1 and clicked Create Cluster. The cluster was created.
2. Opened the web interface on SVR2 and clicked Join Cluster. I copied the join information from SVR1 and used its root password.
3. I got the following on SVR2:
Code:
Establishing API connection with host '51.89.X.XXX'
Login succeeded.
Request addition of this node
Join request OK, finishing setup locally
stopping pve-cluster service

Also in the background on SVR2 I can see a spinning wheel, and it says: permission denied - invalid PVE ticket (401).
It froze like that and has been staying that way for about 15 minutes already. I don't think anything else will happen. This is what happened the last time I tried as well.

I rebooted both servers.

And I can no longer enter the web interface of SVR2.
SVR1 shows 2 nodes, but the second one is not active, and SVR1 says it is a standalone node, not a cluster anymore.

A clean install and it doesn't work. Strange. Any suggestions are welcome. I will not attempt to repair it; I just want to make it work from a clean install. If anyone can provide instructions on how this can be done, that would be great. Thanks.
 
Hi Bisser, I ended up in the same situation as yours, but I wonder if recreating the cluster means losing the Ceph cluster too?

Or if I set up the OSDs the same way, will it keep the data? Or can I import the Ceph cluster?
 
I gave up on it. I never got it working, even after a clean install. I took all the Proxmox images, converted them with qemu-img to VHDX, and now I am running Hyper-V replication.
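
For anyone taking the same route, a minimal sketch of such a conversion (the paths and the raw source format are just examples; check the actual format with qemu-img info first):

Code:
# confirm the source image format before converting
qemu-img info /var/lib/vz/images/100/vm-100-disk-0.raw
# convert to VHDX for Hyper-V
qemu-img convert -f raw -O vhdx /var/lib/vz/images/100/vm-100-disk-0.raw vm-100-disk-0.vhdx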
 
Also in the background on SVR2 I can see a spinning wheel, and it says: permission denied - invalid PVE ticket (401).
It froze like that and has been staying that way for about 15 minutes already. I don't think anything else will happen. This is what happened the last time I tried as well.

I rebooted both servers

FWIW: at this point it's not recommended to reboot; better to log into the console (SSH/iKVM/...) and check some logs for error messages, otherwise nobody can ever know what happened or why. From this description it seems that the join was somewhat successful, but the nodes did not become quorate afterwards, and that may just be the OVH network..
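
To make that concrete, a couple of checks that would narrow it down (omping ships with Proxmox VE and is the usual multicast connectivity test; the IPs are placeholders for the two servers):

Code:
# service logs around the failed join
journalctl -u pve-cluster -u corosync --since "1 hour ago"
# multicast test: run simultaneously on both nodes
omping -c 600 -i 1 -q <SVR1-IP> <SVR2-IP>

If multicast turns out to be blocked (common on hosted networks like OVH), the usual corosync 2 workaround is unicast transport. As a sketch only, based on the config posted earlier and to be verified against the PVE 5.x docs before editing:

Code:
# /etc/pve/corosync.conf -- totem section switched to unicast UDP
# (bump config_version by one whenever you edit this file)
totem {
  cluster_name: Bxxxxxx
  config_version: 3
  transport: udpu
  ip_version: ipv4
  secauth: on
  version: 2
}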

Hi Bisser, I ended up in the same situation as yours, but I wonder if recreating the cluster means losing the Ceph cluster too?

Not sure it's the same situation? The OP failed to (re)create a cluster at OVH; what hardware and network they used is not known.

What's your specific situation? If you say you have Ceph, it seems you already have a successfully created cluster??
 
There is the Ceph cluster and the Proxmox cluster.

I was able to resolve my issue by performing the following procedure:
https://pve.proxmox.com/pve-docs/pve-admin-guide.html, paragraph 6.5.1 "Separate a Node Without Reinstalling"

Then I had to remove some other files in the node folder, on the remaining cluster nodes and on the out-of-cluster node, to be able to join the cluster back.
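
For the next reader: the files in question are typically the per-node directories under /etc/pve/nodes. A hedged sketch (the node name is an example; only do this for a node that has really left the cluster):

Code:
# on each remaining cluster node: drop the stale entry for the removed node
rm -rf /etc/pve/nodes/<removed-node-name>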
 
