Cannot create cluster at all

Bisser

New Member
May 13, 2019
I go to the web GUI and click the Create Cluster button. I type the name of the cluster and click Create.
I get an error - cluster config '/etc/pve/corosync.conf' already exists (500).

What to do next?

As far as I remember, I got into this situation the following way:
In the beginning I successfully created a cluster, then tried to add a node. I was shown some error (which I don't remember) and a spinning wheel. I waited for some time, then closed the browser and opened it again. The second node I was adding got messed up, and I reinstalled it because there was nothing on it. On the node where I created the cluster, the failed node still shows in the web UI as not accessible. I then searched the web and found some commands to delete the cluster; I don't remember everything I tried. So now I am able to click the Create Cluster button in the web UI, but I get the above error. Please advise.
 
Check the following:

- On the node where you are trying to create the cluster, run pvecm nodes from a command-line/console session (see the code block below).

If this returns information, the node you are trying to create the cluster on is already acting as a member of a previous attempt to create a cluster.

To be quick with the explanation: don't try to recreate the cluster, as it seems to have been created already. Focus on the error from the node-add step; your problem is there.
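For reference, the command as run from a shell on that node (taken verbatim from the advice above):

Code:
pvecm nodes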
 
This is what I get from pvecm nodes:
Nodeid  Votes  Name
     1      1  51.89.X.X (local)

I cannot add a new node because I cannot access the Join Information button - it is Disabled.
 
I cannot add a new node because I cannot access the Join Information button - it is Disabled.

So you could click neither "Create Cluster" nor "Join Information"?

Can you post the output of:

Code:
pvesh get /cluster/config/join --output-format=yaml
 
Create Cluster is enabled, but I cannot create a cluster because I get: cluster config '/etc/pve/corosync.conf' already exists (500).
Join Information is disabled. Join Cluster is enabled.

The result from
pvesh get /cluster/config/join --output-format=yaml
is:
unable to read '/etc/pve/nodes/PX-XXX1/pve-ssl.pem' - No such file or directory

Output of
cat /etc/pve/corosync.conf

logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: GE-XXX3
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 51.89.X.XXX
  }
  node {
    name: PX-XXX1
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 139.99.XXX.X
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: Bxxxxxx
  config_version: 2
  interface {
    bindnetaddr: 51.89.X.XXX
    ringnumber: 0
  }
  ip_version: ipv4
  secauth: on
  version: 2
}
 
unable to read '/etc/pve/nodes/PX-XXX1/pve-ssl.pem' - No such file or directory

Huh, did you fiddle with your SSL certificates? The above seems to be the underlying issue in your situation...

can you run a
Code:
pvecm updatecerts
and try again?
 
I haven't touched anything related to certificates. But I remember that on the second server, when I was trying to add it to the cluster, the error shown was something about SSL.
It could be because I tried to change the node name of the server I was adding, but that created an even bigger mess, so I reverted the node name. It still didn't work, so I reinstalled the second server. But now I have a problem with the main node.

pvecm updatecerts returns:
(re)generate node files
merge authorized SSH keys and known hosts

pvesh get /cluster/config/join --output-format=yaml
again the same
unable to read '/etc/pve/nodes/PX-XXX1/pve-ssl.pem' - No such file or directory


Isn't there a button to just get rid of everything cluster-related and start from scratch?
 
Isn't there a button to just get rid of everything cluster-related and start from scratch?

That alone won't help you; the missing SSL certificate file needs to be fixed too.

But it seems that in your corosync configuration both nodes are already added, so you are probably not quorate (see pvecm status), and thus the updatecerts command could not re-generate the missing SSL file.
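A quick way to check this, assuming the standard pvecm output format (the grep filter is just an illustration):

Code:
# hedged sketch: "Quorate: No" means /etc/pve is read-only,
# so certificates cannot be (re)generated
pvecm status | grep -i quorate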

Maybe it's really best to kill the cluster; here are the required steps. There's no simple button - separating a cluster in a general way is not really possible, as the nodes share resources depending on the specific setup.

Note: this is something I'd only do in your situation, and it is not recommended in general (for others reading this):
Code:
# DANGEROUS, only do for single node cluster or those where all other nodes got re-installed/purged

# below two steps are only required if not quorate
systemctl stop pve-cluster
# restart in local mode
pmxcfs -l

# remove all corosync cluster configuration traces
rm -f /etc/pve/corosync.conf
rm -rf /etc/corosync/*

# stop the services and the locally started pmxcfs instance,
# then bring pve-cluster back up normally
systemctl stop corosync pve-cluster
killall pmxcfs
systemctl start pve-cluster
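One additional cleanup that may be needed in this specific case - an assumption on my part, not part of the standard steps: the directory of the reinstalled node can be left behind under /etc/pve/nodes and keep causing the pve-ssl.pem error above.

Code:
# hedged: remove the leftover directory of the node that was reinstalled
# (PX-XXX1 is the node name from the corosync.conf posted earlier)
rm -rf /etc/pve/nodes/PX-XXX1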
 
I saw plenty of articles about messed-up clusters. This whole process seems quite fragile. In my case, I think the time I've spent reading about what to do is much more than it would take to simply reinstall and move the VMs. This time I will install the servers and set up the cluster first. Once it is running, I will install the VMs. Hopefully it doesn't fall apart at some point.
 
While I was searching for solutions, it seems there are many people who have cluster problems - clusters getting stuck where nothing works but a reinstall. I will try the steps you suggested, but first I will copy everything to another server, just in case. I don't want to lose anything.
 
While I was searching for solutions, it seems there are many people who have cluster problems - clusters getting stuck where nothing works but a reinstall. I will try the steps you suggested, but first I will copy everything to another server, just in case. I don't want to lose anything.
I've been running a 4-node cluster without major problems for over 1.5 years - just the perils I usually create myself.
And until now, with the help of the docs/forums, I've always been able to get it all back to a 'nice and tidy' state.
 
OK, this cluster thing doesn't work. I did a clean install on 2 servers hosted at OVH (VPS, Proxmox VE 5, ZFS). Both servers have full access to each other - no firewall restrictions. What I did:
1. Opened the web interface on SVR1 and clicked Create Cluster. The cluster was created.
2. Opened the web interface on SVR2 and clicked Join Cluster. I copied the join information from SVR1 and used its root password.
3. I got the following on SVR2:
Establishing API connection with host '51.89.21.201'
Login succeeded.
Request addition of this node
Join request OK, finishing setup locally
stopping pve-cluster service

Also, in the background on SVR2 I can see a spinning wheel, and it says: permission denied - invalid PVE ticket (401).
It froze like that and has stayed that way for about 15 minutes already. I don't think anything else will happen. This is what happened the last time I tried as well.

I rebooted both servers.

And now I can no longer reach the web interface of SVR2.
SVR1 shows 2 nodes, but the second one is not active, and SVR1 says it is a standalone node - not a cluster anymore.

A clean install, and it doesn't work - strange. Any suggestions are welcome. I will not attempt to repair it; I just want to make it work from a clean install. If anyone can provide instructions on how this can be done, that would be great. Thanks.
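For reference, a minimal sketch of the same create/join flow from the CLI, which usually prints more detailed errors than the web UI. The cluster name below is illustrative; the IP is SVR1's, taken from the join log above:

Code:
# on SVR1: create the cluster (the name 'mycluster' is just an example)
pvecm create mycluster

# on SVR2: join by pointing at SVR1; this prompts for SVR1's root password
pvecm add 51.89.21.201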
 
Hi Bisser, I ended up in the same situation as yours, but I wonder: does recreating the cluster mean losing the Ceph setup as well?

Or if I set up the OSDs the same way, will it keep the data? Or can the Ceph cluster be imported?
 
I gave up on it. I never got it working, even after a clean install. I took all the Proxmox disk images, converted them with qemu-img to VHDX, and now I am running Hyper-V with replication.
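The conversion mentioned here would look roughly like this; a hedged sketch, where the source format and file names are assumptions (Proxmox disk images are often qcow2 or raw):

Code:
# convert a Proxmox disk image to VHDX for Hyper-V; -p shows progress
# (file names are illustrative)
qemu-img convert -p -O vhdx vm-100-disk-0.qcow2 vm-100-disk-0.vhdx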
 
Also, in the background on SVR2 I can see a spinning wheel, and it says: permission denied - invalid PVE ticket (401).
It froze like that and has stayed that way for about 15 minutes already. I don't think anything else will happen. This is what happened the last time I tried as well.

I rebooted both servers.

FWIW: at this point it's not recommended to reboot; better to log into the console (SSH/iKVM/...) and check some logs for error messages, otherwise nobody can ever know what happened or why. From this description it seems the join was somewhat successful, but the nodes did not become quorate afterwards, and that may just be the OVH network.
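For anyone hitting this again, a sketch of where to look first, assuming the stock Proxmox VE service names:

Code:
# logs for the cluster filesystem and corosync since the last boot
journalctl -b -u pve-cluster -u corosync

# current service and quorum state
systemctl status pve-cluster corosync
pvecm status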

Hi Bisser, I ended up in the same situation as yours, but I wonder if recreating the cluster means loosing the ceph also ?

Not sure it's the same situation? The OP failed to (re)create a cluster at OVH; what hardware and network were involved is not known.

What's your specific situation? If you say you have Ceph, it seems you already have a successfully created cluster?
 
There is the Ceph cluster, and there is the Proxmox cluster.

I was able to resolve my issue by performing the following procedure:
https://pve.proxmox.com/pve-docs/pve-admin-guide.html - paragraph 6.5.1, "Separate a Node Without Reinstalling"

Then I had to remove some other files in the node folder on the remaining cluster nodes and on the out-of-cluster node to be able to join the cluster again.
 
