Broken cluster, don't know what to do.

Dunsparth · Feb 22, 2024

Hi, I had 5 nodes all in one cluster.

I ran
systemctl stop pve-cluster
systemctl stop corosync
pmxcfs -l
rm /etc/pve/corosync.conf
rm -r /etc/corosync/*
killall pmxcfs
systemctl start pve-cluster

on all my nodes, and 3 of them are gone from the cluster, which is how I wanted it.
But the 1st and 5th ones are still appearing in the GUI even though they aren't clustered, and I can't access them unless I connect to separate GUI instances.

pvecm status
returns
Error: Corosync config '/etc/pve/corosync.conf' does not exist - is this node part of a cluster?

pvecm expected 1
returns
Error: Corosync config '/etc/pve/corosync.conf' does not exist - is this node part of a cluster?

If I type
# pvecm delnode pve05
it returns
Node/IP: pve05 is not a known host of the cluster.

And vice versa if I do
pvecm delnode pve
Node/IP: pve is not a known host of the cluster.

Something is messed up somewhere and I have no idea what to do.
I've gone through and read
https://pve.proxmox.com/wiki/Cluster_Manager

and checked many forum posts, and I still can't find a fix for this. I need some professional help, thanks.

alexskysilk · Feb 23, 2024

so, here are the steps:

1. turn everything off.
2. turn on one of your member nodes. wait till it's up
3. turn on another member node. wait till it's up. are you able to ping each other on their COROSYNC ADDRESS?
4. if yes, is the cluster "up?"
4b. if no, fix your networking issues and repeat 3
5. if the cluster is up, repeat with the next node until all expected nodes are up
5b. if the cluster is still not up, you have two options: continue to fiddle with corosync and cluster configuration files, OR make copies of all your vms, blow away ALL the nodes and set everything up again. the latter is the safer (and likely faster) option.
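
For step 3, one way to find the addresses to ping is to read them out of a copy of corosync.conf. This is only a minimal sketch: it assumes the node entries use the usual "ring0_addr: <address>" layout, and the helper name corosync_addrs is made up for illustration. On a node where corosync.conf has already been deleted, the addresses would have to be filled in by hand.

```shell
# Hypothetical helper: print every ring0_addr value found in a corosync.conf.
# Assumes each node { } block contains a line of the form "ring0_addr: <address>".
corosync_addrs() {
    awk '$1 == "ring0_addr:" { print $2 }' "$1"
}

# Usage on a node (not executed here):
#   for addr in $(corosync_addrs /etc/pve/corosync.conf); do
#       ping -c 3 "$addr"
#   done
```

If every address answers from every other node, the network side of step 3 is probably fine and the problem lies in the cluster configuration itself.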

Dunsparth · Feb 23, 2024

alexskysilk said:
so, here are the steps:

1. turn everything off.
2. turn on one of your member nodes. wait till it's up
3. turn on another member node. wait till it's up. are you able to ping each other on their COROSYNC ADDRESS?
4. if yes, is the cluster "up?"
4b. if no, fix your networking issues and repeat 3
5. if the cluster is up, repeat with the next node until all expected nodes are up
5b. if the cluster is still not up, you have two options: continue to fiddle with corosync and cluster configuration files, OR make copies of all your vms, blow away ALL the nodes and set everything up again. the latter is the safer (and likely faster) option.

Going to try this now. What command do I use to ping the corosync addresses? I'll report back as soon as I go through these steps.

Dunsparth · Feb 23, 2024

I'm able to use the ping command on both nodes using their IPs,
and they talk back and forth to each other that way.

But even if I try to create a new cluster, I get this error:

corosync-keygen: Could not create /etc/corosync/authkey: No such file or directory
Corosync Cluster Engine Authentication key generator.
Gathering 2048 bits for key from /dev/urandom.
TASK ERROR: command '/usr/sbin/corosync-keygen -lk /etc/corosync/authkey' failed: exit code 2
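
The "No such file or directory" message suggests the directory that should hold the key may itself be missing. Below is a minimal sketch for checking and, if needed, recreating the parent directory of a path. The helper name ensure_parent_dir is invented here, and whether an empty /etc/corosync is enough for cluster creation to succeed afterwards is an assumption, not something confirmed in this thread.

```shell
# Hypothetical helper: make sure the directory that should contain a file exists.
ensure_parent_dir() {
    dir=$(dirname "$1")
    if [ -d "$dir" ]; then
        echo "exists: $dir"
    else
        mkdir -p "$dir" && echo "created: $dir"
    fi
}

# Usage (as root, not executed here):
#   ensure_parent_dir /etc/corosync/authkey
```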

alexskysilk · Feb 23, 2024

I have no idea what you've done, but that last comment makes me assume a ton of regression will be needed before we can even figure it out. You're in step 5b now; you have your options.

RocketSam · Feb 23, 2024

Well, you may try to remove the cluster and its corresponding data. I haven't done it myself, so no guarantee at all.
List and delete all nodes:

pvecm nodes
pvecm delnode pve2
pvecm delnode pve3

Wait for several minutes and then stop cluster services
systemctl stop pvestatd pvedaemon pve-cluster corosync
Now we need to remove cluster config:
sqlite3 /var/lib/pve-cluster/config.db
> DELETE FROM tree WHERE name = 'corosync.conf';
> .quit
rm -f /var/lib/pve-cluster/.pmxcfs.lockfile
rm /etc/pve/corosync.conf
rm /etc/corosync/*
rm /var/lib/corosync/*
systemctl start pvestatd pvedaemon pve-cluster corosync

alexskysilk · Feb 23, 2024

RocketSam said:
Well, you may try to remove the cluster and its corresponding data. I haven't done it myself, so no guarantee at all.
List and delete all nodes:

While this works, it doesn't remove keys etc.; safer to just start from scratch.

Dunsparth · Feb 25, 2024

RocketSam said:
Well, you may try to remove the cluster and its corresponding data. I haven't done it myself, so no guarantee at all.
List and delete all nodes:

pvecm nodes
pvecm delnode pve2
pvecm delnode pve3

Wait for several minutes and then stop cluster services
systemctl stop pvestatd pvedaemon pve-cluster corosync
Now we need to remove cluster config:
sqlite3 /var/lib/pve-cluster/config.db
> DELETE FROM tree WHERE name = 'corosync.conf';
> .quit
rm -f /var/lib/pve-cluster/.pmxcfs.lockfile
rm /etc/pve/corosync.conf
rm /etc/corosync/*
rm /var/lib/corosync/*
systemctl start pvestatd pvedaemon pve-cluster corosync

pvecm nodes won't work; all it does is return
Error: Corosync config '/etc/pve/corosync.conf' does not exist - is this node part of a cluster?

I'm going to have to wipe these things and start fresh. I hooked up a monitor to the server and rebooted it earlier, and I'm getting all these errors in the CLI.

So now I have to figure out how I can back up the containers I have on the server without losing everything.

esi_y · Feb 25, 2024

Dunsparth said:
But the 1st and 5th ones are still appearing in the GUI even though they aren't clustered

Just go and delete them from /etc/pve/nodes/$nodename

A regular rm -rf would do - as you are not clustered, you might need to do this individually on each standalone node.
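
Before deleting anything, it can help to list which node directories are actually left over. This is a rough sketch under a few assumptions: stale_nodes is a hypothetical helper name, and the local node's directory is assumed to match the output of hostname. Note that each node directory also holds that node's guest config files (lxc/, qemu-server/), so check the contents before removing one.

```shell
# Hypothetical helper: print node directories that do not belong to the local node.
# $1 = nodes directory (e.g. /etc/pve/nodes), $2 = name of the local node.
stale_nodes() {
    nodes_dir=$1
    local_name=$2
    for d in "$nodes_dir"/*/; do
        [ -d "$d" ] || continue          # skip when the glob matched nothing
        name=$(basename "$d")
        [ "$name" = "$local_name" ] || echo "$name"
    done
}

# Usage on each standalone node (not executed here):
#   stale_nodes /etc/pve/nodes "$(hostname)"
#   rm -rf "/etc/pve/nodes/<name>"    # only after checking each printed name
```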

Dunsparth said:
and I can't access them unless I connect to separate GUI instances.

That's what you wanted though, correct?

esi_y · Feb 25, 2024

Dunsparth said:
I hooked up a monitor to the server and rebooted it earlier, and I'm getting all these errors in the CLI.

So what's up with your SMB shares?

Dunsparth said:
So now I have to figure out how I can back up the containers I have on the server without losing everything.

vzdump [1]

Something tells me you will need those SMB shares though...

[1] https://pve.proxmox.com/pve-docs/vzdump.1.html
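
As a rough illustration of what that could look like, the sketch below only prints one vzdump command per guest ID so the commands can be reviewed before anything runs. The helper name backup_cmds, the IDs 101 and 102, and the path /mnt/backup are placeholders, not values from this thread. Snapshot mode and zstd compression are assumptions too; stop or suspend mode may suit a given storage setup better.

```shell
# Hypothetical helper: print (not run) one vzdump command per guest ID.
# $1 = target directory for the dump files, remaining arguments = guest IDs.
backup_cmds() {
    dumpdir=$1
    shift
    for id in "$@"; do
        printf 'vzdump %s --mode snapshot --compress zstd --dumpdir %s\n' "$id" "$dumpdir"
    done
}

# Usage (not executed here); "pct list" shows the container IDs on a node:
#   backup_cmds /mnt/backup 101 102
```

Printing first keeps this a dry run; the output can be pasted into a root shell once the IDs and target directory look right.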

Dunsparth · Feb 25, 2024

tempacc346235 said:
Just go and delete them from /etc/pve/nodes/$nodename

A regular rm -rf would do - as you are not clustered, you might need to do this individually on each standalone node.

That's what you wanted though, correct?

Well, that worked. Thank you.

Now I'm going to need to figure out why I am getting all these errors in the console.

esi_y · Feb 25, 2024

Dunsparth said:
Well, that worked. Thank you.

Now I'm going to need to figure out why I am getting all these errors in the console.

Have a look at journalctl -b when they happen, i.e. what precedes them. Do you have failing samba shares?
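
To narrow the boot log down to share-related messages, a small filter along these lines might help. It is a sketch only: share_errors is a made-up helper name, and the pattern assumes the relevant lines mention cifs, smb or samba somewhere.

```shell
# Hypothetical helper: keep only log lines that mention CIFS/SMB/Samba (case-insensitive).
share_errors() {
    grep -iE 'cifs|smb|samba'
}

# Usage (not executed here): filter the log of the current boot
#   journalctl -b --no-pager | share_errors
```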

Dunsparth · Feb 25, 2024

tempacc346235 said:
Have a look at journalctl -b when they happen, i.e. what precedes them. Do you have failing samba shares?

Thank you so much, man, it's all back up and running again.
I'm not going to be able to cluster it all together, but at least it's running, and I can maybe get a backup server going to at least back everything up and then install Proxmox on this box.

esi_y · Feb 26, 2024

Dunsparth said:
I'm not going to be able to cluster it all together

What's the problem?
