Former Cluster, just disappears

BloodyIron

I'm not sure when this happened, but recently when I check the "Cluster" section of my Datacentre, it says there's no cluster. Except, it has been a cluster for over 5 years now, currently with 5x nodes.

The whole cluster is 5.3-11, and I have been maintaining and upgrading it since about 2.3, following the proper documentation over the years for upgrading and such.

Now, I have no idea what happened. The "join info" button is greyed out, but "Create Cluster" and "Join Cluster" are both available.

But I can still do Clustery things, like Live Migrate, manage the whole cluster from any node, etc.

What on EARTH is going on?!?! Halp!
 
can your nodes resolve the node names of the others?
what is the output of
Code:
pvesh get /cluster/config/join
 
Okay it looks like I messed up a good while ago, and I'm not sure the best course to correct this.

I have 5x nodes.

Small1
Small2
Dormant1
Dormant2
BigBoy1

Early history, the cluster was

Small1
Small2

Then I got new servers, and added them to the cluster

Small1
Small2
Dormant1
Dormant2

However, Dormant1 and Dormant2 were very loud and power inefficient. I primarily have them because they were $0, and have a lot of RAM (DDR2 though), so I would only turn them on when I need to lab large stuff, hence the name "Dormant". As such, I gave Small1 and Small2, 2 votes each, and Dormant1 and Dormant2, 1 vote each. I would also turn on Dormant1 and Dormant2 if I needed to upgrade/update/reboot Small1 OR Small2 for whatever reason.
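
The vote weighting above can be checked with a little arithmetic. This is a minimal sketch, assuming quorum is a strict majority of all configured votes (total / 2 + 1 with integer division), which is my understanding of how corosync's votequorum behaves by default. The vote counts come from the post; the variable names are only illustrative.

```shell
#!/bin/sh
# Vote counts are taken from the post; the majority rule is an assumption
# about corosync's default quorum calculation.
small_votes=2      # Small1 and Small2, each
dormant_votes=1    # Dormant1 and Dormant2, each

total=$(( 2 * small_votes + 2 * dormant_votes ))
quorum=$(( total / 2 + 1 ))      # strict majority, integer division
online=$(( 2 * small_votes ))    # only Small1 and Small2 powered on

echo "total=$total quorum=$quorum online=$online"
if [ "$online" -ge "$quorum" ]; then
    echo "quorate"
else
    echo "not quorate"
fi
```

Under that assumption, Small1 and Small2 together hold enough votes to stay quorate while both Dormant nodes are off, which is what makes the powered-down setup workable.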

Then I got a new server, and I think this is where I done goofed. I added BigBoy1.

Small1
Small2
Dormant1
Dormant2
BigBoy1

However, I just turned Dormant1 and Dormant2 on, and not only do they not see BigBoy1, but they are also 5.2.x, where my other nodes are 5.3.x.

So I think I goofed in that I forgot to turn Dormant1 and Dormant2 on when BigBoy1 joined the cluster. And now Dormant1 and Dormant2, when I turn them on, only see themselves as "online". And when Dormant1 and Dormant2 are on, the rest of the cluster does not see them as on.

When I run the command you advised me to do, I got the output:

"hostname lookup 'dormant1' failed - failed to get address info for: dormant1: Name or service not known"

So, at this point, it looks like my cluster is in a bad state, and I'm not sure what the appropriate steps are to address this. Please help!
 
"hostname lookup 'dormant1' failed - failed to get address info for: dormant1: Name or service not known"
first i would add your nodes to the /etc/hosts so that they can resolve them

So I think I goofed in that I forgot to turn Dormant1 and Dormant2 on when BigBoy1 joined the cluster. And now Dormant1 and Dormant2, when I turn them on, only see themselves as "online". And when Dormant1 and Dormant2 are on, the rest of the cluster does not see them as on.
i would stop pve-cluster and corosync on those nodes, copy the corosync.conf from /etc/corosync/ on a working node to those nodes, then restart corosync and pve-cluster. after that they should see each other again
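
The advice above could be sketched roughly as the script below, to be run on each out-of-sync node (dormant1 and dormant2 in this thread). It is a hedged sketch, not an official procedure: by default it only prints the commands, and `DRY_RUN`, `SOURCE_NODE`, `run` and `resync_corosync_conf` are names made up for illustration. It assumes root SSH access to a node that still has the current configuration.

```shell
#!/bin/sh
# Sketch only: prints the commands by default. Set DRY_RUN=0 to execute them.
# SOURCE_NODE is an assumption: any node that still holds the current
# cluster configuration (small1 in this thread).
DRY_RUN="${DRY_RUN:-1}"
SOURCE_NODE="${SOURCE_NODE:-small1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Stop the cluster services, fetch the config, start the services again,
# then show the cluster state. Stops at the first failing step.
resync_corosync_conf() {
    run systemctl stop pve-cluster corosync &&
    run scp "root@${SOURCE_NODE}:/etc/corosync/corosync.conf" /etc/corosync/corosync.conf &&
    run systemctl start corosync pve-cluster &&
    run pvecm status
}

resync_corosync_conf
```

Reading the printed commands before setting `DRY_RUN=0` is the point of the dry-run default, since a wrong corosync.conf can leave a node unable to rejoin.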
 
Okay I just want to clarify so I precisely follow your direction here

  1. Add dormant1 and dormant2 to the hosts of small1, small2 and bigboy1
  2. On dormant1 and dormant2, stop pve-cluster
  3. FROM small1, copy /etc/corosync/corosync.conf TO dormant1 and dormant2 in the same location
  4. On dormant1 and dormant2, restart corosync and pve-cluster
Is the hosts thing a temporary thing? Because right now small1 does not have any entry in /etc/hosts for anything but itself.

Also, thanks for your help! :D Please let me know if I missed any details here.


 
Any chance I can get your clarification on above please? :) I'm holding off on executing to hear from you.

 
  • Add dormant1 and dormant2 to the hosts of small1, small2 and bigboy1
  • On dormant1 and dormant2, stop pve-cluster
  • FROM small1, copy /etc/corosync/corosync.conf TO dormant1 and dormant2 in the same location
  • On dormant1 and dormant2, restart corosync and pve-cluster
looks good

Is the hosts thing a temporary thing? Because right now small1 does not have any entry in /etc/hosts for anything but itself.
generally the nodes should be able to resolve all other node names. this can be via /etc/hosts or any other means (e.g. your local dns server)
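
One way to check name resolution on a node before touching corosync is a small loop around `getent hosts`, which consults /etc/hosts and DNS in the order configured in /etc/nsswitch.conf. This is a minimal sketch: the function name `check_nodes` is hypothetical, and the node names are the ones from this thread.

```shell
#!/bin/sh
# Report which node names fail to resolve from this host.
# Returns non-zero if at least one name does not resolve.
check_nodes() {
    failed=0
    for node in "$@"; do
        if addr="$(getent hosts "$node")"; then
            # getent prints "ADDRESS  NAME..."; keep only the address
            echo "ok    $node -> ${addr%% *}"
        else
            echo "FAIL  $node does not resolve"
            failed=1
        fi
    done
    return "$failed"
}

# Node names from this thread; adjust to your own cluster.
check_nodes small1 small2 dormant1 dormant2 bigboy1 ||
    echo "fix /etc/hosts or DNS before restarting corosync"
```

Running it on every node is worthwhile, because each node resolves names independently and one may have entries the others lack.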
 
WOOT! IT WORKED!

The cluster section under Data Centre now has "Join Cluster" enabled and "Create Cluster" greyed out! It says it's a cluster, and all that!

Thanks a tonne!

I'm commenting out the manual resolution in the hosts for the time being (now that they're rejoined), simply to keep homogeneous operations. And if that somehow breaks things, I may set it back up. Yay! :DDD

 
