Hi All, I appreciate any feedback I get here.
I have an existing 5-node cluster. It runs Ceph, which I don't believe is relevant to the situation, but I thought I'd mention it.
I also have an existing 3-node cluster with no shared storage.
I've migrated everything off the 3-node cluster and rebuilt those 3 servers on 6.4-9. All three are currently standalone.
I've pushed updates to all 5 servers in the 5-node cluster and have rebooted 2 of the 5 nodes to pick up the updated kernel.
Both clusters are on the same network, can ping each other, etc.
I tried to join each of the 3 rebuilt nodes to the 5-node cluster, and not one of them completed successfully. They all wound up stopping the cluster service and never starting it again. The background console showed "permission denied", and eventually the task viewer did as well.
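For reference, the CLI equivalent of what I was doing is roughly the following (the IP is just a placeholder for one of the existing members); the last two commands are simply how I've been checking why the cluster service never came back:
Code:
# CLI equivalent of the GUI join, run on the rebuilt node (10.0.0.14 is a placeholder)
pvecm add 10.0.0.14

# checking why corosync / pve-cluster never came back up
systemctl status corosync pve-cluster
journalctl -u corosync -u pve-cluster --since "1 hour ago"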
When I tried the web login, it failed with a login error. If I went to the console, the same credentials that had failed in the web login worked fine.
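In case it's useful, this is roughly what I've been looking at when the web login refuses the same credentials that work on the console (read-only checks; the GUI login goes through pveproxy/pvedaemon, the console does not):
Code:
# state of the services behind the web GUI
systemctl status pveproxy pvedaemon
journalctl -u pveproxy -u pvedaemon --since "1 hour ago"

# if pmxcfs has lost quorum, /etc/pve goes read-only, which can also break GUI logins
pvecm status
ls -l /etc/pve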
On the 5-node cluster, the additional node would show up in the node list, but it was listed as standalone and the join never finished. In the image below, PVE04-PVE08 existed in the cluster before (and still do), and PVE02 is the node being joined.
To get everything back to normal, I would have to run
Code:
pvecm delnode nodename
It would then error out that the node does not exist, generate a new cluster version, and things would resume as normal.

This is the output from pvecm status on one of the standalone nodes; all 3 are similar:
Looking at this more this morning with a fresh set of eyes, I see something weird (and unexpected). When I check the status of the existing cluster, I see the output below from pvecm status:
The names in the membership information show IP addresses I would not expect to see: they are the ones assigned to the Ceph network. Since I am still waiting on new SSDs for these 3 servers, I have not added the Ceph networking on them yet, so that network is unavailable.
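For what it's worth, the addresses corosync actually uses are the ones in the nodelist, so this is how I'm reading them on an existing member (the entry below is illustrative, not my real config):
Code:
# corosync link addresses live in the nodelist
cat /etc/pve/corosync.conf

# entries look something like this (addresses here are made up):
#   node {
#     name: PVE04
#     nodeid: 1
#     quorum_votes: 1
#     ring0_addr: 10.10.10.14   # a Ceph-side address I did not expect corosync to use
#   }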
I am looking for a little guidance now, I guess. My understanding is that the install on each server that failed to join needs to be redone, because each server fingerprinted into the cluster (even though it isn't listed there now) and then created its own cluster. I'm OK with re-installing if I have to, or if it's the path of least resistance at this point. As an edit/addition: the console works, but the web page refuses to load.
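As a side note, from what I can see in the docs, the alternative to a full reinstall on each failed node would be roughly the "separate a node without reinstalling" procedure, which I understand to look something like this (I have not run it yet, so treat it as a sketch):
Code:
# on the node that failed to join: stop the cluster services
systemctl stop pve-cluster corosync

# start pmxcfs in local mode so /etc/pve is writable without quorum
pmxcfs -l

# remove the leftover corosync configuration
rm /etc/pve/corosync.conf
rm -r /etc/corosync/*

# stop the local-mode pmxcfs and bring the node back up standalone
killall pmxcfs
systemctl start pve-cluster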
My question is: even though the join information uses the 10.0.0.0/24 address as part of the join, will it end up on the appropriate link, assuming that link is present? While I am OK with re-installing, I'd prefer not to have to do it a few times to figure this out.
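To frame the question a bit more: I'm wondering whether explicitly passing the link address during the join would avoid the ambiguity, i.e. something like the following (all addresses are placeholders; --link1 only applies if the cluster actually has a second corosync link defined):
Code:
# run on the node being joined; 10.0.0.14 is an existing member, --link0 is this node's own address on that network
pvecm add 10.0.0.14 --link0 10.0.0.12

# with a second corosync link, if one is configured cluster-wide
pvecm add 10.0.0.14 --link0 10.0.0.12 --link1 10.10.10.12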
Any help is appreciated.