Proxmox clustering issues

kperez99

New Member
Jul 18, 2024
6
0
1
Hi All,

I hope you are doing well. I am testing Proxmox at home with the following setup:

1721314383315.png

I followed the instructions in this documentation: https://pve.proxmox.com/wiki/Cluster_Manager

However, when I create a cluster and join the second node to it, node 2 seems to lose its connection in the GUI during the process, although the two nodes can still reach each other over SSH. Can anyone point me to what the issue could be?
 
Node 2 reports this error: '/etc/pve/nodes/pve2/pve-ssl.pem' does not exist! (500)
 
When a server is joining a cluster, it will reset the SSL-Certificate, so having to refresh your browser (and accept the certificate-warning) is to be expected.
That said though, if it really isn't generating the certificates correctly, there might be multiple reasons for that.

First of all, could you run the following commands on each node (since you can still reach them through SSH)?
Code:
pvecm status
pvenode cert info
timedatectl
cat /etc/hosts
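
If any of these commands stalls on a node, wrapping each one in `timeout` (GNU coreutils) keeps the SSH session usable and still shows which check got stuck. A minimal sketch; the `run_check` helper and the 10-second limit are illustrative assumptions, not something Proxmox ships:

```shell
#!/bin/sh
# run_check is a hypothetical helper: it runs one command with a time limit
# and reports how it ended. GNU timeout exits with status 124 when the
# limit is reached.
# Usage: run_check SECONDS COMMAND [ARGS...]
run_check() {
    limit="$1"
    shift
    echo "== $* =="
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "TIMED OUT after ${limit}s: $*"
    elif [ "$status" -ne 0 ]; then
        echo "FAILED (exit $status): $*"
    fi
    return "$status"
}

# Run the four checks only when called with --run, so the file can also
# be sourced just for the helper.
if [ "${1:-}" = "--run" ]; then
    run_check 10 pvecm status
    run_check 10 pvenode cert info
    run_check 10 timedatectl
    run_check 10 cat /etc/hosts
fi
```

A "TIMED OUT" line for `pvecm status` or `pvenode cert info` is useful information in itself, so it is worth posting together with the output of the commands that do return.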
 
Hi, thanks for the initial checks. I have attached the requested output. Some of the commands you mentioned, such as pvecm status and pvenode cert info, hang and never return, so I cannot retrieve that information. I also noticed that when I navigate around node 1, the GUI sometimes goes offline and then recovers, especially when I try to look up information about the second node that joined the cluster.
 

Attachments

  • node2 - pvecm status hangs.png
  • node1 -pve status.png
  • node1 - hangs pvenodes_cert_info.png
  • node 2 - timedatectl.png
  • node 2 - etc_hosts.png
  • node 1 - timedatectl.png
  • node 1 - etc_hosts.png

OK, the nodes do seem to be in a somewhat odd state.
A few things to note:
All the information shown is internal-only, so it is not privacy-sensitive and there is no need to hide parts of it ;)
For node 1 you don't seem to have set up a (proper) hostname, and for node 2 that name is part of the "domain" it tries to find. This might be part of the problem, although I'm not sure.

There are two routes we could take:
1. If nothing else is set up on node 1 yet (or you can make an off-server backup of it), it might be quicker to start fresh and do a couple of steps differently in general, and identically on both nodes.
2. If you want to continue troubleshooting, we can try to do that, but without a guarantee that it will work. We might fix it now only for it to break again in the future and need further repairs.

For option 1, the important step to watch out for, as far as I can see right now, is setting the correct FQDN: pve1.kobu-smart.local for node 1 and pve2.kobu-smart.local for node 2. As a general tip, check on your router/firewall where the DHCP range starts and give your Proxmox hosts IPs outside of that range. For example, if the range goes from .10 to .250, put your hosts on .8 and .9. Keep the subnet mask (/24), gateway and DNS server the same as what DHCP provides, though. Before you join the servers in the cluster again, make sure they can ping each other by both name and IP, and that you can log in over SSH in both directions. I also generally wait at least 30 seconds between creating the cluster, fetching the join details, and joining from the other server, although that might just be over-cautious.
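
The checks for option 1 could be sketched as a few shell helpers to run on each node before joining. All helper names here are made up for illustration, and the SSH check assumes key-based root login between the nodes (drop `BatchMode` to test a password login interactively):

```shell
#!/bin/sh
# Hypothetical pre-join helpers; none of these ship with Proxmox.

# Succeeds if the last octet of an IPv4 address lies outside the DHCP pool.
# Usage: outside_dhcp_range IP FIRST LAST
outside_dhcp_range() {
    last_octet="${1##*.}"
    [ "$last_octet" -lt "$2" ] || [ "$last_octet" -gt "$3" ]
}

# Succeeds if this host's FQDN equals the expected one.
# Usage: fqdn_matches EXPECTED_FQDN
fqdn_matches() {
    [ "$(hostname --fqdn 2>/dev/null)" = "$1" ]
}

# Succeeds if the peer answers ping by name and by IP and accepts SSH.
# BatchMode makes ssh fail instead of prompting for a password.
# Usage: peer_reachable PEER_FQDN PEER_IP
peer_reachable() {
    peer_name="$1"
    peer_ip="$2"
    ping -c 1 -W 2 "$peer_name" >/dev/null 2>&1 ||
        { echo "no ping reply from $peer_name"; return 1; }
    ping -c 1 -W 2 "$peer_ip" >/dev/null 2>&1 ||
        { echo "no ping reply from $peer_ip"; return 1; }
    ssh -o BatchMode=yes -o ConnectTimeout=5 "root@$peer_name" true ||
        { echo "ssh to $peer_name failed"; return 1; }
}
```

On pve1, for instance, with the example addresses .8 and .9 in a 192.168.178.0/24 network and a DHCP range of .10 to .250: `outside_dhcp_range 192.168.178.8 10 250 && fqdn_matches pve1.kobu-smart.local && peer_reachable pve2.kobu-smart.local 192.168.178.9`. Running the mirror image on pve2 covers SSH in the other direction.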

For option 2, probably the easiest method is to remove one of the nodes from the cluster, following the "Separate a Node Without Reinstalling" method in [1], or preferably the method where you DO reinstall the node. If you reinstall, prefer node 1, since that is the one that appears to have the incorrect hostname. On node 2 you might also need to follow the instructions in [2] / [3] (same topic, different posts) to "reset" the SSL certificate there and get the web page working again, AFTER the disconnect. Also, if you want to change the IP of the "remaining" host during this, don't forget to change it in your hosts file as well [4].

Speaking of the hosts file in general: it's quite common to add the details of the other server(s) to each server's hosts file, to help them find each other by name.
Basically, take the line that you scribbled out on each side and add it below that line on the other server, so you'd get something like this on each server:
Code:
127.0.0.1 localhost.localdomain localhost
192.168.178.8 pve1.kobu-smart.local pve1
192.168.178.9 pve2.kobu-smart.local pve2

# The following.......
(No need to add the IPv6's to the other one)
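
To confirm that a node's hosts file really carries both entries, a small check along these lines could be run on each server. This is only a sketch: `hosts_has_entry` and `check_cluster_hosts` are made-up helpers, and the addresses and names are the example values from above.

```shell
#!/bin/sh
# Succeeds if FILE has a non-comment line that maps IP to NAME
# (NAME may be the FQDN or the short alias).
# Usage: hosts_has_entry FILE IP NAME
hosts_has_entry() {
    awk -v ip="$2" -v name="$3" '
        /^[ \t]*#/ { next }               # skip comment lines
        $1 == ip {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break      # stop at a trailing comment
                if ($i == name) found = 1
            }
        }
        END { exit(found ? 0 : 1) }
    ' "$1"
}

# Prints one line per expected cluster entry that is missing from FILE.
# Usage: check_cluster_hosts FILE
check_cluster_hosts() {
    hosts_has_entry "$1" 192.168.178.8 pve1.kobu-smart.local ||
        echo "missing: 192.168.178.8 pve1.kobu-smart.local"
    hosts_has_entry "$1" 192.168.178.9 pve2.kobu-smart.local ||
        echo "missing: 192.168.178.9 pve2.kobu-smart.local"
}
```

Calling `check_cluster_hosts /etc/hosts` on each node prints nothing when both lines are present.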

[1] https://pve.proxmox.com/wiki/Cluster_Manager#_remove_a_cluster_node
[2] https://forum.proxmox.com/threads/restore-self-signed-ssl-and-ca-for-node.96800/#post-419220
[3] https://forum.proxmox.com/threads/restore-self-signed-ssl-and-ca-for-node.96800/#post-596457
[4] https://www.servethehome.com/how-to-change-primary-proxmox-ve-ip-address/
 
Hi sw-omit,

I've tried following your instructions. I've corrected the hostnames to pve.kobu-smart.local and pve2.kobu-smart.local, and I have moved both hosts outside the DHCP range by giving them the addresses XX.150 and XX.160, as my DHCP range is 1-100. The same situation occurs: the certificates on the second node hang. I also notice that whenever I try to access "/etc/pve" on the second node, it doesn't respond. The only way out is to remove the node from the cluster and restore the root certificates to regain access to the web GUI. Unfortunately, I can't reinstall Proxmox on node 1, as I have a couple of VMs running in production there. Is there any advice you can give me?
 
1721728495441.png

As described, you can see here that both nodes have been added to the cluster, but in the web GUI it looks like this:

1721728551985.png

1721728578362.png
 
I finally opted to purchase a third mini PC to be able to build the cluster. I tried to add the third node, but unfortunately it was not successful: only two nodes got clustered, and the join of the third node hangs and never finishes. If anyone can assist me, it would be appreciated.
 
