How to use a private IP for Proxmox cluster nodes to communicate

proxuser10
May 14, 2025
I have two dedicated machines at Hetzner with public IPs. They are connected to a vSwitch, with a private IP on a bridge that spans that VLAN connection. I created a cluster from host 1 using 172.16.1.2 as link 0. When I try to join host 2 to the cluster, the dialog shows that host 2 uses the public IP of host 1 to authenticate when joining. Is that connection to the public IP of host 1 a one-time thing? I want the Proxmox cluster to run over the 172.16.1.0/24 network rather than over the public network. How do I do that?

I can ping from 172.16.1.2 to 172.16.1.3 and back. Port 8006 also shows as open.
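
For reference, a minimal sketch of how that can be checked (assuming nc from the netcat package is available on both hosts):

code_language.shell:
# from host 1: basic reachability over the vSwitch VLAN
ping -c 3 172.16.1.3
# check that the Proxmox web API port is reachable on the peer
nc -zv 172.16.1.3 8006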

on host 1
code_language.shell:
auto enp5s0.4000
iface enp5s0.4000 inet static
    address 172.16.1.2/24
    vlan-raw-device enp5s0
    mtu 1400

on host 2
code_language.shell:
auto enp5s0.4000
iface enp5s0.4000 inet static
    address 172.16.1.3/24
    vlan-raw-device enp5s0
    mtu 1400
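
Since the VLAN interfaces run with MTU 1400, it may also be worth confirming that full-size packets cross the vSwitch unfragmented. A minimal sketch, run from host 1 (1372 = 1400 minus 28 bytes of IP and ICMP headers):

code_language.shell:
# do-not-fragment ping sized to exactly fill the 1400-byte MTU
ping -c 3 -M do -s 1372 172.16.1.3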

Thank you for your time!
 
Thank you. The pvecm command was what I was looking for. The private IP failed hostname verification, so I added entries for both hosts to /etc/hosts on each host, but the connection still fails on hostname verification. I am able to SSH into root@172.16.1.2 successfully when run standalone. What could be missing?

code_language.shell:
pvecm add 172.16.1.2 --link0 172.16.1.3
Please enter superuser (root) password for '172.16.1.2': *******
Establishing API connection with host '172.16.1.2'
The authenticity of host '172.16.1.2' can't be established.
X509 SHA256 key fingerprint is xyz
Are you sure you want to continue connecting (yes/no)? yes
500 Can't connect to 172.16.1.2:8006 (hostname verification failed)
 
Use hostnamectl to make sure both hosts have their proper FQDN hostname set, and double-check that /etc/hosts is identical on both hosts (e.g. 10.0.0.1 hostname.domain.tld hostname). Did you change the hostname on either host at some point?
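
A minimal sketch of those checks, assuming the proxmox1/proxmox2 and domain.tld names used elsewhere in this thread:

code_language.shell:
# static hostname as systemd sees it
hostnamectl status
# FQDN as resolved through /etc/hosts; should print proxmox1.domain.tld
hostname -f
# confirm the peer's name resolves to its private IP
getent hosts proxmox2.domain.tld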
 
I don't see the FQDN with hostnamectl, only the hostname. But hostname -f lists the FQDN correctly, and pinging by FQDN works too. I copied the two lines from one /etc/hosts to the other and verified they are identical. The order in my /etc/hosts file is 10.0.0.1 hostname.domain.tld hostname. My hostnames are just proxmox1 and proxmox2, though; I haven't changed them since I set them up.
 
What are the exact contents of these files on your node 01 (the only one in the cluster at the moment)?

- /etc/hosts
- /etc/pve/.members

Your /etc/hosts must point to the internal IP of the host, given that you want the nodes to communicate via the internal LAN. It should look like this:

172.16.1.2 proxmox1.domain.tld proxmox1

You don't need entries for the other nodes of the cluster in /etc/hosts for the cluster to work.

Your /etc/pve/.members must reflect that IP for each node currently in the cluster. If it doesn't, either restart the pve-cluster and pveproxy services or reboot the whole node. That will re-read /etc/hosts and apply that IP to the local node.
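
A minimal sketch of that restart-and-verify sequence on node 01:

code_language.shell:
# restart the cluster filesystem and the API proxy to re-read /etc/hosts
systemctl restart pve-cluster pveproxy
# the local node should now be listed with its internal IP
cat /etc/pve/.members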
 
My /etc/hosts file had this

code_language.shell:
### Hetzner Online GmbH installimage
127.0.0.1 localhost.localdomain localhost
public.ip.1 proxmox1.xyz proxmox1
172.16.1.2 proxmox1.xyz proxmox1
172.16.1.3 proxmox2.xyz proxmox2

and
code_language.shell:
# cat /etc/pve/.members
{
"nodename": "proxmox1",
"version": 3,
"cluster": { "name": "test", "version": 1, "nodes": 1, "quorate": 1 },
"nodelist": {
  "proxmox1": { "id": 1, "online": 1, "ip": "public.ip.1"}
  }
}

I wound up moving the public.ip.1 line two lines down in /etc/hosts (below the private IP entries), restarted pve-cluster, and now
code_language.shell:
# cat /etc/pve/.members
{
"nodename": "proxmox1",
"version": 3,
"cluster": { "name": "test", "version": 1, "nodes": 1, "quorate": 1 },
"nodelist": {
  "proxmox1": { "id": 1, "online": 1, "ip": "172.16.1.2"}
  }
}

But I still have the same auth issue when adding a member. SSH still works fine. I restarted proxmox1 and proxmox2; it doesn't help. Any ideas?
 
/etc/hosts must have only one entry referencing the host, so remove the entry with the public IP. Also, restart the pveproxy service on node 01, as the error you get occurs when connecting from node 02 to the API of node 01. Then use the webUI to join node 02 to the cluster using assisted join.

You don't mention whether you use custom SSL certificates (e.g. Let's Encrypt), but pvecm has issues joining nodes if you are not using the default self-signed SSL certificate, because the CN property of those custom certificates only contains hostname.domain.tld, not the IP(s) of the host.
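
To see what a node's certificate actually covers, a quick sketch (pve-ssl.pem is the default self-signed certificate location on a Proxmox node; the -ext option needs OpenSSL 1.1.1 or newer):

code_language.shell:
# print the subject CN and any DNS/IP subjectAltName entries
openssl x509 -in /etc/pve/local/pve-ssl.pem -noout -subject -ext subjectAltName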
 
I believe we have made some progress, but we are indeed at a certs problem now. I don't have any custom SSL certs. After joining the second node, I could no longer log in. I reverted the /etc/hosts change to include the public IP while troubleshooting, and this time I could still join using assisted join. I could log into the first node but not the second; the page loads and the login screen shows up, but the first node shows this error:
'/etc/pve/nodes/proxmox2/pve-ssl.pem' does not exist! (500). The problem is identical to this thread: https://forum.proxmox.com/threads/joining-cluster-pve-ssl-pem-error.131263/

On proxmox1
code_language.shell:
# ls /etc/pve/nodes/
proxmox1  proxmox2

code_language.shell:
root@proxmox1 ~ # ls /etc/pve/nodes/proxmox2/
lxc  openvz  priv  qemu-server

On proxmox2 I don't see the nodes folder at all:
code_language.shell:
# ls /etc/pve/
corosync.conf  local  lxc  openvz  qemu-server

I tried
code_language.shell:
 # pvecm updatecerts
waiting for pmxcfs mount to appear and get quorate...
waiting for pmxcfs mount to appear and get quorate...
waiting for pmxcfs mount to appear and get quorate...
waiting for pmxcfs mount to appear and get quorate...
waiting for pmxcfs mount to appear and get quorate...
waiting for pmxcfs mount to appear and get quorate...
got timeout when trying to ensure cluster certificates and base file hierarchy is set up - no quorum (yet) or hung pmxcfs?

I ran some other troubleshooting commands, and it looks like another corosync instance is already running:
code_language.shell:
# corosync -t
May 17 19:12:53.406 notice  [MAIN  ] Corosync Cluster Engine exiting normally
root@proxmox2 ~ # corosync -f
May 17 19:12:59.164 notice  [MAIN  ] Corosync Cluster Engine  starting up
May 17 19:12:59.164 info    [MAIN  ] Corosync built-in features: dbus monitoring watchdog systemd xmlconf vqsim nozzle snmp pie relro bindnow
May 17 19:12:59.166 error   [MAIN  ] Another Corosync instance is already running.
May 17 19:12:59.166 error   [MAIN  ] Corosync Cluster Engine exiting with status 18 at main.c:1601.
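
That second instance is expected: the corosync service launched during the join is already running, so a foreground corosync -f refuses to start. A sketch of commands that may help narrow down why pmxcfs never becomes quorate:

code_language.shell:
# check whether the cluster service stack is up
systemctl status corosync pve-cluster
# quorum and membership as corosync sees them
pvecm status
# recent cluster logs, looking for link or authentication errors
journalctl -u corosync -u pve-cluster --since "15 min ago"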
 