Problems creating proxmox cluster on hetzner

pdaniel

New Member
Mar 18, 2024
Hello,

I have a problem creating a Proxmox cluster on Hetzner bare metal machines. I have tried everything, but the process gets blocked for some reason. I will provide the details of my configuration below, along with the errors I get.

Server 1 network config:

# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.

source /etc/network/interfaces.d/*
auto lo
iface lo inet loopback
iface lo inet6 loopback
iface eno1 inet manual
auto vmbr0
iface vmbr0 inet static
address xxx.xxx.xxx.xxx
netmask xxx.xxx.xxx.xxx
gateway xxx.xxx.xxx.xxx
bridge-ports eno1
bridge-stp off
bridge-fd 0
up sysctl -p
up sysctl -w net.ipv4.ip_forward=1
up sysctl -w net.ipv4.conf.eno1.send_redirects=0
up sysctl -w net.ipv6.conf.all.forwarding=1
post-up echo 2048 > /sys/class/net/vmbr0/bridge/hash_max
post-up echo 1 > /sys/class/net/vmbr0/bridge/multicast_snooping
post-up echo 0 > /proc/sys/net/ipv6/conf/vmbr0/accept_ra


#vlan between nodes
auto vmbr4001
iface vmbr4001 inet static
bridge_ports eno1.4001
bridge_stp off
bridge_fd 0
address 10.0.100.10
netmask 24
#COROSYNC1
iface eth0 inet manual
iface eth1 inet manual


In /etc/hosts on server 1 I have: 10.0.100.10 pve1.mydomain.mydomain pve1


Server 2 network config:

source /etc/network/interfaces.d/*

auto lo
iface lo inet loopback
iface lo inet6 loopback

iface enp4s0 inet manual

auto vmbr0
iface vmbr0 inet static
address xxx
netmask xxx
gateway xxx
bridge-ports enp4s0
bridge-stp off
bridge-fd 0
up sysctl -p
up sysctl -w net.ipv4.ip_forward=1
up sysctl -w net.ipv4.conf.enp4s0.send_redirects=0
up sysctl -w net.ipv6.conf.all.forwarding=1
post-up echo 2048 > /sys/class/net/vmbr0/bridge/hash_max
post-up echo 1 > /sys/class/net/vmbr0/bridge/multicast_snooping
post-up echo 0 > /proc/sys/net/ipv6/conf/vmbr0/accept_ra


#vlan between nodes
auto vmbr4001
iface vmbr4001 inet static
bridge_ports enp4s0.4001
bridge_stp off
bridge_fd 0
address 10.0.100.11
netmask 24

#COROSYNC1
iface eth0 inet manual
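With both nodes configured, a quick sanity check (a sketch, assuming the interfaces and addresses above) is to confirm the corosync VLAN actually carries traffic before creating the cluster:

# on server 1: confirm vmbr4001 is up with 10.0.100.10 and reach server 2 over VLAN 4001
ip -br addr show vmbr4001
ping -c 3 -I vmbr4001 10.0.100.11
# on server 2: the reverse direction
ping -c 3 -I vmbr4001 10.0.100.10

If these pings fail, corosync will never form a membership, no matter what pvecm reports.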


The clocks are in sync and show the same values on both nodes.
I create the cluster on server 1.
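For reference, one quick way to verify the clocks really are in sync (a sketch; Proxmox VE 7 and later ship chrony by default, older installs may use systemd-timesyncd):

# show local time, time zone and NTP synchronisation state on each node
timedatectl
# if chrony is in use, show the offset to the selected time source
chronyc tracking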


In /etc/hosts on server 2 I have: 10.0.100.11 pve2.mydomain.mydomain pve2


I am creating the cluster from the command line:


root@pve2 ~ # pvecm create prod
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
LANGUAGE = (unset),
LC_ALL = (unset),
LC_CTYPE = "UTF-8",
LANG = "en_US.UTF-8"
are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("en_US.UTF-8").
Corosync Cluster Engine Authentication key generator.
Gathering 2048 bits for key from /dev/urandom.
Writing corosync key to /etc/corosync/authkey.
Writing corosync config to /etc/pve/corosync.conf
Restart corosync and cluster filesystem
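The perl locale warnings above are harmless for clustering. They are usually caused by the SSH client forwarding an LC_* variable that is not generated on the server; a sketch of one common fix, assuming en_US.UTF-8 is the locale you want:

# enable and generate the locale, then make it the system default
sed -i 's/^# *en_US.UTF-8 UTF-8/en_US.UTF-8 UTF-8/' /etc/locale.gen
locale-gen
update-locale LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8

Log out and back in afterwards so the new environment is picked up.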


root@pve2 ~ # pvecm status
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
LANGUAGE = (unset),
LC_ALL = (unset),
LC_CTYPE = "UTF-8",
LANG = "en_US.UTF-8"
are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("en_US.UTF-8").
Cluster information
-------------------
Name: prod
Config Version: 1
Transport: knet
Secure auth: on

Quorum information
------------------
Date: Mon Mar 18 21:05:47 2024
Quorum provider: corosync_votequorum
Nodes: 1
Node ID: 0x00000001
Ring ID: 1.5
Quorate: Yes

Votequorum information
----------------------
Expected votes: 1
Highest expected: 1
Total votes: 1
Quorum: 1
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000001 1 10.0.100.11 (local)


Now I try to join a node. First I try via the CLI:

pvecm add 10.0.100.11 - this command fails with:

500 Can't connect to 10.0.100.11:8006 (hostname verification failed)
end task UPID:pve5:00000BF0:00011642:65F89F26:clusterjoin::root@pam: 500 Can't connect to 10.0.100.11:8006 (hostname verification failed)
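For reference, pvecm add has options that are often used to get past this kind of certificate/hostname verification failure; a sketch only (the fingerprint placeholder is the cluster node's certificate SHA256 fingerprint, shown in the GUI under Datacenter -> Cluster -> Join Information):

# pin the cluster node's certificate fingerprint explicitly when joining
pvecm add 10.0.100.11 --fingerprint <SHA256-fingerprint-of-the-cluster-node>
# or fall back to the SSH-based join instead of the API
pvecm add 10.0.100.11 --use_ssh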

Then I try from the GUI using Join Cluster. First I get:

Establishing API connection with host '10.0.100.11'
Login succeeded.
check cluster join API version
Request addition of this node
Join request OK, finishing setup locally
stopping pve-cluster service

Then on the node I get: permission denied - invalid PVE ticket (401), and now I cannot access the Proxmox GUI at all, I cannot log in. I can still access the server via SSH, but only with the password, not with the SSH key. So something overwrote my SSH key on the node.

And on the master, I get the error: '/etc/pve/nodes/pve5/pve-ssl.pem' does not exist! (500), because the files are not in /etc/pve...
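When a join half-completes like this, the certificates under /etc/pve/nodes/<nodename>/ are typically missing because pmxcfs never synced with the cluster. Once the node is quorate again (or has been recovered), they can usually be regenerated; a sketch, not a guaranteed recovery path:

# on the affected node, once /etc/pve is writable again
pvecm updatecerts --force
systemctl restart pvedaemon pveproxy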

What can I do to fix this issue? Any ideas? I have tried everything from the internet.

Regards.
 
Anyone? Is Proxmox a good option for production use? This error seems to appear randomly for people, which makes me think that Proxmox is not really ready for production use, or at least not stable enough.
 
It seems that I got past the fingerprint issue, but now it gets stuck at waiting for quorum, and in the GUI I get '/etc/pve/nodes/pve2/pve-ssl.pem' does not exist! (500).


Please enter superuser (root) password for 'pve3': **************

Establishing API connection with host 'pve3'
The authenticity of host 'pve3' can't be established.
X509 SHA256 key fingerprint is F0:......
Are you sure you want to continue connecting (yes/no)? yes
Login succeeded.
check cluster join API version
No cluster network links passed explicitly, fallback to local node IP 'xxxxx'
Request addition of this node
Join request OK, finishing setup locally
stopping pve-cluster service
backup old database to '/var/lib/pve-cluster/backup/config-1710871331.sql.gz'
waiting for quorum...
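When a join hangs at "waiting for quorum...", the usual next step is to check whether corosync on both nodes can actually reach each other over the link that ended up in corosync.conf. Note the "fallback to local node IP" line above: if the join fell back to the wrong address, the cluster link can be passed explicitly. A sketch, with placeholder IPs:

# on each node: show the state of the local corosync links
corosync-cfgtool -s
# show the membership corosync itself believes in
corosync-quorumtool -s
# watch the corosync log for link down / retransmit messages
journalctl -u corosync -f
# when joining, the cluster link for this node can be given explicitly:
pvecm add <cluster-node-IP> --link0 <this-node's-IP-on-the-cluster-network>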
 
I am using pve3 and pve1 now; I have 3 servers with Hetzner. I have tested multiple configurations.
 
I have installed Proxmox clusters of different sizes and machine types on Hetzner.
They work like a charm.
I used the vSwitch option to create an internal network, and I used this to connect the cluster.
Also remember that the firewall on each bare metal server needs to be configured correctly in order to let the cluster connections work.
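As a rough sketch of what that means for a cluster like the OP's (port numbers as per the Proxmox VE documentation; the 10.0.100.0/24 subnet is the corosync network from earlier in this thread, adapt it to your own):

# corosync cluster traffic (UDP 5405-5412 on current versions)
iptables -A INPUT -s 10.0.100.0/24 -p udp --dport 5405:5412 -j ACCEPT
# Proxmox API / web GUI on port 8006, used during the cluster join
iptables -A INPUT -s 10.0.100.0/24 -p tcp --dport 8006 -j ACCEPT
# SSH between the nodes
iptables -A INPUT -s 10.0.100.0/24 -p tcp --dport 22 -j ACCEPT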
 
I'm getting this same exact error
Surprising that you know you have the exact problem as the OP. The OP didn't really provide consistent, readable info - just a bunch of error messages, which could happen for a whole range of issues. What was apparent from his info is that something was up with his node naming convention: he starts with pve1 & pve2, an error line then shows pve5, then in a subsequent post he goes to pve3. Yes, he's testing/trying (maybe part-reinstalling?) with different names, but I'm pretty sure in the end his configs are mismatched (both per node & cluster-wide).

While trying to set up a PVE cluster: firstly, set up each node with a naming convention that is consistent AND DON'T CHANGE THAT NAME EVER. Secondly, don't try a "bunch of things from the internet"; copying & pasting random configs/scripts from the web is a sure way to break any system. Thirdly, before joining any node to a cluster, make sure it is perfectly running & accessible & has the correct usable (tested) storage setup BEFORE even attempting to join it to the cluster.

One final thought: the OP's problem occurred on the 18th-19th of March (more than 3 months ago), and since then he has been AWOL from this thread, so we can pretty much assume he worked out his problems, probably by just starting again from scratch with a consistent & correct install. He should have reported back on his success or even failure, so that he could "help" subsequent posters with "the same problem".
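For anyone checking the first point before a join, hostname consistency is easy to verify; a minimal sketch:

# the running hostname, the persisted hostname and /etc/hosts must all agree
hostname
cat /etc/hostname
grep "$(hostname)" /etc/hosts
# and the name must resolve to the IP the cluster is supposed to use
getent hosts "$(hostname)"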
 
I'm imagining that "the world if" meme saying "the world if Proxmox wasn't irrevocably busted by changing the hostname." It feels like every couple of years I'm playing around with a Proxmox install and thinking maybe I can get around that Proxmox hostname rule.
 
@gfngfn256 I never said I had the "exact problem as the OP". I said I'm "getting this same exact error", which is 100% true. I am getting the error: '/etc/pve/nodes/pve5/pve-ssl.pem' does not exist! (500).

Also a bit ironic that you ended your statement with an assumption that he "worked out his problems". There are many reasons for not following up on an issue beyond "working out the problem". Such reasons could include giving up, moving on to another technology, changing hosting providers, etc... all of which do not address the original problem.

I set up each node with a consistent naming convention, pve1 and pve2. I never changed my hostnames.
I did not copy & paste random configs/scripts.
I made sure each node was perfectly running & accessible & had the correct usable storage setup before I attempted to join it to the cluster.
I was even able to SSH into pve1 from pve2 and SSH into pve2 from pve1.
My firewall is 100% wide open, so that's definitely not the issue.
 
I'm using a vSwitch on Hetzner. Here are more details regarding my networking and configuration; I'm using the vmbr1 network for the cluster.

pve1:/etc/hosts
127.0.0.1 localhost.localdomain localhost
192.168.1.10 pve1.local pve1
192.168.1.11 pve2.local pve2
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

pve1:/etc/network/interfaces
auto lo
iface lo inet loopback
iface lo inet6 loopback

auto eno1
iface eno1 inet manual

auto vmbr0
iface vmbr0 inet static
address 23.88.4.224/32
gateway 23.88.4.193
bridge-ports eno1
bridge-stp off
bridge-fd 0

auto eno1.4000
iface eno1.4000 inet manual

auto vmbr1
iface vmbr1 inet static
address 192.168.1.10/24
bridge-ports eno1.4000
bridge-stp off
bridge-fd 0
mtu 1400

auto eno1.4001
iface eno1.4001 inet manual

auto vmbr2
iface vmbr2 inet static
address 172.16.0.1/16
bridge-ports eno1.4001
bridge-stp off
bridge-fd 0
mtu 1400
post-up iptables -t nat -A POSTROUTING -s '172.16.0.0/16' -o vmbr0 -j MASQUERADE
post-down iptables -t nat -D POSTROUTING -s '172.16.0.0/16' -o vmbr0 -j MASQUERADE
post-up iptables -t raw -I PREROUTING -i fwbr+ -j CT --zone 1
post-down iptables -t raw -D PREROUTING -i fwbr+ -j CT --zone 1

pve2:/etc/hosts
127.0.0.1 localhost.localdomain localhost
192.168.1.10 pve1.local pve1
192.168.1.11 pve2.local pve2

# The following lines are desirable for IPv6 capable hosts

::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

pve2:/etc/network/interfaces
auto lo
iface lo inet loopback
iface lo inet6 loopback

auto enp7s0
iface enp7s0 inet manual

auto vmbr0
iface vmbr0 inet static
address 144.76.159.26/32
gateway 144.76.159.1
bridge-ports enp7s0
bridge-stp off
bridge-fd 0

auto enp7s0.4000
iface enp7s0.4000 inet manual

auto vmbr1
iface vmbr1 inet static
address 192.168.1.11/24
bridge-ports enp7s0.4000
bridge-stp off
bridge-fd 0
mtu 1400

auto enp7s0.4001
iface enp7s0.4001 inet manual

auto vmbr2
iface vmbr2 inet static
address 172.16.0.1/16
bridge-ports enp7s0.4001
bridge-stp off
bridge-fd 0
mtu 1400
post-up iptables -t nat -A POSTROUTING -s '172.16.0.0/16' -o vmbr0 -j MASQUERADE
post-down iptables -t nat -D POSTROUTING -s '172.16.0.0/16' -o vmbr0 -j MASQUERADE
post-up iptables -t raw -I PREROUTING -i fwbr+ -j CT --zone 1
post-down iptables -t raw -D PREROUTING -i fwbr+ -j CT --zone 1
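Since the Hetzner vSwitch requires an MTU of 1400, it is also worth confirming that full-size packets actually pass between the two vmbr1 addresses before attempting the join; a sketch (1372 bytes of payload plus 28 bytes of IP/ICMP headers equals the 1400-byte MTU):

# from pve1: don't-fragment ping with the largest payload that fits into MTU 1400
ping -c 3 -M do -s 1372 192.168.1.11
# and the reverse from pve2
ping -c 3 -M do -s 1372 192.168.1.10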
 
AFAIK, as of now each /etc/hosts on each node should only contain its own (node) details. See this post.
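Under that advice, pve1's /etc/hosts would be trimmed to something like the following (a sketch based on the addresses posted above; the other node is then reached by IP or via DNS rather than via /etc/hosts):

127.0.0.1 localhost.localdomain localhost
192.168.1.10 pve1.local pve1

::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts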
 
