Hello there, I hope to find some kind of solution because this is driving me crazy after ~1 week of different tests.
I have two servers at Hetzner, each with a public IP, plus two VLANs. I'm posting the latest configuration I have, but I have tested a lot of other combinations (only interfaces, only bridges, ...). The current /etc/network/interfaces:
Code:
# Main interface (Public IP) - IPv4
auto enp4s0
iface enp4s0 inet static
    address [PUBLIC IP]/32
    gateway [GATEWAY]
    pointopoint [GATEWAY]
    hwaddress [MAC]
    pre-up ebtables -t nat -A POSTROUTING -j snat --to-src [MAC] -o enp4s0
    post-down ebtables -t nat -D POSTROUTING -j snat --to-src [MAC] -o enp4s0

# Management interface (VLAN 4025)
auto enp4s0.4025
iface enp4s0.4025 inet static
    address 10.250.10.1/17
    mtu 1400
    post-up ip route add 10.250.128.0/17 via 10.250.0.1 dev enp4s0.4025
    pre-down ip route del 10.250.128.0/17 via 10.250.0.1 dev enp4s0.4025
    post-up iptables -t nat -A POSTROUTING -j SNAT -s 10.200.0.0/16 -o enp4s0.4025 --to 10.250.10.1
    pre-down iptables -t nat -D POSTROUTING -j SNAT -s 10.200.0.0/16 -o enp4s0.4025 --to 10.250.10.1

# Servers/VMs interface (VLAN 4020)
auto enp4s0.4020
iface enp4s0.4020 inet manual
    mtu 1400

# Servers/VMs bridge (VLAN 4020)
auto vmbr0
iface vmbr0 inet static
    address 10.200.10.1/17
    bridge-ports enp4s0.4020
    bridge-waitport 0
    bridge-stp off
    bridge-fd 0
    mtu 1400
    up ip route add 10.200.128.0/17 via 10.200.0.1 dev vmbr0
    down ip route del 10.200.128.0/17 via 10.200.0.1 dev vmbr0
    post-up iptables -t nat -A POSTROUTING -j SNAT -s 10.200.0.0/16 -o enp4s0 --to [PUBLIC IP]
    post-down iptables -t nat -D POSTROUTING -j SNAT -s 10.200.0.0/16 -o enp4s0 --to [PUBLIC IP]
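Just to rule out the basics, this is the kind of connectivity/MTU check I run between the nodes over the management VLAN (addresses and interface name taken from the config above; 1372 = 1400 minus 28 bytes of IP+ICMP headers, and 5404/5405 are the corosync default UDP ports as far as I know):
Code:
# From root1: plain reachability to root2 over VLAN 4025
ping -c 3 10.250.10.2

# Same path with "don't fragment" set and a payload sized for the 1400 MTU
# (1400 - 20 bytes IP - 8 bytes ICMP = 1372); this must not fragment
ping -c 3 -M do -s 1372 10.250.10.2

# Is corosync listening on its default UDP ports?
ss -ulpn | grep -E ':540[45]'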
The corosync configuration:
Code:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: root1
    nodeid: 1
    quorum_votes: 2
    ring0_addr: 10.250.10.1
  }
  node {
    name: root2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.250.10.2
  }
}

quorum {
  expected_votes: 2
  provider: corosync_votequorum
  two_nodes: 2
}

totem {
  cluster_name: RooT
  config_version: 6
  crypto_cipher: none
  crypto_hash: none
  interface {
    linknumber: 0
  }
  ip_version: ipv4
  link_mode: passive
  secauth: on
  transport: udpu
  netmtu: 1400
  version: 2
}
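In case it is useful, these are the standard commands I have been using to inspect the cluster state on each node; I can post their output if that helps:
Code:
# Cluster membership and quorum as PVE sees it
pvecm status

# Link and ring status as corosync sees it
corosync-cfgtool -s

# Votequorum view: expected votes, total votes, quorate flag
corosync-quorumtool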
And /etc/hosts:
Code:
127.0.0.1 localhost.localdomain localhost
10.250.10.1 root1.[DOMAIN] root1
10.250.10.2 root2.[DOMAIN] root2
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
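Name resolution also seems consistent with the entries above; this is the quick sanity check I use:
Code:
# Both names should resolve to the 10.250.x management addresses
getent hosts root1
getent hosts root2

# The local hostname should map to this node's own 10.250.x address
hostname
hostname --ip-address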
With this config I have full connectivity between the Cloud machines (no Proxmox), the Robot machines (Proxmox) and the VMs inside the Robot machines.
The firewall is currently disabled at both cluster and host level, but I also tried enabling it with the defaults and with custom rules. ALL FAILED
Like the 1000 other threads, videos, posts and comments suggest, I already tested the simplest "GUI, create + join, super easy, see?" approach. ALL FAILED
Somewhere out there I found that Hetzner may have issues with multicast, and that the golden unicorn in these cases is to use "udpu" so unicast is used instead. ALL FAILED
I also found that "two_nodes" or "netmtu" might help. ALL FAILED
I tried to join via the public IP, the 10.200 range and the 10.250 range. ALL FAILED
I tested via the GUI, via default pvecm commands and via customized pvecm (-link0, -nodeid). ALL FAILED
In the config I'm posting, more weight is also given to the first node (quorum_votes: 2), so it can reach quorum by itself. ALL FAILED
IT NEVER EVER EVER EVER WORKED. EVERY SINGLE TIME IT FAILS AS FOLLOWS:
If joining via the GUI:
- On the root1 web interface: root2 shows up, but always with a red X. Nothing works except the console.
- On the root2 web interface: some log output is shown during the initial join, and after that an "invalid ticket 401" error appears.
- Depending on my modifications, the GUI is either just broken (login always fails) or not reachable at all.
If joining via the command line:
Code:
root@root2:~# pvecm add root1 -link0 10.250.10.2 -nodeid 2
Please enter superuser (root) password for 'root1': *************
Establishing API connection with host 'root1'
The authenticity of host 'root1' can't be established.
X509 SHA256 key fingerprint is 3E:CB:58:13:59:67:B6:1D:AF:DF:F1:65:48:EC:03:24:7A:01:A6:54:E5:A3:50:00:2A:2C:D8:8A:45:16:14:08.
Are you sure you want to continue connecting (yes/no)? yes
Login succeeded.
check cluster join API version
Request addition of this node
Join request OK, finishing setup locally
stopping pve-cluster service
backup old database to '/var/lib/pve-cluster/backup/config-1662960617.sql.gz'
waiting for quorum...
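It then just sits at "waiting for quorum..." forever. While it hangs, these are the things I look at on root2 (all stock tools; /etc/pve is the pmxcfs cluster filesystem mount):
Code:
# Is corosync running and what is it logging?
systemctl status corosync
journalctl -u corosync -u pve-cluster --since "10 minutes ago"

# Does votequorum on this node see the other member?
corosync-quorumtool

# Is the pmxcfs mount at /etc/pve still usable?
ls /etc/pve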
And the fact that all the documentation states the joining node ends up more or less broken once the attempt has been made does not help at all: server 1 stays reachable, but server 2 becomes half-dead. Corosync logs from root1 and then root2 after the attempt:
Code:
Sep 12 08:11:40 root1 corosync[1616]: [MAIN ] Corosync Cluster Engine 3.1.5 starting up
Sep 12 08:11:40 root1 corosync[1616]: [MAIN ] Corosync built-in features: dbus monitoring watchdog systemd xmlconf vqsim nozzle snmp pie relro bindnow
Sep 12 08:11:40 root1 corosync[1616]: [TOTEM ] Initializing transport (UDP/IP Unicast).
Sep 12 08:11:40 root1 corosync[1616]: [TOTEM ] The network interface [10.250.10.1] is now up.
Sep 12 08:11:40 root1 corosync[1616]: [SERV ] Service engine loaded: corosync configuration map access [0]
Sep 12 08:11:40 root1 corosync[1616]: [QB ] server name: cmap
Sep 12 08:11:40 root1 corosync[1616]: [SERV ] Service engine loaded: corosync configuration service [1]
Sep 12 08:11:40 root1 corosync[1616]: [QB ] server name: cfg
Sep 12 08:11:40 root1 corosync[1616]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Sep 12 08:11:40 root1 corosync[1616]: [QB ] server name: cpg
Sep 12 08:11:40 root1 corosync[1616]: [SERV ] Service engine loaded: corosync profile loading service [4]
Sep 12 08:11:40 root1 corosync[1616]: [SERV ] Service engine loaded: corosync resource monitoring service [6]
Sep 12 08:11:40 root1 corosync[1616]: [WD ] Watchdog not enabled by configuration
Sep 12 08:11:40 root1 corosync[1616]: [WD ] resource load_15min missing a recovery key.
Sep 12 08:11:40 root1 corosync[1616]: [WD ] resource memory_used missing a recovery key.
Sep 12 08:11:40 root1 corosync[1616]: [WD ] no resources configured.
Sep 12 08:11:40 root1 corosync[1616]: [SERV ] Service engine loaded: corosync watchdog service [7]
Sep 12 08:11:40 root1 corosync[1616]: [QUORUM] Using quorum provider corosync_votequorum
Sep 12 08:11:40 root1 corosync[1616]: [QUORUM] This node is within the primary component and will provide service.
Sep 12 08:11:40 root1 corosync[1616]: [QUORUM] Members[0]:
Sep 12 08:11:40 root1 corosync[1616]: [SERV ] Service engine loaded: corosync vote quorum service v1.0 [5]
Sep 12 08:11:40 root1 corosync[1616]: [QB ] server name: votequorum
Sep 12 08:11:40 root1 corosync[1616]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Sep 12 08:11:40 root1 corosync[1616]: [QB ] server name: quorum
Sep 12 08:11:40 root1 corosync[1616]: [TOTEM ] Configuring link 0
Sep 12 08:11:40 root1 corosync[1616]: [TOTEM ] adding new UDPU member {10.250.10.1}
Sep 12 08:11:40 root1 corosync[1616]: [TOTEM ] adding new UDPU member {10.250.10.2}
Sep 12 08:11:40 root1 corosync[1616]: [QUORUM] Sync members[1]: 1
Sep 12 08:11:40 root1 corosync[1616]: [QUORUM] Sync joined[1]: 1
Sep 12 08:11:40 root1 corosync[1616]: [TOTEM ] A new membership (1.23) was formed. Members joined: 1
Sep 12 08:11:40 root1 corosync[1616]: [QUORUM] Members[1]: 1
Sep 12 08:11:40 root1 corosync[1616]: [MAIN ] Completed service synchronization, ready to provide service.
Code:
Sep 12 08:14:59 root2 corosync[1570]: [MAIN ] Corosync Cluster Engine 3.1.5 starting up
Sep 12 08:14:59 root2 corosync[1570]: [MAIN ] Corosync built-in features: dbus monitoring watchdog systemd xmlconf vqsim nozzle snmp pie relro bindnow
Sep 12 08:14:59 root2 corosync[1570]: [TOTEM ] Initializing transport (UDP/IP Unicast).
Sep 12 08:14:59 root2 corosync[1570]: [TOTEM ] The network interface [10.250.10.2] is now up.
Sep 12 08:14:59 root2 corosync[1570]: [SERV ] Service engine loaded: corosync configuration map access [0]
Sep 12 08:14:59 root2 corosync[1570]: [QB ] server name: cmap
Sep 12 08:14:59 root2 corosync[1570]: [SERV ] Service engine loaded: corosync configuration service [1]
Sep 12 08:14:59 root2 corosync[1570]: [QB ] server name: cfg
Sep 12 08:14:59 root2 corosync[1570]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Sep 12 08:14:59 root2 corosync[1570]: [QB ] server name: cpg
Sep 12 08:14:59 root2 corosync[1570]: [SERV ] Service engine loaded: corosync profile loading service [4]
Sep 12 08:14:59 root2 corosync[1570]: [SERV ] Service engine loaded: corosync resource monitoring service [6]
Sep 12 08:14:59 root2 corosync[1570]: [WD ] Watchdog not enabled by configuration
Sep 12 08:14:59 root2 corosync[1570]: [WD ] resource load_15min missing a recovery key.
Sep 12 08:14:59 root2 corosync[1570]: [WD ] resource memory_used missing a recovery key.
Sep 12 08:14:59 root2 corosync[1570]: [WD ] no resources configured.
Sep 12 08:14:59 root2 corosync[1570]: [SERV ] Service engine loaded: corosync watchdog service [7]
Sep 12 08:14:59 root2 corosync[1570]: [QUORUM] Using quorum provider corosync_votequorum
Sep 12 08:14:59 root2 corosync[1570]: [SERV ] Service engine loaded: corosync vote quorum service v1.0 [5]
Sep 12 08:14:59 root2 corosync[1570]: [QB ] server name: votequorum
Sep 12 08:14:59 root2 corosync[1570]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Sep 12 08:14:59 root2 corosync[1570]: [QB ] server name: quorum
Sep 12 08:14:59 root2 corosync[1570]: [TOTEM ] Configuring link 0
Sep 12 08:14:59 root2 corosync[1570]: [TOTEM ] adding new UDPU member {10.250.10.1}
Sep 12 08:14:59 root2 corosync[1570]: [TOTEM ] adding new UDPU member {10.250.10.2}
Sep 12 08:14:59 root2 corosync[1570]: [QUORUM] Sync members[1]: 2
Sep 12 08:14:59 root2 corosync[1570]: [QUORUM] Sync joined[1]: 2
Sep 12 08:14:59 root2 corosync[1570]: [TOTEM ] A new membership (2.2d) was formed. Members joined: 2
Sep 12 08:14:59 root2 corosync[1570]: [QUORUM] Members[1]: 2
Sep 12 08:14:59 root2 corosync[1570]: [MAIN ] Completed service synchronization, ready to provide service.
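As far as I can tell from these two logs, each node only ever forms a membership with itself (Members[1]: 1 on root1, Members[1]: 2 on root2), as if the corosync packets never make it across. This is how I would confirm whether the traffic actually leaves one node and arrives at the other (interface name and addresses from the config above; 5405 is the corosync default port as far as I know):
Code:
# On root1: corosync traffic to/from root2 on the management VLAN
tcpdump -ni enp4s0.4025 udp port 5405 and host 10.250.10.2

# On root2: the mirror of the above
tcpdump -ni enp4s0.4025 udp port 5405 and host 10.250.10.1

# Any local rule that could be dropping it?
iptables-save | grep -iE 'drop|reject|540[45]'
pve-firewall status

Package versions for reference (pveversion -v):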
Code:
proxmox-ve: 7.2-1 (running kernel: 5.15.53-1-pve)
pve-manager: 7.2-7 (running version: 7.2-7/d0dd0e85)
pve-kernel-5.15: 7.2-10
pve-kernel-helper: 7.2-10
pve-kernel-5.15.53-1-pve: 5.15.53-1
pve-kernel-5.15.30-2-pve: 5.15.30-3
ceph-fuse: 15.2.16-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve1
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.2-4
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.2-2
libpve-guest-common-perl: 4.1-2
libpve-http-server-perl: 4.1-3
libpve-storage-perl: 7.2-8
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.0-3
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.2.5-1
proxmox-backup-file-restore: 2.2.5-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.5.1
pve-cluster: 7.2-2
pve-container: 4.2-2
pve-docs: 7.2-2
pve-edk2-firmware: 3.20220526-1
pve-firewall: 4.2-6
pve-firmware: 3.5-1
pve-ha-manager: 3.4.0
pve-i18n: 2.7-2
pve-qemu-kvm: 7.0.0-3
pve-xtermjs: 4.16.0-1
qemu-server: 7.2-4
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.5-pve1
What am I missing!?
Thanks in advance,
CieNTi