node not joining

Jord · Jan 21, 2022

Hi,

Consider the following situation:

Server 1 in location 1
10.10.10.14 in DMZ
PMG community edition: intended master of a cluster that is yet to be created
Stock PMG installation, with LetsEncrypt
+ zabbix
+ openvpn (server tap bridge mode)
+ ssh on port 22219, password login disabled
+ iptables

Server 2 in location 2
192.168.2.24 in DMZ
10.10.10.16 on tap VPN
PMG community edition: intended member of a cluster that is yet to be created
+ zabbix
+ openvpn (tap client)
+ ssh on port 22219, password login disabled

iptables rules (rules.v4) of Server 1 (Server 2 has no iptables rules yet)

root@pmg:~# iptables -L -n
Chain INPUT (policy DROP)
target prot opt source destination
REJECT all -- IP.OF.BAD.GUY 0.0.0.0/0 reject-with icmp-port-unreachable
...
REJECT all -- IP.OF.BAD.GUY 0.0.0.0/0 reject-with icmp-port-unreachable
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0 state RELATED,ESTABLISHED
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0
ACCEPT icmp -- 0.0.0.0/0 0.0.0.0/0 icmptype 8
ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:22219
ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:80
ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:25
ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:26
ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:10050
ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:8006
ACCEPT udp -- 0.0.0.0/0 0.0.0.0/0 udp dpt:1194

Chain FORWARD (policy DROP)
target prot opt source destination

Chain OUTPUT (policy ACCEPT)
target prot opt source destination

Both sites are behind a firewall with DNAT & SNAT
Server 1 DNAT tcp 80,25,8006,22219 & udp 1194
Server 2 DNAT tcp 80,25,8006,22219

Each server's id_rsa.pub is in the other's authorized_keys. The servers can SSH into each other through both the public IP and the VPN IP, in both directions.

Issue
Started the cluster on Server 1 -> joined from Server 2 with 10.10.10.14, password & fingerprint -> the dialog closes immediately, but no cluster is created.

Any ideas?
TIA
Jord

Jord · Jan 22, 2022

Changing ssh back to port 22 on both servers (and in iptables on Server 1) changes nothing.

Jord · Jan 23, 2022

Using CLI:

root@pmg2:~# pmgcm join 10.10.10.14 --fingerprint 2B:58:3A:6F:82:A6:F8:55:94:C6:06:56:5B:94:92:C5:8E:66:45:2B:2F:69:F0:1A:FB:6F:6E:CD:12:C2:9D:C8

cluster join failed: 401 permission denied - invalid PMG ticket

Jord · Jan 23, 2022

Okay, so I presumed an error with the fingerprint and went for a reinstall, making sure the certificates were kept pristine.

Did a clean re-install of server 1 & 2.
Didn't touch any certificates in /etc/ssh or /root/.ssh at all
Restore from backup on server 1 (server 2 was blank anyway)
Reinstalled vpn-tunnel

root@pmg2:~# pmgcm join 10.10.10.14 --fingerprint D2:6B:2E:3C:C7:46:69:13:9C:AC:98:2D:55:98:5B:06:67:E6:33:42:A9:A1:E1:81:22:56:41:27:BF:71:49:E9

cluster join failed: 401 permission denied - invalid PMG ticket

Could use some help here....
TIA
Jord

Stoiko Ivanov · Jan 24, 2022

Is the time in sync across both nodes?

If it is, please post the journal of both nodes covering the attempt to join.

I hope this helps!

Jord · Jan 24, 2022

Time sync is correct.

root@pmg:~# ntpdate ntp.belnet.be
24 Jan 11:47:47 ntpdate[12051]: adjust time server 193.190.198.10 offset +0.003148 sec

root@pmg2:~# ntpdate ntp.belnet.be
24 Jan 11:47:29 ntpdate[7601]: adjust time server 193.190.198.10 offset +0.000521 sec
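A minimal sketch for comparing the two offsets, assuming ntpdate output shaped like the lines above (`offset <value> sec`). The helper names `ntp_offset` and `offset_within` are hypothetical, and the 1 second tolerance in the usage comment is an arbitrary illustrative value, not a documented PMG limit. Note that plain `ntpdate` adjusts the clock; `ntpdate -q` only queries the server.

```shell
# Extract the offset value from ntpdate output read on stdin.
ntp_offset() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "offset" && $(i + 2) == "sec") print $(i + 1) }'
}

# Succeed if the absolute value of offset $1 is at most $2 seconds.
offset_within() {
  awk -v o="$1" -v m="$2" 'BEGIN { o = o + 0; m = m + 0; if (o < 0) o = -o; exit !(o <= m) }'
}

# Usage (run on each node):
#   off=$(ntpdate -q ntp.belnet.be | ntp_offset | tail -n 1)
#   offset_within "$off" 1 && echo "clock OK" || echo "clock drift: $off"
```

Both offsets posted here are a few milliseconds, so clock drift is unlikely to explain the rejected ticket.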

During attempt to join I get on Server 1 (master):
/var/log/pmgproxy/pmgproxy.log
::ffff:10.10.10.16 - - [24/01/2022:11:57:37 +0100] "POST /api2/json/access/ticket HTTP/1.1" 200 565
::ffff:10.10.10.16 - - [24/01/2022:11:57:37 +0100] "POST /api2/json/config/cluster/nodes HTTP/1.1" 401 -

/var/log/syslog
Jan 24 11:59:55 pmg pmgdaemon[842]: successful auth for user 'root@pam'

Is there another log I should check?

TIA
Jord

Stoiko Ivanov · Jan 24, 2022

Jord said:
/var/log/pmgproxy/pmgproxy.log
::ffff:10.10.10.16 - - [24/01/2022:11:57:37 +0100] "POST /api2/json/access/ticket HTTP/1.1" 200 565
::ffff:10.10.10.16 - - [24/01/2022:11:57:37 +0100] "POST /api2/json/config/cluster/nodes HTTP/1.1" 401 -

/var/log/syslog
Jan 24 11:59:55 pmg pmgdaemon[842]: successful auth for user 'root@pam'

the successful authentication is 2 minutes after the cluster-join call?

could you please post
* the /etc/pmg/cluster.conf (of both systems if existent)
* the output of pmgcm join-cmd on the master node

Jord · Jan 24, 2022

Stoiko Ivanov said:
the successful authentication is 2 minutes after the cluster-join call?

No,

I tailed one log at a time and performed two attempts to join.

I did one join again and have the following simultaneous logs:
::ffff:10.10.10.16 - - [24/01/2022:13:10:21 +0100] "POST /api2/json/access/ticket HTTP/1.1" 200 565
::ffff:10.10.10.16 - - [24/01/2022:13:10:21 +0100] "POST /api2/json/config/cluster/nodes HTTP/1.1" 401 -
&
Jan 24 13:10:21 pmg pmgdaemon[856]: successful auth for user 'root@pam'
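To avoid tailing one log at a time, the failing requests can be filtered out of pmgproxy.log and matched against syslog by timestamp. This is only a sketch: `failed_requests` is a hypothetical helper, and it assumes the access-log layout shown in the excerpt above (status code in the ninth whitespace-separated field).

```shell
# Print timestamp, request path and status for every request answered with an HTTP error (>= 400).
failed_requests() {
  awk '$9 >= 400 { ts = $4; sub(/^\[/, "", ts); print ts, $7, $9 }'
}

# Usage on the master:
#   failed_requests < /var/log/pmgproxy/pmgproxy.log | tail -n 5
```

On a systemd host, running `journalctl --since "-2min"` on each node right after the join attempt should cover the same time window.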

/etc/pmg/cluster.conf is only present on Server 1:

master: 1
    fingerprint D2:6B:2E:3C:C7:46:69:13:9C:AC:98:2D:55:98:5B:06:67:E6:33:42:A9:A1:E1:81:22:56:41:27:BF:71:49:E9
    hostrsapubkey AAAAB3NzaC1yc2EAAAADAQABAAABgQDCd3oo3xB7wri2u7FMzYMR6JeIaUS6UqaegdWumm/KnLkxZd70ejziDU15FIGbgRwMOdhwJxcyDPcAyTuSqroVJos8OGtMaIHaDmHBy8xbCdQ38YtZ9eM4RNaWkBdyvoxxnZQw3iFvIv4EozFUvy4oeC0cWRsm41IcONrk0QpDwXre7eRRRhjf8l7XkVN+kaeedlZqldpKaFLY7lojndXFr5toocQ4fKc/X3s9RPi4XUq3vZ5zUBeylugPls3Gg9UVMpu4V6r8zST4MwYVxze/CMrjQS5Sxgvw2FEzWG0fXzi9wHA1bgTpn6BfHgvmA/dsV0lIrneEZVBflkytTNCQwUKK2TjIgkxysCDz95bhJmsEikGXzrituiXDVA5DHpDWfTKmjK52A38FACSKjP1Uuq3XA1iI9bzIp6XxZtGFgo7SUKlfEIoOs4VlEQe8cJmvR6tngFB0yo9aVR9IsgvwqbkyJGi0g4LShwwGWaH5XRa0QJSud9ZUs2ooiaVuTDs=
    ip 10.10.10.14
    maxcid 1
    name pmg
    rootrsapubkey AAAAB3NzaC1yc2EAAAADAQABAAABAQDE1KGFbYu0HpdEFGY/pak4EOVJTOxoMSRH5yO980mezAjgnk7AUEdcnkksN25v5QoRAPDeQi+kl/fO1/C9pUdcRvRAzys/Ewy+9+3+mmon5GS+r7/Wht5n9BqCR4IHPGFQsMU9IHPU98D6LcvxG8Al8nWTs+HFjV0ReFUj3w3sSeSjjk4tr+V29He1Rg3/OEzx8xM1uIPTYLEfxx2MQuPzJmacpPEjoQMQYh6gW0w9axTa4Q1x0bBzvPirMTb/gkN2tLW8m8FHnsA0YnB0EVzzcfQm8TWobiCUaanDg8c5/ZZChlA/MFDJquHe4Gi6Ax3jeroOlHIwmhgsHa9GyTXj

root@pmg:~# pmgcm join-cmd
pmgcm join 10.10.10.14 --fingerprint D2:6B:2E:3C:C7:46:69:13:9C:AC:98:2D:55:98:5B:06:67:E6:33:42:A9:A1:E1:81:22:56:41:27:BF:71:49:E9

TIA
Jord

Stoiko Ivanov · Jan 24, 2022

Do you have TFA (two-factor/multifactor authentication) enabled on your PMG (especially on the master node)?
Otherwise I don't see where this could go wrong. I assume that both nodes can reach port 8006 (pmgproxy) directly, without any other piece of software in between?
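The reachability part of this question can be checked from the joining node with a plain TCP probe. A minimal sketch, assuming bash and coreutils `timeout` are available; `port_open` is a hypothetical helper name. It only shows that something accepts connections on the port, not whether a proxy sits in between.

```shell
# Return 0 if a TCP connection to host $1, port $2 succeeds within 3 seconds.
port_open() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Usage on Server 2:
#   port_open 10.10.10.14 8006 && echo "pmgproxy reachable" || echo "cannot connect"
#
# To see which certificate actually answers, compare its SHA-256 fingerprint
# with the one printed by `pmgcm join-cmd` on the master:
#   openssl s_client -connect 10.10.10.14:8006 </dev/null 2>/dev/null \
#     | openssl x509 -noout -fingerprint -sha256
```

If the fingerprint seen from Server 2 differs from the one on the master, something other than pmgproxy is terminating the TLS connection.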

Jord · Jan 24, 2022

Yes! Disabling TFA did the trick!

Node joined the cluster with issues though:

root@pmg2:~# pmgcm join 10.10.10.14 --fingerprint D2:6B:2E:3C:C7:46:69:13:9C:AC:98:2D:55:98:5B:06:67:E6:33:42:A9:A1:E1:81:22:56:41:27:BF:71:49:E9

stop all services accessing the database
save new cluster configuration
cluster node successfully joined
updated /etc/pmg/cluster.conf
updated /etc/pmg/pmg-authkey.key
updated /etc/pmg/pmg-authkey.pub
updated /etc/pmg/pmg-csrf.key
updated /etc/pmg/user.conf
updated /etc/pmg/tfa.json
updated /etc/pmg/domains
updated /etc/pmg/mynetworks
updated /etc/pmg/transport
updated /etc/pmg/tls_policy
updated /etc/pmg/pmg.conf
copying master database from '10.10.10.14'
copying master database finished (got 247524 bytes)
delete local database
could not change directory to "/root": Permission denied
create new local database
could not change directory to "/root": Permission denied
insert received data into local database
creating indexes
run analyze to speed up database queries
could not change directory to "/root": Permission denied
ANALYZE
could not change directory to "/root": Permission denied
could not change directory to "/root": Permission denied
could not change directory to "/root": Permission denied
could not change directory to "/root": Permission denied
could not change directory to "/root": Permission denied
could not change directory to "/root": Permission denied
could not change directory to "/root": Permission denied
could not change directory to "/root": Permission denied
could not change directory to "/root": Permission denied
could not change directory to "/root": Permission denied
nextserver: Bootstrap discovery failed. Giving up.
could not change directory to "/root": Permission denied
could not change directory to "/root": Permission denied
syncing quarantine data
syncing quarantine data finished

Cluster administration on Server 1 gives "communication failure (0)" although the rest of the WebUI seems OK.
Cluster administration on Server 2 gives correct cluster overview.

Halfway there?

Stoiko Ivanov · Jan 24, 2022

Jord said:
Yes! Disabling TFA did the trick!

nice

Jord said:
could not change directory to "/root": Permission denied

these messages are harmless (although we'll try to get rid of them soon, since they do cause confusion)

Jord said:
nextserver: Bootstrap discovery failed. Giving up.

I think this is due to missing firewall-allow rules for razor

Jord said:
Cluster administration on Server 1 gives "communication failure (0)" although the rest of the WebUI seems OK.

On a hunch, I'd guess this is also related to missing firewall policies, or to pmgproxy not listening on the openvpn interface.
Again, the journal from both nodes (over a period of about 2 minutes) might give some hints.

Jord · Jan 25, 2022

Stoiko Ivanov said:
I think this is due to missing firewall-allow rules for razor

I can't verify, but after your message I allowed all outgoing traffic on Server 2's firewall.

Stoiko Ivanov said:
would also guess on a hunch that this might be related to missing firewall policies or pmgproxy not listening on the openvpn interface

Indeed, I had to add a static route to /etc/network/interfaces on Server 1:

up ip route add 192.168.2.24/32 via 10.10.10.16 dev br0
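Whether the route is active can be checked with `ip route get`. A small sketch follows; `route_gateway` is a hypothetical helper that pulls the gateway (the address after `via`) out of that command's output.

```shell
# Print the gateway ("via" address) from `ip route get` output read on stdin.
route_gateway() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "via") print $(i + 1) }'
}

# Usage on Server 1 (should print 10.10.10.16 once the static route is in place):
#   ip route get 192.168.2.24 | route_gateway
```

If nothing is printed, the kernel considers the address directly reachable, so the static route is not being used.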

I think I have everything running as of now.

One issue remains though: as I intended to use TFA, the root password is fairly simple.
1. will re-enabling TFA break the cluster?
or
2. if TFA breaks the cluster, will modifying the password break the cluster?

Thanks for all the help so far.
TIA
Jord

Stoiko Ivanov · Jan 25, 2022

Jord said:
One issue remains though: as I intended to use TFA, the root password is fairly simple.

* changing the password won't break synchronization
* re-enabling TFA should not break the synchronization ("should" because I did not explicitly try this)

So I'd suggest trying it. Should it cause problems, just report them back here (with the logs), and I'd try to reproduce the environment (and see if we can do something about it).

Jord · Jan 25, 2022

Stoiko Ivanov said:
* re-enabling TFA should not break the synchronization ("should" because I did not explicitly try this)

It works like a charm: enabling TFA for user root on Server 1 automatically enables the same TFA for user root on Server 2.

Tested a reboot of both servers & everything works fine.

Thanks a lot.
Jord

Stoiko Ivanov · Jan 25, 2022

Jord said:
It works like a charm: enabling TFA on user root on Server 1 enables automatically the same TFA on user root on Server 2.

yes - however the passwords for the @pam users are local to the node - make sure to update them on both nodes!

Glad we figured this out!
