node not joining

Jord

Hi,

Consider the following situation:

Server 1 in location 1
10.10.10.14 in DMZ
PMG community edition: intended master of a cluster that has yet to be created
Stock PMG installation, with LetsEncrypt
+ zabbix
+ openvpn (server tap bridge mode)
+ SSH on port 22219, password login disabled
+ iptables

Server 2 in location 2
192.168.2.24 in DMZ
10.10.10.16 on tap VPN
PMG community edition: intended member of a cluster that has yet to be created
+ zabbix
+ openvpn (tap client)
+ SSH on port 22219, password login disabled

iptables rules (rules.v4) of Server 1 (Server 2 has no iptables rules yet)

root@pmg:~# iptables -L -n
Chain INPUT (policy DROP)
target prot opt source destination
REJECT all -- IP.OF.BAD.GUY 0.0.0.0/0 reject-with icmp-port-unreachable
...
REJECT all -- IP.OF.BAD.GUY 0.0.0.0/0 reject-with icmp-port-unreachable
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0 state RELATED,ESTABLISHED
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0
ACCEPT icmp -- 0.0.0.0/0 0.0.0.0/0 icmptype 8
ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:22219
ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:80
ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:25
ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:26
ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:10050
ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:8006
ACCEPT udp -- 0.0.0.0/0 0.0.0.0/0 udp dpt:1194

Chain FORWARD (policy DROP)
target prot opt source destination

Chain OUTPUT (policy ACCEPT)
target prot opt source destination

Both sites are behind a firewall with DNAT & SNAT
Server 1 DNAT tcp 80,25,8006,22219 & udp 1194
Server 2 DNAT tcp 80,25,8006,22219

Each server's id_rsa.pub is in the other's authorized_keys. The servers can SSH into each other via the public IP and via the VPN IP, in both directions.


Issue
Started the cluster on Server 1 -> joined from Server 2 with 10.10.10.14, password & fingerprint -> the dialog closes immediately, but no cluster is formed.


Any ideas?
TIA
Jord
 
Using CLI:

root@pmg2:~# pmgcm join 10.10.10.14 --fingerprint 2B:58:3A:6F:82:A6:F8:55:94:C6:06:56:5B:94:92:C5:8E:66:45:2B:2F:69:F0:1A:FB:6F:6E:CD:12:C2:9D:C8

cluster join failed: 401 permission denied - invalid PMG ticket
 
Okay, I presumed the fingerprint was the problem, so I reinstalled, making sure the certificates were kept pristine.

Did a clean re-install of server 1 & 2.
Didn't touch any certificates in /etc/ssh or /root/.ssh at all
Restore from backup on server 1 (server 2 was blank anyway)
Reinstalled vpn-tunnel

root@pmg2:~# pmgcm join 10.10.10.14 --fingerprint D2:6B:2E:3C:C7:46:69:13:9C:AC:98:2D:55:98:5B:06:67:E6:33:42:A9:A1:E1:81:22:56:41:27:BF:71:49:E9

cluster join failed: 401 permission denied - invalid PMG ticket

Could use some help here....
TIA
Jord
 
Is the time in sync across both nodes?

else - please post the journal of both nodes during the attempt to join

I hope this helps!
 
Time sync is correct.

root@pmg:~# ntpdate ntp.belnet.be
24 Jan 11:47:47 ntpdate[12051]: adjust time server 193.190.198.10 offset +0.003148 sec

root@pmg2:~# ntpdate ntp.belnet.be
24 Jan 11:47:29 ntpdate[7601]: adjust time server 193.190.198.10 offset +0.000521 sec
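
The time question matters presumably because API tickets carry a timestamp, so a large clock difference between the nodes could make a fresh ticket look invalid. Below is a minimal sketch for checking the offsets automatically. The helper names ntp_offset and offset_ok are hypothetical, and the parsing assumes the ntpdate line format shown above.

```shell
# Sketch only: helper names are hypothetical, not part of PMG or ntpdate.

# Read ntpdate output on stdin and print the value that follows "offset"
# on lines of the form "... adjust time server <ip> offset <value> sec".
ntp_offset() {
    awk '/time server/ {
        for (i = 1; i < NF; i++)
            if ($i == "offset") print $(i + 1)
    }'
}

# offset_ok OFFSET LIMIT
# Succeeds when the absolute value of OFFSET (seconds) is at most LIMIT.
offset_ok() {
    awk -v o="$1" -v l="$2" 'BEGIN {
        o += 0; l += 0          # force numeric interpretation
        if (o < 0) o = -o
        exit !(o <= l)
    }'
}

# Example, run on each node (the 1-second limit is an arbitrary assumption):
#   o=$(ntpdate ntp.belnet.be | ntp_offset)
#   offset_ok "$o" 1 && echo "clock looks fine" || echo "clock offset too large: $o"
```

With the two offsets posted above (+0.003148 and +0.000521 seconds), both nodes would pass such a check easily, which is why the clocks were ruled out.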

During attempt to join I get on Server 1 (master):
/var/log/pmgproxy/pmgproxy.log
::ffff:10.10.10.16 - - [24/01/2022:11:57:37 +0100] "POST /api2/json/access/ticket HTTP/1.1" 200 565
::ffff:10.10.10.16 - - [24/01/2022:11:57:37 +0100] "POST /api2/json/config/cluster/nodes HTTP/1.1" 401 -

/var/log/syslog
Jan 24 11:59:55 pmg pmgdaemon[842]: successful auth for user 'root@pam'

Is there another log I should check?

TIA
Jord
 
/var/log/pmgproxy/pmgproxy.log
::ffff:10.10.10.16 - - [24/01/2022:11:57:37 +0100] "POST /api2/json/access/ticket HTTP/1.1" 200 565
::ffff:10.10.10.16 - - [24/01/2022:11:57:37 +0100] "POST /api2/json/config/cluster/nodes HTTP/1.1" 401 -

/var/log/syslog
Jan 24 11:59:55 pmg pmgdaemon[842]: successful auth for user 'root@pam'
the successful authentication is 2 minutes after the cluster-join call?

could you please post
* the /etc/pmg/cluster.conf (of both systems if existent)
* the output of pmgcm join-cmd on the master node
 
the successful authentication is 2 minutes after the cluster-join call?
No,

I tailed one log at a time and performed two attempts to join.

I did one join again and have the following simultaneous logs:
::ffff:10.10.10.16 - - [24/01/2022:13:10:21 +0100] "POST /api2/json/access/ticket HTTP/1.1" 200 565
::ffff:10.10.10.16 - - [24/01/2022:13:10:21 +0100] "POST /api2/json/config/cluster/nodes HTTP/1.1" 401 -
&
Jan 24 13:10:21 pmg pmgdaemon[856]: successful auth for user 'root@pam'
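
To spot rejected API calls like the 401 above without reading the whole log, a small filter can help. This is a minimal sketch: failed_requests is a hypothetical helper name, and it assumes the access-log layout shown in this thread, with the HTTP status in the ninth whitespace-separated field.

```shell
# Sketch only: failed_requests is a hypothetical helper, not a PMG tool.
# Prints "timestamp client method path status" for every 4xx/5xx request.
# Reads the files given as arguments, or stdin when there are none.
failed_requests() {
    awk '$9 ~ /^[45][0-9][0-9]$/ {
        ts = substr($4, 2)       # drop the leading "["
        method = substr($6, 2)   # drop the leading double quote
        print ts, $1, method, $7, $9
    }' "$@"
}

# Example on the master:
#   failed_requests /var/log/pmgproxy/pmgproxy.log
```

Applied to the two lines above, it would print only the POST to /api2/json/config/cluster/nodes, since the ticket request returned 200.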

/etc/pmg/cluster.conf is only present on Server 1:
master: 1
    fingerprint D2:6B:2E:3C:C7:46:69:13:9C:AC:98:2D:55:98:5B:06:67:E6:33:42:A9:A1:E1:81:22:56:41:27:BF:71:49:E9
    hostrsapubkey AAAAB3NzaC1yc2EAAAADAQABAAABgQDCd3oo3xB7wri2u7FMzYMR6JeIaUS6UqaegdWumm/KnLkxZd70ejziDU15FIGbgRwMOdhwJxcyDPcAyTuSqroVJos8OGtMaIHaDmHBy8xbCdQ38YtZ9eM4RNaWkBdyvoxxnZQw3iFvIv4EozFUvy4oeC0cWRsm41IcONrk0QpDwXre7eRRRhjf8l7XkVN+kaeedlZqldpKaFLY7lojndXFr5toocQ4fKc/X3s9RPi4XUq3vZ5zUBeylugPls3Gg9UVMpu4V6r8zST4MwYVxze/CMrjQS5Sxgvw2FEzWG0fXzi9wHA1bgTpn6BfHgvmA/dsV0lIrneEZVBflkytTNCQwUKK2TjIgkxysCDz95bhJmsEikGXzrituiXDVA5DHpDWfTKmjK52A38FACSKjP1Uuq3XA1iI9bzIp6XxZtGFgo7SUKlfEIoOs4VlEQe8cJmvR6tngFB0yo9aVR9IsgvwqbkyJGi0g4LShwwGWaH5XRa0QJSud9ZUs2ooiaVuTDs=
    ip 10.10.10.14
    maxcid 1
    name pmg
    rootrsapubkey AAAAB3NzaC1yc2EAAAADAQABAAABAQDE1KGFbYu0HpdEFGY/pak4EOVJTOxoMSRH5yO980mezAjgnk7AUEdcnkksN25v5QoRAPDeQi+kl/fO1/C9pUdcRvRAzys/Ewy+9+3+mmon5GS+r7/Wht5n9BqCR4IHPGFQsMU9IHPU98D6LcvxG8Al8nWTs+HFjV0ReFUj3w3sSeSjjk4tr+V29He1Rg3/OEzx8xM1uIPTYLEfxx2MQuPzJmacpPEjoQMQYh6gW0w9axTa4Q1x0bBzvPirMTb/gkN2tLW8m8FHnsA0YnB0EVzzcfQm8TWobiCUaanDg8c5/ZZChlA/MFDJquHe4Gi6Ax3jeroOlHIwmhgsHa9GyTXj

root@pmg:~# pmgcm join-cmd
pmgcm join 10.10.10.14 --fingerprint D2:6B:2E:3C:C7:46:69:13:9C:AC:98:2D:55:98:5B:06:67:E6:33:42:A9:A1:E1:81:22:56:41:27:BF:71:49:E9

TIA
Jord
 
Do you have TFA (two-factor/multi-factor authentication) enabled on your PMG (especially on the master node)?
else I don't see where this could go wrong - I assume that both nodes can reach port 8006 (pmgproxy) directly - without any other piece of software in between?
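
One way to check that port 8006 is reached directly is to compare the certificate the joining node actually sees with the fingerprint printed by pmgcm join-cmd. The sketch below makes two assumptions: that the join fingerprint is the SHA-256 fingerprint of the certificate served by pmgproxy, and that openssl is installed on the joining node. The helper names remote_fingerprint and same_fingerprint are hypothetical.

```shell
# Sketch only: helper names are hypothetical, not part of PMG.

# remote_fingerprint HOST [PORT]
# Prints the SHA-256 fingerprint of the certificate served at HOST:PORT
# (port defaults to 8006, the pmgproxy port mentioned in this thread).
remote_fingerprint() {
    openssl s_client -connect "$1:${2:-8006}" </dev/null 2>/dev/null |
        openssl x509 -noout -fingerprint -sha256 |
        sed 's/^.*=//'
}

# same_fingerprint A B
# Succeeds when both fingerprints match, ignoring upper/lower case.
same_fingerprint() {
    [ "$(printf '%s' "$1" | tr 'a-f' 'A-F')" = "$(printf '%s' "$2" | tr 'a-f' 'A-F')" ]
}

# Example on the joining node; paste the value from "pmgcm join-cmd":
#   expected='<fingerprint from pmgcm join-cmd>'
#   same_fingerprint "$(remote_fingerprint 10.10.10.14)" "$expected" \
#       && echo "direct connection, certificate matches" \
#       || echo "different certificate: something may sit in between"
```

A mismatch would suggest a reverse proxy or another TLS endpoint between the nodes. A match, as was presumably the case here, points away from the network and towards authentication.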
 
Yes! Disabling TFA did the trick!

Node joined the cluster with issues though:

root@pmg2:~# pmgcm join 10.10.10.14 --fingerprint D2:6B:2E:3C:C7:46:69:13:9C:AC:98:2D:55:98:5B:06:67:E6:33:42:A9:A1:E1:81:22:56:41:27:BF:71:49:E9
stop all services accessing the database
save new cluster configuration
cluster node successfully joined
updated /etc/pmg/cluster.conf
updated /etc/pmg/pmg-authkey.key
updated /etc/pmg/pmg-authkey.pub
updated /etc/pmg/pmg-csrf.key
updated /etc/pmg/user.conf
updated /etc/pmg/tfa.json
updated /etc/pmg/domains
updated /etc/pmg/mynetworks
updated /etc/pmg/transport
updated /etc/pmg/tls_policy
updated /etc/pmg/pmg.conf
copying master database from '10.10.10.14'
copying master database finished (got 247524 bytes)
delete local database
could not change directory to "/root": Permission denied
create new local database
could not change directory to "/root": Permission denied
insert received data into local database
creating indexes
run analyze to speed up database queries
could not change directory to "/root": Permission denied
ANALYZE
could not change directory to "/root": Permission denied
could not change directory to "/root": Permission denied
could not change directory to "/root": Permission denied
could not change directory to "/root": Permission denied
could not change directory to "/root": Permission denied
could not change directory to "/root": Permission denied
could not change directory to "/root": Permission denied
could not change directory to "/root": Permission denied
could not change directory to "/root": Permission denied
could not change directory to "/root": Permission denied
nextserver: Bootstrap discovery failed. Giving up.
could not change directory to "/root": Permission denied
could not change directory to "/root": Permission denied
syncing quarantine data
syncing quarantine data finished

Cluster administration on Server 1 gives "communication failure (0)" although the rest of the WebUI seems OK.
Cluster administration on Server 2 gives correct cluster overview.

Halfway there ?
 
Yes! Disabling TFA did the trick!
nice
could not change directory to "/root": Permission denied
these messages are harmless (although we'll try to get rid of them soon, since they do cause confusion)
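
Since those messages are harmless, they can be filtered out so that real warnings stand out. A minimal sketch, where filter_join_noise is a hypothetical helper name and the pattern is the exact message text from the output above:

```shell
# Sketch only: filter_join_noise is a hypothetical helper, not a PMG tool.
# Drops the harmless "could not change directory" lines from stdin.
filter_join_noise() {
    grep -v 'could not change directory to "/root": Permission denied'
}

# Example: filter a saved copy of the join output
# (join-output.txt is an illustrative file name):
#   filter_join_noise < join-output.txt
```

Filtering a saved copy avoids piping the interactive join command itself, where the password prompt might get in the way.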

nextserver: Bootstrap discovery failed. Giving up.
I think this is due to missing firewall-allow rules for razor

Cluster administration on Server 1 gives "communication failure (0)" although the rest of the WebUI seems OK.
would also guess on a hunch that this might be related to missing firewall policies or pmgproxy not listening on the openvpn interface
Again, the journal from both nodes (covering a period of about 2 minutes) might give some hints.
 
I think this is due to missing firewall-allow rules for razor
I can't verify this, but after your message I allowed all outgoing traffic on Server 2's firewall.
would also guess on a hunch that this might be related to missing firewall policies or pmgproxy not listening on the openvpn interface
Indeed, I had to add a static route to /etc/network/interfaces on Server 1:

up ip route add 192.168.2.24/32 via 10.10.10.16 dev br0
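
Whether such a route is active can be checked with ip route get, which reports the path the kernel would choose for a destination. The sketch below extracts the gateway from that output. route_gateway is a hypothetical helper name, and the expected gateway 10.10.10.16 is taken from the route line above.

```shell
# Sketch only: route_gateway is a hypothetical helper, not a system tool.
# Reads "ip route get" output on stdin and prints the address after "via".
# Prints nothing when the destination is reached without a gateway.
route_gateway() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "via") { print $(i + 1); exit }
    }'
}

# Example on Server 1:
#   gw=$(ip route get 192.168.2.24 | route_gateway)
#   [ "$gw" = "10.10.10.16" ] && echo "route via VPN peer is active" \
#       || echo "unexpected gateway: ${gw:-none}"
```

If the gateway is something other than the VPN peer, traffic from the master to 192.168.2.24 would leave through the default route instead of the tunnel.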

I think I have everything running as of now.

One issue remains though: as I intended to use TFA, the root password is fairly simple.
1. will re-enabling TFA break the cluster?
or
2. if TFA breaks the cluster, will modifying the password break the cluster?

Thanks for all the help so far.
TIA
Jord
 
One issue remains though: as I intended to use TFA, the root password is fairly simple.
* changing the password won't break synchronization
* re-enabling TFA should not break the synchronization ("should" because I did not explicitly try this)

so - I'd suggest - try it - should it cause problems - just report them back here (with the logs) - and I'd try to reproduce the environment (and see if we can do something about it)
 
It works like a charm: enabling TFA for user root on Server 1 automatically enables the same TFA for user root on Server 2.
yes - however the passwords for the @pam users are local to the node - make sure to update them on both nodes!

Glad we figured this out!
 
