PMG cluster and the user white/black lists

ednt

Hi,

We run a PMG cluster with 2 nodes.
pmgmirror reports that the sync is always OK.

But ...
On the 'master' we have many more user whitelist entries than on the 'slave'.
Are these entries not synchronized?

If so, we run into a problem if the 'slave' becomes active.

Btw: PMG 8.2.0, no pending updates.


Best regards.
 
They are synchronized with the cluster-sync usually - do you see any errors with the sync in the past?
could you share:
* the output of `pmgcm status`?
* the journal from pmgtunnel and pmgmirror for a longer timeframe, including the moment when you add one entry for one user
* the output of `pmgsh get /quarantine/whitelist -pmail <one-address-with-different-entries>`
All from both nodes! (replace <one-address-with-different-entries> by one e-mail address where you observed that the entries are not in sync)
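
A minimal sketch for gathering these three outputs in one go on each node, assuming a Bash root shell. The function names `pmg_diag_run` and `collect_pmg_diag`, the `DRY_RUN` switch and the two-day journal window are illustrative assumptions, not part of PMG; only the three commands themselves come from the list above.

```shell
# Hypothetical helper: run one command and label its output.
# With DRY_RUN=1 the command is only printed, not executed.
pmg_diag_run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        printf '### %s\n' "$*"
        "$@" 2>&1
        echo
    fi
}

# Hypothetical helper: collect the requested diagnostics on this node.
# $1 = one e-mail address whose whitelist entries differ between the nodes.
collect_pmg_diag() {
    local addr="$1"
    pmg_diag_run pmgcm status
    # "2 days ago" is an assumed window; widen it if the problem is older.
    pmg_diag_run journalctl -u pmgtunnel -u pmgmirror --since "2 days ago" --no-pager
    pmg_diag_run pmgsh get /quarantine/whitelist -pmail "$addr"
}
```

Running `collect_pmg_diag user@example.com > "pmg-diag-$(hostname).txt"` on both nodes would give one file per node to attach.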
 
Hi,

thanks for your fast response.
The problem is not only missing entries for a user; the users themselves are also incomplete on the slave.

I can select around 25 users on the master, but only 2 on the slave.

I attached a file for master and slave.

I don't know why this message appears on pmg-02:
Mar 31 16:16:59 pk-pmg-02 pmgmirror[860]: syncing deleted node 2 from master '192.168.248.11'

There are only 2 nodes and both are up and running.

Also this is not good:
Mar 31 16:16:59 pk-pmg-02 pmgmirror[860]: database sync 'pk-pmg-01' failed - command 'rsync '--rsh=ssh -l root -o BatchMode=yes -o HostKeyAlias=pk-pmg-01' -q -aq --timeout 10 '[192.168.248.11]:/var/spool/pmg/cluster/2/' /var/spool/pmg/cluster/2 --include spam/ --include 'spam/*' --include 'spam/*/*' --include virus/ --include 'virus/*' --include 'virus/*/*' --exclude '*'' failed: exit code 23

But I can do ssh from pmg-02 to pmg-01 without problems:
root@pk-pmg-02:~# ssh 192.168.248.11
Linux pk-pmg-01 6.8.12-9-pve #1 SMP PREEMPT_DYNAMIC PMX 6.8.12-9 (2025-03-16T19:18Z) x86_64

If I run:
rsync --rsh=ssh -l root -o BatchMode=yes -o HostKeyAlias=pk-pmg-01' -q -aq --timeout 10 '[192.168.248.11]:/var/spool/pmg/cluster/2/' /var/spool/pmg/cluster/2 --include spam/ --include 'spam/*' --include 'spam/*/*' --include virus/ --include 'virus/*' --include 'virus/*/*' --exclude '*'
by hand, I get a prompt: >


Ahh, there is no node 2, only nodes 1 and 3.
So this has something to do with the deleted node 2.

But this has nothing to do with the missing users and their whitelist entries.
 

I have now tried to delete the unavailable node 2, but ...
root@pk-pmg-02:~# pmgcm delete 2
delete cluster node failed: operation not permitted (not master)

And on pmg-01:
root@pk-pmg-01:~# pmgcm delete 2
delete cluster node failed: no such node (cid == 2 does not exists)

So, an additional question: how can I get rid of the failing sync message on pmg-02?
 
> Ahh, there is no node 2, only nodes 1 and 3.
> So this has something to do with the deleted node 2.
>
> But this has nothing to do with the missing users and their whitelist entries.
Without checking explicitly: if I remember correctly, the database sync (which contains those entries) happens after the files are synced, so I'd first fix the one error you're having and then see whether that resolves the issue.

What's the output of the command that fails when you run it in a root shell?
Code:
rsync '--rsh=ssh -l root -o BatchMode=yes -o HostKeyAlias=pk-pmg-01' -q -aq --timeout 10 '[TAKE_FROM_LOGS]:/var/spool/pmg/cluster/2/' /var/spool/pmg/cluster/2 --include spam/ --include 'spam/*' --include 'spam/*/*' --include virus/ --include 'virus/*' --include 'virus/*/*' --exclude '*'
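
The opening single quote before `--rsh` matters here: everything from `--rsh=ssh` up to `HostKeyAlias=pk-pmg-01` has to reach rsync as one argument. If that opening quote is dropped, the remaining quote characters no longer pair up, so the shell treats the command as unfinished and shows its `>` continuation prompt instead of running anything. A small sketch for checking a command line before pasting it, assuming Bash; the function name `quotes_balanced` and the shortened sample commands are made up for illustration.

```shell
# Hypothetical helper: succeeds if Bash can parse the command line,
# fails if it is incomplete (for example because of an unmatched quote).
# "bash -n" only parses the string; nothing is executed.
quotes_balanced() {
    bash -n -c "$1" 2>/dev/null
}

# Shortened stand-ins for the rsync call, with and without the opening quote.
good="rsync '--rsh=ssh -l root -o BatchMode=yes' -q -aq src/ dst"
bad="rsync --rsh=ssh -l root -o BatchMode=yes' -q -aq src/ dst"

if quotes_balanced "$good"; then echo "good: parses"; fi
if ! quotes_balanced "$bad"; then echo "bad: unterminated quote"; fi
```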
 
With this command line I get no prompt; like in the logs, I get an error:
[Receiver] io timeout after 10 seconds -- exiting
rsync error: timeout in data send/receive (code 30) at io.c(201) [Receiver=3.2.7]

As written: node 2 does not exist.
But I found no entry in any file on pmg-02 that mentions node 2.
Only in /var/spool/pmg/cluster is there a directory named 2.
Yesterday I moved this directory away, but it was recreated.
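
For reference, the two rsync exit codes that show up in this thread are documented in the EXIT VALUES section of the rsync man page: 23 (from the pmgmirror log) is "Partial transfer due to error", and 30 (from the manual run) is "Timeout in data send/receive". A small lookup sketch, assuming Bash; the function name `rsync_exit_meaning` is hypothetical and only a few of the documented codes are included.

```shell
# Hypothetical helper: translate an rsync exit code into the short
# description given in the EXIT VALUES section of the rsync man page.
rsync_exit_meaning() {
    case "$1" in
        0)  echo "Success" ;;
        12) echo "Error in rsync protocol data stream" ;;
        23) echo "Partial transfer due to error" ;;
        24) echo "Partial transfer due to vanished source files" ;;
        30) echo "Timeout in data send/receive" ;;
        35) echo "Timeout waiting for daemon connection" ;;
        *)  echo "See the EXIT VALUES section of 'man rsync'" ;;
    esac
}

rsync_exit_meaning 23
rsync_exit_meaning 30
```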
 