[SOLVED] Cluster unable to sync both ways since 11 Nov

c0urier

Member
Aug 29, 2011
28
2
23
Denmark
I've seen some other threads and am not sure whether they're related, but I'm trying to figure out how to get my cluster syncing both ways again. It has not worked since 2022-11-11 at 07:28.
The error I see is:

Code:
Nov 11 07:28:00 mail-gw02 pmgmirror[1017497]: cluster synchronization finished  (0 errors, 3.99 seconds (files 0.39, database 2.86, config 0.74))
Nov 11 07:29:56 mail-gw02 pmgmirror[1017497]: starting cluster synchronization
Nov 11 07:29:57 mail-gw02 pmgmirror[1017497]: database sync 'mail-gw01' failed - Wide character in subroutine entry at /usr/share/perl5/PMG/DBTools.pm line 1093.
Nov 11 07:29:59 mail-gw02 pmgmirror[1017497]: cluster synchronization finished  (1 errors, 3.27 seconds (files 0.00, database 2.54, config 0.73))
Nov 11 07:31:56 mail-gw02 pmgmirror[1017497]: starting cluster synchronization
Nov 11 07:31:58 mail-gw02 pmgmirror[1017497]: detected rule database changes - starting sync from '10.2.0.22'
Nov 11 07:31:58 mail-gw02 pmgmirror[1017497]: finished rule database sync from host '10.2.0.22'
Nov 11 07:31:58 mail-gw02 pmgmirror[1017497]: database sync 'mail-gw01' failed - Wide character in subroutine entry at /usr/share/perl5/PMG/DBTools.pm line 1093.
Nov 11 07:32:00 mail-gw02 pmgmirror[1017497]: cluster synchronization finished  (1 errors, 3.48 seconds (files 0.00, database 2.73, config 0.75))
And that has continued to this day.
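
For anyone hitting the same symptom, here is a minimal sketch for summarising how many sync runs failed in log output like the above. `summarize_sync` is a hypothetical helper name, not part of PMG, and it assumes the log lines keep the exact wording shown in this post.

```shell
# Summarise pmgmirror sync results from syslog-style lines read on stdin.
# Assumes the message wording shown in this thread.
summarize_sync() {
    awk '
        /cluster synchronization finished/ {
            total++
            # The error count follows the opening parenthesis, e.g. "(1 errors,".
            if (match($0, /\([0-9]+ errors/)) {
                n = substr($0, RSTART + 1, RLENGTH - 1) + 0
                if (n > 0) failed++
            }
        }
        # Remember the most recent database sync failure message.
        /database sync .* failed/ { last = $0 }
        END {
            printf "runs=%d failed=%d\n", total, failed
            if (last != "") print "last failure: " last
        }
    '
}

# Possible usage (the syslog tag "pmgmirror" is taken from the log lines above):
#   journalctl -t pmgmirror --since "2022-11-11 07:20" | summarize_sync
```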

From mail-gw02 to mail-gw01 everything is fine.

Code:
root@mail-gw02:~> pmgcm status
NAME(CID)--------------IPADDRESS----ROLE-STATE---------UPTIME---LOAD----MEM---DISK
mail-gw01(2)         10.2.0.22       master A    9 days 21:25   0.18    38%    21%
mail-gw02(1)         10.4.0.2        node   S    9 days 21:25   0.16    38%    22%

Code:
root@mail-gw02:~> pmgversion  -v
proxmox-mailgateway: 7.1-2 (API: 7.1-9/e0c0be55, running kernel: 5.15.64-1-pve)
pmg-api: 7.1-9
pmg-gui: 3.1-6
pve-kernel-5.15: 7.2-14
pve-kernel-helper: 7.2-14
pve-kernel-5.13: 7.1-9
pve-kernel-5.15.74-1-pve: 5.15.74-1
pve-kernel-5.15.64-1-pve: 5.15.64-1
pve-kernel-5.13.19-6-pve: 5.13.19-15
pve-kernel-5.13.19-2-pve: 5.13.19-4
clamav-daemon: 0.103.7+dfsg-0+deb11u1
ifupdown: 0.8.36+pve2
libarchive-perl: 3.4.0-1
libjs-extjs: 7.0.0-1
libjs-framework7: 4.4.7-1
libproxmox-acme-perl: 1.4.2
libproxmox-acme-plugins: 1.4.2
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.2-6
libpve-http-server-perl: 4.1-5
libxdgmime-perl: 1.0-1
lvm2: 2.03.11-2.1
pmg-docs: 7.1-2
pmg-i18n: 2.7-2
pmg-log-tracker: 2.3.1-1
postgresql-13: 13.8-0+deb11u1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.0-1
proxmox-spamassassin: 3.4.6-4
proxmox-widget-toolkit: 3.5.1
pve-firmware: 3.5-6
pve-xtermjs: 4.16.0-1
zfsutils-linux: 2.1.6-pve1

Any help to get the cluster back in sync would be greatly appreciated.
 

Stoiko Ivanov

Proxmox Staff Member
Staff member
May 2, 2018
7,748
1,276
169
Quote: "I've seen some other threads and not sure if it's related - But I'm trying to figure out how to get my cluster to sync both ways again since it has not worked since 2022.11.11 at 07.28."
Could you share the logs from that timeframe (from 10 minutes before until 10 minutes after)?
That might help narrow down where the issue is.

The error message is odd, as it points to a location in the code where this error should not happen. Could you share line 1093 (and the surrounding lines) of /usr/share/perl5/PMG/DBTools.pm?
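
One way to pull out a line together with its surroundings is sketched below. `show_context` is a hypothetical helper name, and the default of 10 lines of context on either side is an arbitrary choice.

```shell
# Print a region of a file centred on one line, prefixed with line numbers.
# Arguments: file, line number, lines of context (default 10).
show_context() {
    file=$1
    line=$2
    ctx=${3:-10}
    start=$((line - ctx))
    if [ "$start" -lt 1 ]; then
        start=1
    fi
    end=$((line + ctx))
    awk -v s="$start" -v e="$end" '
        NR >= s && NR <= e { printf "%6d  %s\n", NR, $0 }
        NR > e { exit }
    ' "$file"
}

# Possible usage on a PMG node:
#   show_context /usr/share/perl5/PMG/DBTools.pm 1093
```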

Did you change anything in the database manually (i.e. not using the PMG tooling/GUI)?
 

c0urier

Hi Stoiko

Thanks for reaching out. The logs from mail-gw01 and mail-gw02 are attached, as there are a lot of them.

DBTools.pm - mail-gw01
Code:
sub update_master_clusterinfo {
    my ($clientcid) = @_;

    my $dbh = open_ruledb();

    $dbh->do("DELETE FROM ClusterInfo WHERE CID = $clientcid");

    my @mt = ('CMSReceivers', 'CGreylist', 'UserPrefs', 'DomainStat', 'DailyStat', 'LocalStat', 'VirusInfo');

    foreach my $table (@mt) {
        $dbh->do ("INSERT INTO ClusterInfo (cid, name, ivalue) select $clientcid, 'lastmt_$table', " .
                  "EXTRACT(EPOCH FROM now())");
    }
}

DBTools.pm - mail-gw02
Code:
sub update_master_clusterinfo {
    my ($clientcid) = @_;

    my $dbh = open_ruledb();

    $dbh->do("DELETE FROM ClusterInfo WHERE CID = $clientcid");

    my @mt = ('CMSReceivers', 'CGreylist', 'UserPrefs', 'DomainStat', 'DailyStat', 'LocalStat', 'VirusInfo');

    foreach my $table (@mt) {
        $dbh->do ("INSERT INTO ClusterInfo (cid, name, ivalue) select $clientcid, 'lastmt_$table', " .
                  "EXTRACT(EPOCH FROM now())");
    }
}

Quote: "Did you change anything in the database manually? (i.e. not using the PMG tooling/GUI)?"
No, I have made no manual changes to the DB.

Again thanks for assisting.
 

Stoiko Ivanov


c0urier

Hi Stoiko

Thanks, I appreciate you taking the time.

They're attached. The only things redacted are the blacklist and whitelist; everything else should be generic.

The "modify field" and "notify" entries are the ones that came with PMG and have not been modified since these nodes were installed quite some years ago.
Screenshot from 2022-11-23 12-18-33.png
 

Attachments

  • pmgdb_dump-mail-gw01_redacted.txt
    24 KB · Views: 1
  • pmgdb_dump-mail-gw02_redacted.txt
    26.1 KB · Views: 1

Stoiko Ivanov

Thanks!

Do you by any chance know when you installed the upgrades?
(Or could you share /var/log/apt/history.log, or the rotated variant that captures the timeframe before the issue occurred?)
 

c0urier

They're pretty much identical.

mail-gw01:
Code:
Start-Date: 2022-11-09  13:19:07
Commandline: apt dist-upgrade -y
Upgrade: pmg-api:amd64 (7.1-7, 7.1-8), pmg-gui:amd64 (3.1-4, 3.1-5)
End-Date: 2022-11-09  13:19:22

Start-Date: 2022-11-12  18:56:04
Commandline: apt dist-upgrade -y
Upgrade: libpixman-1-0:amd64 (0.40.0-1, 0.40.0-1.1~deb11u1)
End-Date: 2022-11-12  18:56:04

Start-Date: 2022-11-15  23:07:28
Commandline: apt dist-upgrade -y
Upgrade: grub-pc-bin:amd64 (2.06-3~deb11u2, 2.06-3~deb11u4), pmg-api:amd64 (7.1-8, 7.1-9), grub-efi-amd64-bin:amd64 (2.06-3~deb11u2, 2.06-3~deb11u4), grub2-common:amd64 (2.06-3~deb11u2, 2.06-3~deb11u4), libpve-http-server-perl:amd64 (4.1-4, 4.1-5), libpve-common-perl:amd64 (7.2-3, 7.2-5), grub-common:amd64 (2.06-3~deb11u2, 2.06-3~deb11u4), grub-efi-ia32-bin:amd64 (2.06-3~deb11u2, 2.06-3~deb11u4), grub-pc:amd64 (2.06-3~deb11u2, 2.06-3~deb11u4)
End-Date: 2022-11-15  23:07:46

mail-gw02:
Code:
Start-Date: 2022-11-09  13:19:15
Commandline: apt dist-upgrade -y
Upgrade: pmg-api:amd64 (7.1-7, 7.1-8), pmg-gui:amd64 (3.1-4, 3.1-5)
End-Date: 2022-11-09  13:20:02

Start-Date: 2022-11-12  18:55:56
Commandline: apt dist-upgrade -y
Upgrade: libpixman-1-0:amd64 (0.40.0-1, 0.40.0-1.1~deb11u1)
End-Date: 2022-11-12  18:55:57

Start-Date: 2022-11-15  23:07:55
Commandline: apt dist-upgrade -y
Upgrade: grub-pc-bin:amd64 (2.06-3~deb11u2, 2.06-3~deb11u4), pmg-api:amd64 (7.1-8, 7.1-9), grub-efi-amd64-bin:amd64 (2.06-3~deb11u2, 2.06-3~deb11u4), grub2-common:amd64 (2.06-3~deb11u2, 2.06-3~deb11u4), libpve-http-server-perl:amd64 (4.1-4, 4.1-5), libpve-common-perl:amd64 (7.2-3, 7.2-5), grub-common:amd64 (2.06-3~deb11u2, 2.06-3~deb11u4), grub-efi-ia32-bin:amd64 (2.06-3~deb11u2, 2.06-3~deb11u4), grub-pc:amd64 (2.06-3~deb11u2, 2.06-3~deb11u4)
End-Date: 2022-11-15  23:09:05

And the issue showed up on the 11th, so in between the 7.1-7 -> 7.1-8 and 7.1-8 -> 7.1-9 upgrades.
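
To narrow down an upgrade window like this without reading the whole history, a small filter can list only the transitions of one package. This is a rough sketch that assumes the apt history layout shown above; `pkg_upgrades` is a hypothetical helper name.

```shell
# List upgrade transitions of a single package from apt history on stdin.
# Assumes the "Start-Date:" / "Upgrade:" layout shown in this post.
pkg_upgrades() {
    pkg=$1
    awk -v pkg="$pkg" '
        /^Start-Date:/ { date = $2 " " $3 }
        /^Upgrade:/ {
            # Entries look like "name:arch (old, new)" and are joined by "), ".
            line = substr($0, 10)
            n = split(line, items, "\\), ")
            for (i = 1; i <= n; i++) {
                if (index(items[i], pkg ":") == 1) {
                    entry = items[i]
                    sub(/\)$/, "", entry)   # only the last entry keeps its ")"
                    print date "  " entry ")"
                }
            }
        }
    '
}

# Possible usage, including rotated and compressed logs:
#   zcat -f /var/log/apt/history.log* | pkg_upgrades pmg-api
```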
 

Stoiko Ivanov

OK, that helps: so the issue was introduced with 7.1-8!

Thanks!
 

Stoiko Ivanov

Sadly, I did not manage to reproduce the issue here, despite trying to match the potentially problematic mail as closely as possible...

The following commands should produce a text listing of the DB tables that might be causing the issue:
Code:
psql Proxmox_ruledb --echo-queries -c "select * from clusterinfo" >> dbdump.txt
psql Proxmox_ruledb --echo-queries -c "select * from cmsreceivers" >> dbdump.txt
for i in cmailstore cstatistic domainstat dailystat; do
    psql Proxmox_ruledb --echo-queries -c "select * from $i where time > 1668121200" >> dbdump.txt
done

If you like, please run it, gzip the resulting dbdump.txt, and share it (if you prefer, send it by mail to s.ivanov _at_ proxmox.com).
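
The `time > 1668121200` filter in the commands above is a Unix timestamp. As a quick sanity check, the sketch below converts it to a readable date; it assumes GNU `date` (as shipped on Debian), and `epoch_to_utc` is a hypothetical helper name.

```shell
# Convert a Unix timestamp to a UTC date string (GNU date syntax).
epoch_to_utc() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S'
}

epoch_to_utc 1668121200   # prints 2022-11-10 23:00:00
```

That is midnight on 2022-11-11 in CET (UTC+1), so the dump starts on the day the sync errors first appeared.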

Thanks
 

Stoiko Ivanov

Thanks, I think the issue is with quarantined mails.

How long is your quarantine lifetime? (GUI->Configuration->Spam Detector->Quarantine)

I still wonder how the mails ended up in quarantine, since here the mail is just dropped (just like in the linked thread) and thus does not break the cluster sync.

Do you have an LDAP profile configured?

In any case, depending on how long you want to (or can) wait:
* once the quarantine lifetime plus one day has passed since you installed pmg-api 7.1-9 (2022-11-15), the issue should resolve itself (all problematic mails will be purged by the pmgspamreport timer ...)
* otherwise we could try to selectively remove old mails from the quarantine DB and from the spool directory, but this is a bit involved and could cause inconsistencies
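
As a worked example of the first option, the sketch below adds the quarantine lifetime plus one day to the install date. It assumes GNU `date`; `self_resolve_date` is a hypothetical helper name, and 31 days is only an example lifetime.

```shell
# Date on which the issue should have resolved itself:
# install date + quarantine lifetime + 1 day (GNU date syntax).
# Arguments: install date (YYYY-MM-DD), quarantine lifetime in days.
self_resolve_date() {
    days=$(( $2 + 1 ))
    date -u -d "$1 + $days days" '+%Y-%m-%d'
}

self_resolve_date 2022-11-15 31   # prints 2022-12-17
```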
 

c0urier

The spam quarantine lifetime is 31 days and has not been changed for as long as I can remember. And I agree, it seems a bit odd.

No LDAP profiles configured.

So if I changed the quarantine lifetime to e.g. 7 days, it should resolve itself?

Either way, I am willing to try whatever you prefer, just to ensure it won't happen again. And I guess since it's resolved in 7.1-9, it won't.
 

Stoiko Ivanov

Quote: "Spam quarantine is 31 days and has not been changed since I actually don't remember."
Hm, from a quick look at the DB dump I think most of the mails should already have been dealt with (they have been delivered or deleted).
Could you check as Administrator in the GUI -> Spam Quarantine whether there are still mails visible there?

If not, the simplest fix would be to set the quarantine lifetime to 7 days, run `pmgqm purge`, and then reset it to 31 days (I assume your users expect it to stay that way).

Quote: "And I guess since it's resolved in 7.1-9 it wont."

I would assume so. If not, don't hesitate to post here :)
 

c0urier

No, it's empty on both nodes.

Code:
root@mail-gw01:~> pmgqm purge
purging database
removed 142 spam quarantine files

Code:
root@mail-gw02:~> pmgqm purge
purging database
removed 118 spam quarantine files

And now we're back in sync.
Code:
NAME(CID)--------------IPADDRESS----ROLE-STATE---------UPTIME---LOAD----MEM---DISK
mail-gw02(1)         10.4.0.2        node   A   10 days 22:35   0.29    37%    22%
mail-gw01(2)         10.2.0.22       master A   10 days 22:35   0.00    38%    22%
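
A quick way to spot nodes that are not yet in state A in output like the above is sketched below. It assumes the column layout shown in this thread (state in the fourth column) and that A is the healthy state, which is inferred from this thread rather than from the documentation. `nodes_not_active` is a hypothetical helper name.

```shell
# Report nodes whose STATE column (4th field) is not "A".
# Reads `pmgcm status` style output on stdin; the header line is skipped.
nodes_not_active() {
    awk '!/^NAME\(CID\)/ && NF >= 4 && $4 != "A" {
        print $1 " is in state " $4
    }'
}

# Possible usage on a cluster node:
#   pmgcm status | nodes_not_active
```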

And the log looks good:
Code:
Nov 23 17:33:14 mail-gw02 pmgmirror[870138]: database sync 'mail-gw01' failed - Wide character in subroutine entry at /usr/share/perl5/PMG/DBTools.pm line 1076.
Nov 23 17:33:16 mail-gw02 pmgmirror[870138]: cluster synchronization finished  (1 errors, 3.25 seconds (files 0.00, database 2.49, config 0.76))
Nov 23 17:35:13 mail-gw02 pmgmirror[870138]: starting cluster synchronization
Nov 23 17:35:18 mail-gw02 pmgmirror[870138]: cluster synchronization finished  (0 errors, 5.27 seconds (files 0.64, database 3.86, config 0.76))
Nov 23 17:37:13 mail-gw02 pmgmirror[870138]: starting cluster synchronization
Nov 23 17:37:17 mail-gw02 pmgmirror[870138]: cluster synchronization finished  (0 errors, 4.04 seconds (files 0.45, database 2.78, config 0.80))

Thank you, Stoiko, I really appreciate the effort! Awesome!
 

Stoiko Ivanov

Glad we figured that out!
Should you run into something similar again, just reply here; maybe then we'll find the root cause.
 

c0urier

All good, I'll keep that in mind. Thanks again!

We can still consider this resolved as everything is back to normal.
 
