Disk Usage and Maintenance

ejmerkel

Well-Known Member
Sep 20, 2012
We have a pair of PMG servers (Mail Gateway 5.2-7) in a clustered configuration, and I've noticed that over time the disk utilization on the second node has been steadily increasing.

The culprit seems to be the following directory:

/var/spool/pmg/cluster

Code:
du -hs *
11G 1
11G 2


The usage appears to be a large number of email files stored in directories like /var/spool/pmg/cluster/{1,2}/{spam,virus}.

My question is: is it normal for these to grow so large, and is a manual maintenance routine necessary to prune these emails periodically?


Best regards,
Eric
 
Those are the mails you have in quarantine - they should get cleaned eventually after the quarantine lifetime on them expires.

What's your setting for your quarantine lifetime?
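If you want to check it from the shell instead of the GUI: the setting lives in /etc/pmg/pmg.conf. The exact layout below (a `section: spamquar` block containing a `lifetime` key) is an assumption on my part, so this sketch runs against a sample file rather than the live config:

```shell
# Sketch: extract the quarantine lifetime from a pmg.conf-style file.
# ASSUMPTION: the config uses "section: spamquar" followed by "lifetime <days>".
# Demonstrated on a temporary sample file so it can be run anywhere.
conf=$(mktemp)
cat > "$conf" <<'EOF'
section: spamquar
    lifetime 7
    authmode ticket
EOF

# Print the "lifetime" value found inside the spamquar section only.
lifetime=$(awk '/^section: spamquar/{in_s=1; next}
                /^section:/{in_s=0}
                in_s && $1=="lifetime"{print $2}' "$conf")
echo "quarantine lifetime: $lifetime days"   # → quarantine lifetime: 7 days
rm -f "$conf"
```

Point the awk at /etc/pmg/pmg.conf on a real node; if the key is absent, PMG falls back to its default.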
 
If I click on Spam Detector/Virus Detector -> Quarantine -> Lifetime = 7 Days

Code:
find /var/spool/pmg/cluster -type f -mtime +7 | wc -l
1309039


It seems the cleanup process is not working. Is there something I can do to manually force the cleanup?

Best regards,
Eric
 
have you disabled the 'pmgspamreport.timer' on the system? (this runs daily, and one of its tasks is to run `/usr/bin/pmgqm purge`)
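If you need to reclaim the space before the next scheduled run, `pmgqm purge` is the supported tool; purely to illustrate the age-based pruning it performs, here is a generic `find` sketch run against a throwaway directory (do not point it at the real spool unless you are sure of the consequences):

```shell
# Sketch: count, then delete, files older than the 7-day lifetime.
# Demonstrated on a scratch directory; $spool stands in for
# /var/spool/pmg/cluster. `touch -d` is GNU coreutils syntax.
spool=$(mktemp -d)
mkdir -p "$spool/1/spam"
touch -d '10 days ago' "$spool/1/spam/old-mail"   # stale file
touch "$spool/1/spam/fresh-mail"                  # recent file

# Step 1: dry run - just count what would be removed.
stale=$(find "$spool" -type f -mtime +7 | wc -l)
echo "files older than 7 days: $stale"

# Step 2: delete only after reviewing the dry-run list.
find "$spool" -type f -mtime +7 -delete
remaining=$(find "$spool" -type f | wc -l)
echo "files remaining: $remaining"
rm -rf "$spool"
```

Note this bypasses PMG's quarantine database, which is why `pmgqm purge` is preferable on a live system.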

I hope this helps!
 
have you disabled the 'pmgspamreport.timer' on the system? (this runs daily, and one of its tasks is to run `/usr/bin/pmgqm purge`)

I hope this helps!

No, we have not disabled this to my knowledge. Where would I check to see whether it is enabled?

Code:
/usr/bin/pmgqm purge
purging database
removed 1313429 spam quarantine files
removed 146 virus quarantine files
removed: /var/spool/pmg/cluster/2/spam/96/2404915DA9FA3724796
removed: /var/spool/pmg/cluster/2/spam/8B/26F9F75D8BDC8FCCE8B
 
Sorry to follow up on my own post... it looks like the service is active?

Code:
systemctl status pmgspamreport.timer
● pmgspamreport.timer - Send Daily Spam Report Mails
Loaded: loaded (/lib/systemd/system/pmgspamreport.timer; enabled; vendor preset: enabled)
Active: active (waiting) since Fri 2019-10-18 13:52:25 EDT; 3 months 20 days ago
 
hmm - please post the output of:
Code:
systemctl list-timers
systemctl cat pmgspamreport.timer
systemctl cat pmgspamreport.service
and check your journal for invocations of the timer/service

another question would be - did you get any spamreports in the past few months?

I assume the purge helped in reducing the disk usage?
 
Stoiko,

To answer your questions in reverse order: yes, manually running the command fixed the disk usage issue.

We had been receiving spam reports, but it appears they were only coming from the other node in the cluster, not this one. Should we be receiving notifications directly from both servers?

Here is the output of the commands. Let me know whether it looks OK at this point.

Code:
root@mx2:~# systemctl list-timers
NEXT                         LEFT          LAST                         PASSED       UNIT                         ACTIVATES
Sat 2020-02-08 13:00:00 EST  42min left    Sat 2020-02-08 12:00:04 EST  17min ago    pmg-hourly.timer             pmg-hourly.service
Sat 2020-02-08 13:59:19 EST  1h 41min left Fri 2020-02-07 13:59:19 EST  22h ago      systemd-tmpfiles-clean.timer systemd-tmpfiles-cle
Sat 2020-02-08 23:32:20 EST  11h left      Sat 2020-02-08 07:52:04 EST  4h 25min ago apt-daily.timer              apt-daily.service
Sun 2020-02-09 00:01:00 EST  11h left      Sat 2020-02-08 00:01:04 EST  12h ago      pmgreport.timer              pmgreport.service
Sun 2020-02-09 00:05:00 EST  11h left      Sat 2020-02-08 00:05:04 EST  12h ago      pmgspamreport.timer          pmgspamreport.servic
Sun 2020-02-09 03:30:36 EST  15h left      Sat 2020-02-08 03:47:19 EST  8h ago       pmg-daily.timer              pmg-daily.service
Sun 2020-02-09 06:27:14 EST  18h left      Sat 2020-02-08 06:08:53 EST  6h ago       apt-daily-upgrade.timer      apt-daily-upgrade.se

7 timers listed.
Pass --all to see loaded but inactive timers, too.

Code:
root@mx2:~# systemctl cat pmgspamreport.timer
# /lib/systemd/system/pmgspamreport.timer
[Unit]
Description=Send Daily Spam Report Mails

[Timer]
OnCalendar=00:05
Persistent=true 

[Install]
WantedBy=timers.target

root@mx2:~# systemctl cat pmgspamreport.service
# /lib/systemd/system/pmgspamreport.service
[Unit]
Description=Send Daily Spam Report Mails
ConditionPathExists=/usr/bin/pmgqm

[Service]
Type=oneshot
ExecStart=/usr/bin/pmgqm send --timespan yesterday
ExecStartPost=/usr/bin/pmgqm purge

Thanks,
Eric
 
I am still having to manually run the purge command, as it is not automatically cleaning the spam and virus quarantine. I am running the latest Mail Gateway 5.2-7.

Any thoughts on how to fix this?
 
Hmm - could you:
* run `pmgspamreport.service` once and post the resulting log-lines:
`systemctl start pmgspamreport.service`
* run: `/usr/bin/pmgqm send --timespan yesterday ; echo $?`

maybe it's an issue with the service file - maybe it's something on your installation...

Thanks!
 
Here are the results...

Code:
systemctl start pmgspamreport.service
Job for pmgspamreport.service failed because the control process exited with error code.
See "systemctl status pmgspamreport.service" and "journalctl -xe" for details.
root@mx2:~# journalctl -xe
--
-- The result is failed.
Mar 02 08:30:44 mx2 systemd[1]: pmgspamreport.service: Unit entered failed state.
Mar 02 08:30:44 mx2 systemd[1]: pmgspamreport.service: Failed with result 'exit-code'.

Code:
/usr/bin/pmgqm send --timespan yesterday ; echo $?
local node is not master - not sending spam report
25
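That exit status hints at the cause: with the unit shown earlier, systemd runs `ExecStartPost` (the purge) only when `ExecStart` (the send) succeeds, and on a non-master cluster node the send bails out early. A hedged shell sketch of that interaction (stand-in function names and assumed systemd semantics, not the actual pmgqm source):

```shell
# Stand-in for the real cluster-role check inside pmgqm.
is_master() { [ "$1" = "master" ]; }

# Mimics the old unit ordering: send first, purge only if send succeeded.
run_service() {
    role=$1
    if ! is_master "$role"; then
        echo "local node is not master - not sending spam report"
        return 25            # ExecStart fails on the non-master node ...
    fi
    echo "spam report sent"
    echo "purge executed"    # ... so ExecStartPost (the purge) never runs there
}

run_service node || echo "exit=$? (purge skipped on this node)"
run_service master
```

Which would explain why the quarantine only ever got cleaned on the master, while mx2's spool grew unbounded.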
 
Thanks - I need to dig into the source again - since I'd assume that more people would have gotten that problem until now.

do all of your pmg-nodes have the same amount of diskspace?

is the cluster healthy? (pmgcm status)
 
Yes, both nodes are the same size. Note that the disk usage shown in the output below is low because I manually ran the purge again today; it was at approximately 19% on mx2 before doing that.

Code:
pmgcm status
NAME(CID)--------------IPADDRESS----ROLE-STATE---------UPTIME---LOAD----MEM---DISK
mx1(1)                 X.Y.Z.146    master A     4 days 01:35   0.15    33%   11%
mx2(2)                 X.Y.Z.151    node   A     4 days 02:28   0.12    33%   12%
 
Thanks - could I ask you to post some (anonymized) logs from the cluster synchronization? (pmgmirror/pmgtunnel services)

additionally - are the files only present on mx2 (and only in the directory /var/spool/pmg/cluster/2/spam/)? do the same files exist on mx1 (in the directory /var/spool/pmg/cluster/2/spam/)?

Thanks!
 
Regarding /var/spool/pmg/cluster/2/spam/: I looked on mx1 and verified that the same files that are on mx2 DO exist on mx1, so that looks good.

Here are the only logs, going back to Feb 27, regarding the pmgtunnel service. There is nothing recent about that service in the daemon.log files.

Code:
Feb 27 07:54:08 mx2 pmgtunnel[785]: starting server
Feb 27 07:54:08 mx2 pmgtunnel[785]: starting tunnel 786 A.B.C.D
Feb 27 08:44:49 mx2 pmgmirror[832]: database sync 'mx1' failed - DBI connect('dbname=Proxmox_ruledb;host=/var/run/pmgtunnel;port=1;','root',...) failed: server closed the connection unexpectedly#012#011This probably means the server terminated abnormally#012#011before or while processing the request. at /usr/share/perl5/PMG/DBTools.pm line 59.
Feb 27 08:48:49 mx2 pmgtunnel[785]: tunnel finished 786 A.B.C.D
Feb 27 08:48:49 mx2 pmgmirror[832]: database sync 'mx1' failed - DBI connect('dbname=Proxmox_ruledb;host=/var/run/pmgtunnel;port=1;','root',...) failed: server closed the connection unexpectedly#012#011This probably means the server terminated abnormally#012#011before or while processing the request. at /usr/share/perl5/PMG/DBTools.pm line 59.
Feb 27 08:49:08 mx2 pmgtunnel[785]: restarting crashed tunnel 5355 A.B.C.D


Regarding pmgmirror, it looks fairly normal and regularly posts messages like the following to daemon.log.

Code:
Mar 3 08:52:25 mx2 pmgmirror[832]: detected rule database changes - starting sync from 'A.B.C.D'
Mar 3 08:52:26 mx2 pmgmirror[832]: finished rule database sync from host 'A.B.C.D'
Mar 3 08:52:29 mx2 pmgmirror[832]: cluster syncronization finished (0 errors, 5.56 seconds (files 1.87, database 2.45, config 1.24))
Mar 3 08:54:23 mx2 pmgmirror[832]: starting cluster syncronization
Mar 3 08:54:27 mx2 pmgmirror[832]: cluster syncronization finished (0 errors, 4.20 seconds (files 1.22, database 1.73, config 1.25))
Mar 3 08:56:23 mx2 pmgmirror[832]: starting cluster syncronization
Mar 3 08:56:28 mx2 pmgmirror[832]: cluster syncronization finished (0 errors, 4.35 seconds (files 1.39, database 1.73, config 1.24))
Mar 3 08:58:23 mx2 pmgmirror[832]: starting cluster syncronization
Mar 3 08:58:27 mx2 pmgmirror[832]: cluster syncronization finished (0 errors, 4.24 seconds (files 1.28, database 1.72, config 1.24))
Mar 3 09:00:23 mx2 pmgmirror[832]: starting cluster syncronization
Mar 3 09:00:27 mx2 pmgmirror[832]: cluster syncronization finished (0 errors, 4.26 seconds (files 1.33, database 1.69, config 1.24))
Mar 3 09:02:23 mx2 pmgmirror[832]: starting cluster syncronization
Mar 3 09:02:27 mx2 pmgmirror[832]: cluster syncronization finished (0 errors, 4.23 seconds (files 1.23, database 1.76, config 1.24))
Mar 3 09:04:23 mx2 pmgmirror[832]: starting cluster syncronization
Mar 3 09:04:25 mx2 pmgmirror[832]: detected rule database changes - starting sync from 'A.B.C.D'
Mar 3 09:04:26 mx2 pmgmirror[832]: finished rule database sync from host 'A.B.C.D'
Mar 3 09:04:28 mx2 pmgmirror[832]: cluster syncronization finished (0 errors, 4.97 seconds (files 1.49, database 2.23, config 1.24))
Mar 3 09:06:23 mx2 pmgmirror[832]: starting cluster syncronization
Mar 3 09:06:25 mx2 pmgmirror[832]: detected rule database changes - starting sync from 'A.B.C.D'
Mar 3 09:06:26 mx2 pmgmirror[832]: finished rule database sync from host 'A.B.C.D'
Mar 3 09:06:28 mx2 pmgmirror[832]: cluster syncronization finished (0 errors, 4.74 seconds (files 1.28, database 2.21, config 1.25))
Mar 3 09:08:23 mx2 pmgmirror[832]: starting cluster syncronization
Mar 3 09:08:25 mx2 pmgmirror[832]: detected rule database changes - starting sync from 'A.B.C.D'
Mar 3 09:08:25 mx2 pmgmirror[832]: finished rule database sync from host 'A.B.C.D'
Mar 3 09:08:28 mx2 pmgmirror[832]: cluster syncronization finished (0 errors, 4.65 seconds (files 1.18, database 2.23, config 1.24))
Mar 3 09:10:23 mx2 pmgmirror[832]: starting cluster syncronization
Mar 3 09:10:27 mx2 pmgmirror[832]: cluster syncronization finished (0 errors, 4.53 seconds (files 1.55, database 1.73, config 1.25))
Mar 3 09:12:23 mx2 pmgmirror[832]: starting cluster syncronization
Mar 3 09:12:28 mx2 pmgmirror[832]: cluster syncronization finished (0 errors, 4.17 seconds (files 1.22, database 1.70, config 1.25))
Mar 3 09:14:23 mx2 pmgmirror[832]: starting cluster syncronization
Mar 3 09:14:24 mx2 pmgmirror[832]: detected rule database changes - starting sync from 'A.B.C.D'
Mar 3 09:14:25 mx2 pmgmirror[832]: finished rule database sync from host 'A.B.C.D'
Mar 3 09:14:27 mx2 pmgmirror[832]: cluster syncronization finished (0 errors, 4.74 seconds (files 1.29, database 2.20, config 1.24))
Mar 3 09:16:23 mx2 pmgmirror[832]: starting cluster syncronization
Mar 3 09:16:28 mx2 pmgmirror[832]: cluster syncronization finished (0 errors, 4.32 seconds (files 1.34, database 1.73, config 1.25))
Mar 3 09:18:23 mx2 pmgmirror[832]: starting cluster syncronization
Mar 3 09:18:24 mx2 pmgmirror[832]: detected rule database changes - starting sync from 'A.B.C.D'
Mar 3 09:18:25 mx2 pmgmirror[832]: finished rule database sync from host 'A.B.C.D'
Mar 3 09:18:27 mx2 pmgmirror[832]: cluster syncronization finished (0 errors, 4.68 seconds (files 1.23, database 2.22, config 1.24))
Mar 3 09:18:27 mx2 pmgmirror[832]: restarting server after 15 cycles to reduce memory usage (free 112295936 (5345280) bytes)
Mar 3 09:18:27 mx2 pmgmirror[832]: server shutdown (restart)
 
will drop a line here once it's merged (and we know in which version it will be available ;)
 
This thread is not marked as SOLVED, so please let me jump in.
I'm experiencing the same issue as the OP, with steadily increasing disk usage in /var/spool/pmg/spam, but not due to a failing pmgspamreport.service.
I have "Send daily admin reports" turned off, which disables the pmgspamreport.service.
As a result, PMG never purges old spam from quarantine, which is not what I (or users) would expect.
When "Configuration > Spam Detector > Lifetime (days)" is set, I did not expect it to depend on enabling daily admin reports. Is there a way to automatically purge old spam without enabling daily spam reports? I'm on PMG 6.2-5.
 
I'm experiencing the same issue as the OP, with steadily increasing disk usage in /var/spool/pmg/spam, but not due to a failing pmgspamreport.service.
* are you running PMG in a cluster or as a single node?
*
I have "Send daily admin reports" turned off, which disables the pmgspamreport.service.
this is not quite right - pmgspamreport.service is used for sending quarantine reports to the users (and cleaning up the quarantine); pmgreport.service is used to send the daily admin reports. Both are invoked (unless you disable them manually) by their respective systemd timers.

When "Configuration > Spam Detector > Lifetime (days)" is set, I did not expect it to depend on enabling daily admin reports. Is there a way to automatically purge old spam without enabling daily spam reports? I'm on PMG 6.2-5.
this should happen in any case:
Code:
systemctl cat pmgspamreport.service
# /lib/systemd/system/pmgspamreport.service
[Unit]
Description=Send Daily Spam Report Mails
ConditionPathExists=/usr/bin/pmgqm

[Service]
Type=oneshot
ExecStartPre=-/usr/bin/pmgqm purge
ExecStart=/usr/bin/pmgqm send --timespan yesterday
the /usr/bin/pmgqm purge is responsible for the cleanup.
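On older builds where the purge still ran via `ExecStartPost` (and was therefore skipped whenever the send step failed), a cron entry is one possible stop-gap. This is a hypothetical fragment, not an official recommendation; the schedule and file name are my own:

```
# /etc/cron.d/pmgqm-purge - hypothetical fallback; remove it once the fixed
# unit (with ExecStartPre=-/usr/bin/pmgqm purge) is installed
15 0 * * * root /usr/bin/pmgqm purge >/dev/null 2>&1
```

Running it shortly after midnight keeps it close to the timer's own schedule, and the output is discarded since pmgqm logs removals itself.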

are you running the latest version? (pmgversion -v)
 
