[SOLVED] Nodes stay Syncing after update

Soporte Servi

New Member
Jul 1, 2019
Today we updated a cluster of three nodes. After the update, the master and one of the two other nodes started hammering the database of the third node as if it were the master, and in the cluster state they show 'syncing' instead of 'active'; only that last node is still 'active'.

In the logs of the first two servers we found the following:

Jul 1 16:51:46 pmgmail1 pmgmirror[32177]: database sync 'pmgmail2' failed - command 'rsync '--rsh=ssh -l root -o BatchMode=yes -o HostKeyAlias=pmgmail2' -q --timeout 10 '[xxxx]:/var/spool/pmg' /var/spool/pmg --files-from /tmp/quarantinefilelist.32177' failed: exit code 23

The file /tmp/quarantinefilelist.32177 does not exist on any of the three servers.


proxmox-mailgateway: 5.2-1 (API: 5.2-3/26df5d99, running kernel: 4.15.18-16-pve)
pmg-api: 5.2-3
pmg-gui: 1.0-45
pve-kernel-4.15: 5.4-4
pve-kernel-4.13: 5.1-45
pve-kernel-4.15.18-16-pve: 4.15.18-41
pve-kernel-4.15.18-12-pve: 4.15.18-36
pve-kernel-4.15.18-10-pve: 4.15.18-31
pve-kernel-4.15.18-7-pve: 4.15.18-27
pve-kernel-4.15.18-4-pve: 4.15.18-23
pve-kernel-4.13.16-3-pve: 4.13.16-49
pve-kernel-4.13.16-1-pve: 4.13.16-46
pve-kernel-4.13.13-6-pve: 4.13.13-42
pve-kernel-4.13.13-5-pve: 4.13.13-38
libarchive-perl: 3.2.1-1
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-52
libpve-http-server-perl: 2.0-12
libxdgmime-perl: 0.01-3
lvm2: 2.02.168-2
pmg-docs: 5.2-3
proxmox-spamassassin: 3.4.2-2
proxmox-widget-toolkit: 1.0-28
pve-firmware: 2.0-5
pve-xtermjs: 3.10.1-2
zfsutils-linux: 0.7.13-pve1~bpo1
* Please try to restart the `pmgmirror` and `pmgtunnel` services.
* what's the (redacted!) output of `pmgcm status`

Here's the output of `pmgcm status`:

root@host1:/etc/pmg# pmgcm status
host1(1) X.X.X.1 master S 57 days 20:33 0.23 55% 20%
host3(3) X.X.X.3 node S 57 days 20:09 0.12 43% 39%
host2(2) X.X.X.2 node A 57 days 16:25 0.06 56% 36%
how many files (and how large are they) are in /var/spool/pmg on host2 ?
Here's the content of /var/spool/pmg on host2:

root@host2:/var/spool/pmg# du -sh *
32K active
7.2G cluster
4.0K spam
4.0K virus
* That might explain it - 7.2G of quarantined mail?!
* the timeout (10 seconds) is currently hardcoded in the source code (/usr/share/perl5/PMG/Cluster.pm)

However 7.2G of quarantined mail is really odd - please check how that came to happen...
Hm - you can set the timeout in the source code: /usr/share/perl5/PMG/Cluster.pm, line 303
and restart the cluster-services (`pmgmirror`)
(for the amount of data set it to 120, if you have gigabit between the nodes)
If this resolves the issue we can consider making it configurable.
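Spelled out, that workaround might look like the sketch below. This is a hedged sketch, not a verified recipe: the exact quoting around line 303 can differ between pmg-api versions, so inspect the line before editing, and note that the next pmg-api package update will overwrite the file again. The `10 -> 120` values follow the suggestion above for a gigabit link.

```shell
# Hedged sketch: raise the hardcoded rsync timeout in PMG::Cluster
# and restart pmgmirror. The sed pattern is an assumption about how
# the option appears in the source - verify with:
#   grep -n timeout /usr/share/perl5/PMG/Cluster.pm
PM=/usr/share/perl5/PMG/Cluster.pm
if [ -f "$PM" ]; then
    cp "$PM" "$PM.bak"                            # keep a backup
    sed -i 's/--timeout 10/--timeout 120/' "$PM"  # pattern is an assumption
    systemctl restart pmgmirror
fi
```

Keeping the `.bak` copy makes it easy to diff against the original after the next package upgrade reverts the change.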

However I'm still curious what files take up so much space - could you please check which files are the largest, and where they are?
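A generic way to answer that question (a sketch using standard findutils, not a PMG-specific tool) is to list files by size, biggest first:

```shell
# List the N largest files under a directory, printing size in
# bytes followed by the path, biggest first. Default N is 20.
largest_files() {
    find "$1" -type f -printf '%s %p\n' | sort -rn | head -n "${2:-20}"
}

# Harmless no-op if the path does not exist on this machine.
largest_files /var/spool/pmg/cluster 2>/dev/null
```

Pointing it at `/var/spool/pmg/cluster` would show whether the 7.2G comes from a few huge quarantined messages or from millions of small ones.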

Still getting the same error:

Aug 28 10:45:17 host1 pmgmirror[26930]: database sync 'host2' failed - command 'rsync '--rsh=ssh -l root -o BatchMode=yes -o HostKeyAlias=host2' -q --timeout 120 '[X.X.X.2]:/var/spool/pmg' /var/spool/pmg --files-from /tmp/quarantinefilelist.26930' failed: exit code 23

About your question, here's what we have:

root@host2:/var/spool/pmg/cluster# du -sh *
1.5G 1
2.5G 2
3.3G 3

root@host2:/var/spool/pmg/cluster/3# du -sh *
2.4G spam
903M virus

root@host2:/var/spool/pmg/cluster/3/spam# ls
00 05 0A 0F 14 19 1E 23 28 2D 32 37 3C 41 46 4B 50 55 5A 5F 64 69 6E 73 78 7D 82 87 8C 91 96 9B A0 A5 AA AF B4 B9 BE C3 C8 CD D2 D7 DC E1 E6 EB F0 F5 FA FF
01 06 0B 10 15 1A 1F 24 29 2E 33 38 3D 42 47 4C 51 56 5B 60 65 6A 6F 74 79 7E 83 88 8D 92 97 9C A1 A6 AB B0 B5 BA BF C4 C9 CE D3 D8 DD E2 E7 EC F1 F6 FB
02 07 0C 11 16 1B 20 25 2A 2F 34 39 3E 43 48 4D 52 57 5C 61 66 6B 70 75 7A 7F 84 89 8E 93 98 9D A2 A7 AC B1 B6 BB C0 C5 CA CF D4 D9 DE E3 E8 ED F2 F7 FC
03 08 0D 12 17 1C 21 26 2B 30 35 3A 3F 44 49 4E 53 58 5D 62 67 6C 71 76 7B 80 85 8A 8F 94 99 9E A3 A8 AD B2 B7 BC C1 C6 CB D0 D5 DA DF E4 E9 EE F3 F8 FD
04 09 0E 13 18 1D 22 27 2C 31 36 3B 40 45 4A 4F 54 59 5E 63 68 6D 72 77 7C 81 86 8B 90 95 9A 9F A4 A9 AE B3 B8 BD C2 C7 CC D1 D6 DB E0 E5 EA EF F4 F9 FE
* What bandwidth do you have between the nodes?
* Please increase it quite a bit more - exit code 23 indicates a partial transfer (I expect because the timeout expires before the sync finishes)
hmm - you could test how long a plain rsync would take:
mkdir -p /tmp/pmgsynctest                                                       
time rsync '--rsh=ssh -l root -o BatchMode=yes -o HostKeyAlias=host2' -q --timeout 120 '[X.X.X.2]:/var/spool/pmg' /tmp/pmgsynctest


root@host1:/var/spool/pmg# time rsync '--rsh=ssh -l root -o BatchMode=yes -o HostKeyAlias=host2' -q --timeout 300 '[X.X.X.2]:/var/spool/pmg' /tmp/pmgsynctest
real 0m0.159s
user 0m0.013s
sys 0m0.002s
sorry - I mistakenly pasted the '-q' into the command and forgot that we don't have an explicit file list here - can you try with '-av' instead (adds recursive syncing and verbose output)?
sent 1,462,908 bytes received 7,522,633,219 bytes 15,184,855.96 bytes/sec
total size is 7,515,180,685 speedup is 1.00

real 8m15.260s
user 1m2.336s
sys 0m36.804s
That's more than the 300 you set (although rsync should only transfer the delta) - maybe try a timeout of 1000
(the next run would need to be faster...)
also - once the mirror sync starts you should see a file in the master's /tmp/ whose name starts with quarantinefilelist. - if possible, copy it and run `wc -l` on it
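The arithmetic from the manual run above backs this up - roughly 7.5 GB received at roughly 15.2 MB/s works out to about 495 seconds of pure transfer, consistent with the 8m15s wall time and well past a 300 second timeout:

```shell
# Sanity check of the transfer time implied by the rsync stats above.
bytes=7522633219   # "received" bytes from the rsync summary
rate=15184855      # bytes/sec, truncated from the rsync summary
echo "$(( bytes / rate )) seconds"   # -> 495 seconds
```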

Even with 2000s timeout I still get the same error code.

About the quarantine file list:

root@host2:/tmp# cat quarantinefilelist.20882 | wc -l
ok - did you get any error while you ran the rsync command manually?

