Cluster Rsync Error

Janko

Active Member
May 15, 2011
60
2
28
Hello all,

I have some issues with my new cluster setup:
Code:
Feb 03 08:18:06 mailgate01.example.com pmgmirror[897]: starting cluster syncronization
Feb 03 08:18:06 mailgate01.example.com pmgmirror[897]: database sync 'mailgate02' failed - DBI connect('dbname=Proxmox_ruledb;host=/var/run/pmgtunnel;port=2;','root',...) failed: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/var/run/pmgtunnel/.s.PGSQL.2"? at /usr/share/perl5/PMG/DBTools.pm line 59.
Feb 03 08:18:06 mailgate01.example.com pmgmirror[897]: cluster syncronization finished (1 errors, 0.01 seconds (files 0.00, database 0.01, config 0.00))
Feb 03 08:18:21 mailgate01.example.com pmgtunnel[850]: restarting crashed tunnel 1452 88.99.32.213
Feb 03 08:18:21 mailgate01.example.com pmgtunnel[850]: tunnel finished 1452 88.99.32.213

Can anybody give me some advice?

Replies in German are welcome too :)
 

dietmar

Proxmox Staff Member
Staff member
Apr 28, 2005
17,002
467
103
Austria
www.proxmox.com
Please make sure you can connect to the other host via SSH without a password (in both directions). If that works, make sure the pmgtunnel service is running and stable (no "crashed tunnel" messages).
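
A minimal sketch of that check, to be run on each node against the other. The helper names 'check_ssh' and 'report_ssh' and the 'SSH_BIN' override are made up for illustration. 'BatchMode=yes' is the same ssh option that shows up in the pmgmirror rsync call later in this thread; it makes ssh fail instead of asking for a password.

```shell
#!/bin/sh
# Hypothetical helper: try a key-based root login to a peer node.
# BatchMode=yes disables password prompts, so a missing key is a hard failure.
# SSH_BIN is an illustrative override (not a PMG setting) that allows a dry run.
check_ssh() {
    peer="$1"
    "${SSH_BIN:-ssh}" -o BatchMode=yes -o ConnectTimeout=5 "root@${peer}" true
}

# Print a one-line verdict and return non-zero if the login failed.
report_ssh() {
    if check_ssh "$1"; then
        echo "ok: passwordless ssh to $1 works"
    else
        echo "FAIL: cannot log in to $1 without a password"
        return 1
    fi
}
```

For example, run 'report_ssh pmg2' on pmg1 and 'report_ssh pmg1' on pmg2 (host names as used further down in this thread). Assuming pmgtunnel runs as a systemd unit of the same name, 'systemctl status pmgtunnel' then shows whether the service is up.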
 

tom

Proxmox Staff Member
Staff member
Aug 29, 2006
15,250
807
163
Also make sure that you are running the latest packages. Compare your list with this one:

Code:
# pmgversion -v

proxmox-mailgateway: 5.0-7 (API: 5.0-61/71d9a758, running kernel: 4.13.13-5-pve)
pmg-api: 5.0-61
pmg-gui: 1.0-31
proxmox-spamassassin: 3.4.1-54
proxmox-widget-toolkit: 1.0-10
pve-kernel-4.13.13-5-pve: 4.13.13-38
libpve-http-server-perl: 2.0-8
lvm2: 2.02.168-2
pve-firmware: 2.0-3
libpve-common-perl: 5.0-27
pmg-docs: 5.0-13
pve-xtermjs: 1.0-2
libarchive-perl: 3.2.1-1
libxdgmime-perl: 0.01-3
zfsutils-linux: 0.7.3-pve1~bpo9
libpve-apiclient-perl: 2.0-2
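
One way to do the comparison is to save the output of 'pmgversion -v' on each node and diff the sorted package lines. This is only a rough sketch; the function name 'diff_versions' and the file names in the usage note are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: show package lines that differ between two saved
# 'pmgversion -v' outputs. Returns 0 if the lists match, 1 if they differ.
diff_versions() {
    tmp_a=$(mktemp) && tmp_b=$(mktemp) || return 2
    # Keep only "package: version" lines and sort them so order does not matter.
    grep ': ' "$1" | sort > "$tmp_a"
    grep ': ' "$2" | sort > "$tmp_b"
    diff "$tmp_a" "$tmp_b"
    status=$?
    rm -f "$tmp_a" "$tmp_b"
    return "$status"
}
```

Usage would be something like 'pmgversion -v > /tmp/pmg1.txt' on the first node, the same on the second, then copying one file over and running 'diff_versions /tmp/pmg1.txt /tmp/pmg2.txt'. No output means the lists match.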
 

Janko

Hello Tom,
Thanks for the answer.

I found the solution myself.
I changed the SSH port.
 

DerDanilo

Well-Known Member
Jan 21, 2017
432
92
48
Is it possible to use the cluster with custom SSH ports?

@proxmox Team
Basically, 'pmgcm' could just read the configured SSH port from '/etc/services' and use the port that is set there.
It is not advisable to run SSH on the default port when the server is publicly reachable and runs as a VM on different PVE nodes in different DCs.

EDIT:
How long is the system supposed to show "syncing"? According to 'iftop' there is not much traffic going on.
 

tom

DerDanilo said:
"Is it possible to use the cluster with custom SSH ports?

@proxmox Team
Basically, 'pmgcm' could just read the configured SSH port from '/etc/services' and use the port that is set there.
It is not advisable to run SSH on the default port when the server is publicly reachable and runs as a VM on different PVE nodes in different DCs."

Changing a port does not increase your security. Instead, you should protect your services with a firewall.

DerDanilo said:
"EDIT:
How long is the system supposed to show 'syncing'? According to 'iftop' there is not much traffic going on."

A few minutes on the first run. Just check the logs via the GUI for errors (pmgmirror, pmgtunnel).
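
To check the same logs from the shell instead of the GUI, something like the following could work. It is only a sketch: the function names are invented, and it assumes the log lines look like the ones posted in this thread (including the 'syncronization' spelling that pmgmirror writes).

```shell
#!/bin/sh
# Print the error count from every "finished" summary line read on stdin.
# The pattern matches the log spelling "syncronization" on purpose.
sync_error_counts() {
    sed -n 's/.*cluster syncronization finished *(\([0-9][0-9]*\) errors.*/\1/p'
}

# Count pmgtunnel lines that report a crashed and restarted tunnel.
# grep exits 1 when nothing matches; the count (0) is still printed.
crashed_tunnels() {
    grep -c 'restarting crashed tunnel' || true
}
```

Assuming both services log to the systemd journal under their own unit names, 'journalctl -u pmgmirror --since today | sync_error_counts' prints one number per sync run, and 'journalctl -u pmgtunnel --since today | crashed_tunnels' should print 0 on a stable tunnel.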
 

DerDanilo

tom said: "Changing a port does not increase your security. Instead, you should protect your services with a firewall."

That is true, but it saves some load on the server from "script kiddies".

On the master node, 'pmgtunnel' and 'pmgmirror' seem to be fine, if I understand the output correctly:


Code:
Feb 19 17:28:01 pmg1 pmgtunnel[889]: starting server
Feb 19 17:28:01 pmg1 pmgtunnel[889]: starting tunnel 890 <ip of pmg2>
Feb 19 19:21:15 pmg1 pmgtunnel[889]: tunnel finished 890 <ip of pmg2>
Feb 19 19:21:42 pmg1 pmgtunnel[861]: starting server
Feb 19 19:21:42 pmg1 pmgtunnel[861]: starting tunnel 862 <ip of pmg2>

Code:
Feb 19 20:11:46 pmg1 pmgmirror[919]: starting cluster syncronization
Feb 19 20:11:46 pmg1 pmgmirror[919]: cluster syncronization finished  (0 errors, 0.21 seconds (files 0.13, database 0.08, config 0.00))

On pmg2 I can see some rsync errors though.

Code:
Feb 19 20:15:41 pmg2 pmgmirror[819]: starting cluster syncronization
Feb 19 20:15:41 pmg2 pmgmirror[819]: database sync 'pmg1' failed - command 'rsync '--rsh=ssh -l root -o BatchMode=yes -o HostKeyAlias=pmg1' -q --timeout 10 <ip of pmg1>:/var/spool/pmg /var/spool/pmg --files-from /tmp/quarantinefilelist.819' failed: exit code 23
Feb 19 20:15:41 pmg2 pmgmirror[819]: cluster syncronization finished  (1 errors, 0.52 seconds (files 0.00, database 0.38, config 0.14))

The file '/tmp/quarantinefilelist.819' doesn't seem to exist, though. But then again, it's in /tmp/.
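
For reference, exit code 23 is listed in the rsync man page as "Partial transfer due to error", so rsync did run but could not transfer everything. A small lookup sketch for a few codes from that man page (the function name is made up):

```shell
#!/bin/sh
# Hypothetical helper: map a few rsync exit codes to the wording
# used in the EXIT VALUES section of the rsync man page.
rsync_exit_meaning() {
    case "$1" in
        0)  echo "Success" ;;
        12) echo "Error in rsync protocol data stream" ;;
        23) echo "Partial transfer due to error" ;;
        24) echo "Partial transfer due to vanished source files" ;;
        30) echo "Timeout in data send/receive" ;;
        *)  echo "See the EXIT VALUES section of 'man rsync'" ;;
    esac
}
```

'rsync_exit_meaning 23' prints the text for the code seen in the pmg2 log.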

When I change something on either the master or the slave node, the changes are visible on both nodes immediately.

I disabled the firewall temporarily, but that changed nothing.

Did I miss something?
Thanks for your help!
 

tom

It looks like you have not installed the latest updates.

Update to the latest packages and test again. Also post your version:

> pmgversion -v
 

DerDanilo

tom said: "It looks like you have not installed the latest updates. Update to the latest packages and test again. Also post your version: 'pmgversion -v'"

Both systems are up to date with the 'no-subscription' repository. I also rebooted both today to make sure that the latest kernel is running.

Same pmgversion on both hosts.

Code:
root@pmg1:~# pmgversion -v
proxmox-mailgateway: 5.0-7 (API: 5.0-61/71d9a758, running kernel: 4.13.13-5-pve)
pmg-api: 5.0-61
pmg-gui: 1.0-31
proxmox-spamassassin: 3.4.1-54
proxmox-widget-toolkit: 1.0-10
pve-kernel-4.13.13-5-pve: 4.13.13-38
libpve-http-server-perl: 2.0-8
lvm2: 2.02.168-2
pve-firmware: 2.0-3
libpve-common-perl: 5.0-27
pmg-docs: 5.0-13
pve-xtermjs: 1.0-2
libarchive-perl: 3.2.1-1
libxdgmime-perl: 0.01-3
zfsutils-linux: 0.7.3-pve1~bpo9
libpve-apiclient-perl: 2.0-2

Code:
root@pmg2:/var/log# pmgversion -v
proxmox-mailgateway: 5.0-7 (API: 5.0-61/71d9a758, running kernel: 4.13.13-5-pve)
pmg-api: 5.0-61
pmg-gui: 1.0-31
proxmox-spamassassin: 3.4.1-54
proxmox-widget-toolkit: 1.0-10
pve-kernel-4.13.13-5-pve: 4.13.13-38
libpve-http-server-perl: 2.0-8
lvm2: 2.02.168-2
pve-firmware: 2.0-3
libpve-common-perl: 5.0-27
pmg-docs: 5.0-13
pve-xtermjs: 1.0-2
libarchive-perl: 3.2.1-1
libxdgmime-perl: 0.01-3
zfsutils-linux: 0.7.3-pve1~bpo9
libpve-apiclient-perl: 2.0-2

EDIT: Does it maybe throw an error regarding the DB because there is no DB yet? Everything seems to work fine so far, only the status doesn't really change. I just switched non-critical domains to the PMG setup and will check tomorrow how it is doing.
 

DerDanilo

@tom The quarantine and tracking database seems to be corrupted as it always shows "no data in database" but there surely is some (spam reports).

How can one reset it so the cluster initializes it again?

Thanks
 

tom

Based on your other threads, I assume one of your custom changes breaks something.

You can delete a cluster node with:

> pmgcm delete <cid>
 

DerDanilo

When I had trouble with the cluster join, I removed the cluster.conf manually and re-created the cluster. Most probably that messed something up.

If I am going to "reset" the whole cluster including all nodes, what would be the cleanest way to do so without having to re-deploy the nodes?
I can create and download a config backup before doing so, so I should be fine regarding my config changes.

Thanks!

EDIT: I did not find anything about resetting PMG clusters in the documentation. For PVE there was some documentation somewhere about cluster resets. It would also be nice if one could regenerate the self-signed certificates, just like with 'pvecm updatecerts --force'. :)
 

DerDanilo

I just reinstalled the whole cluster and will report whether it still works afterwards. The master node at least shows "no data in database" again after the import of the backup.

I collected all commands into an "initsetup.sh" bash script. I'll strip it of personal information and upload it to GitHub for others to use if required.
I'll also create a Heat template for the mail gateway, since it can simply be installed on a running Debian system. The launch should then be even faster in any cloud system that supports Heat.

While joining pmg2 it shows the following output now:

Code:
root@pmg2:~# pmgcm join <ip of master> --fingerprint <fingerprint of master cert>
Enter password: ************
stop all services accessing the database
save new cluster configuration
cluster node successfully joined
updated /etc/pmg/pmg-authkey.key
updated /etc/pmg/pmg-authkey.pub
updated /etc/pmg/pmg-csrf.key
updated /etc/pmg/domains
updated /etc/pmg/pmg.conf
copying master database from '<ip of master>'
copying master database finished (got 34453 bytes)
delete local database
could not change directory to "/root": Permission denied
create new local database
could not change directory to "/root": Permission denied
insert received data into local database
creating indexes
run analyze to speed up database queries
could not change directory to "/root": Permission denied
ANALYZE
syncing quarantine data
syncing quarantine data finished

Are these root permission errors normal?
I am root and there is no sudo installed.

Error message for pmgmirror:

Code:
Feb 20 17:39:16 pmg1 pmgmirror[923]: starting cluster syncronization
Feb 20 17:39:22 pmg1 pmgmirror[923]: database sync 'pmg2' failed - DBD::Pg::st execute failed: ERROR:  duplicate key value violates unique constraint "cstatistic_pkey"#012DETAIL:  Key (cid, rid)=(2, 1) already exists. at /usr/share/perl5/PMG/DBTools.pm line 1032.
Feb 20 17:39:22 pmg1 pmgmirror[923]: cluster syncronization finished  (1 errors, 6.11 seconds (files 6.03, database 0.08, config 0.00))

Can I reset the database manually? The sync of statistics seems to work fine now.
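
A side note on reading that log line: '#012' is how syslog escapes a newline character (octal 012) inside a message, so the ERROR and DETAIL parts are really two separate lines. A small sketch to make such lines readable; the function name is invented:

```shell
#!/bin/sh
# Replace the syslog "#012" escape with a real newline on every input line.
decode_syslog_newlines() {
    awk '{ gsub(/#012/, "\n"); print }'
}
```

Assuming the messages end up in /var/log/syslog, 'grep pmgmirror /var/log/syslog | decode_syslog_newlines' shows the PostgreSQL error and its detail on separate lines.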
 

DerDanilo

After rolling back to the pre-cluster snapshots, I restored the config without statistics, then created the cluster and joined the node.
Now it seems to work fine.

I don't understand, though, why my self-signed certificates were not accepted by the Proxmox join command. It only worked with the "original" pmg-api.pem.
@tom Any idea why?

How do private certs need to be generated to be supported by the mail gateway?
 

DerDanilo

Yes, it is possible, but for the API you will have to change the fingerprint every time you renew the certificate. This is too complicated and does not add much security. The fingerprints are enough in my opinion.

Therefore I use HAProxy to provide the web interface with an LE cert via an alternative port.
The LE cert works fine for TLS encryption with Postfix.
 
