Cluster Rsync Error

Discussion in 'Mail Gateway: HA Cluster' started by Janko, Feb 3, 2018.

  1. Janko

    Janko New Member

    Joined:
    May 15, 2011
    Messages:
    21
    Likes Received:
    0
    Hello All,

    I have some issues with my new cluster setup:
    Code:
    Feb 03 08:18:06 mailgate01.example.com pmgmirror[897]: starting cluster syncronization
    Feb 03 08:18:06 mailgate01.example.com pmgmirror[897]: database sync 'mailgate02' failed - DBI connect('dbname=Proxmox_ruledb;host=/var/run/pmgtunnel;port=2;','root',...) failed: could not connect to server: No such file or directory
    Is the server running locally and accepting
    connections on Unix domain socket "/var/run/pmgtunnel/.s.PGSQL.2"? at /usr/share/perl5/PMG/DBTools.pm line 59.
    Feb 03 08:18:06 mailgate01.example.com pmgmirror[897]: cluster syncronization finished (1 errors, 0.01 seconds (files 0.00, database 0.01, config 0.00))
    Feb 03 08:18:21 mailgate01.example.com pmgtunnel[850]: restarting crashed tunnel 1452 88.99.32.213
    Feb 03 08:18:21 mailgate01.example.com pmgtunnel[850]: tunnel finished 1452 88.99.32.213

    Can anybody give me some advice?

    - German is fine too :)
     
  2. dietmar

    dietmar Proxmox Staff Member
    Staff Member

    Joined:
    Apr 28, 2005
    Messages:
    16,056
    Likes Received:
    257
    Please make sure you can connect to the other host via ssh without a password (in both directions). If that works, make sure the pmgtunnel service is running and stable (no "crashed tunnel" messages).
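Both checks can be run from a shell, roughly as sketched below. The sketch assumes the cluster connects as root (as the rsync command line quoted later in this thread suggests) and that the tunnel runs as a systemd unit named pmgtunnel. The function names and the SSH_BIN / SYSTEMCTL_BIN overrides are made up for illustration and are not part of PMG.

```shell
#!/bin/sh
# Hypothetical helpers, not shipped with PMG.

# check_ssh HOST
# Succeeds only if root can log in to HOST without any prompt.
# BatchMode=yes makes ssh fail instead of asking for a password.
# SSH_BIN is an illustrative override so the function can be dry-run.
check_ssh() {
    "${SSH_BIN:-ssh}" -o BatchMode=yes -o ConnectTimeout=5 "root@$1" true
}

# check_tunnel
# Succeeds if systemd reports the pmgtunnel unit as active
# (unit name assumed from the log prefix in this thread).
check_tunnel() {
    "${SYSTEMCTL_BIN:-systemctl}" is-active --quiet pmgtunnel
}
```

Run 'check_ssh mailgate02' on mailgate01 and the reverse on mailgate02, then 'check_tunnel' on each node. 'journalctl -u pmgtunnel' shows the unit's log if you want to look for crashed tunnel messages.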
     
  3. tom

    tom Proxmox Staff Member
    Staff Member

    Joined:
    Aug 29, 2006
    Messages:
    12,865
    Likes Received:
    312
    And make sure that you are running the latest packages; compare your list with this one:

    Code:
    # pmgversion -v
    
    proxmox-mailgateway: 5.0-7 (API: 5.0-61/71d9a758, running kernel: 4.13.13-5-pve)
    pmg-api: 5.0-61
    pmg-gui: 1.0-31
    proxmox-spamassassin: 3.4.1-54
    proxmox-widget-toolkit: 1.0-10
    pve-kernel-4.13.13-5-pve: 4.13.13-38
    libpve-http-server-perl: 2.0-8
    lvm2: 2.02.168-2
    pve-firmware: 2.0-3
    libpve-common-perl: 5.0-27
    pmg-docs: 5.0-13
    pve-xtermjs: 1.0-2
    libarchive-perl: 3.2.1-1
    libxdgmime-perl: 0.01-3
    zfsutils-linux: 0.7.3-pve1~bpo9
    libpve-apiclient-perl: 2.0-2
    
     
  4. Janko

    Janko New Member

    Joined:
    May 15, 2011
    Messages:
    21
    Likes Received:
    0
    Hello Tom,
    Thanks for the answer.

    I found the solution myself.
    I changed the SSH port.
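For readers hitting the same log messages: the rsync command line quoted later in this thread calls plain 'ssh' with no port option, so an sshd listening on a non-default port is a plausible cause of a crashing tunnel. Below is a minimal sketch for checking which port a node's sshd is configured for. It assumes the stock OpenSSH config path, the function name is made up, and files pulled in via 'Include' are not followed.

```shell
#!/bin/sh
# sshd_ports [FILE]
# Print every port set by an uncommented "Port" directive in FILE
# (default: /etc/ssh/sshd_config). Prints 22, the OpenSSH default,
# when the file sets none.
sshd_ports() {
    awk 'tolower($1) == "port" { print $2; found = 1 }
         END { if (!found) print 22 }' "${1:-/etc/ssh/sshd_config}"
}
```

Running 'sshd_ports' on each node shows whether the nodes agree and whether either one has moved away from port 22.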
     
  5. DerDanilo

    DerDanilo Member

    Joined:
    Jan 21, 2017
    Messages:
    189
    Likes Received:
    11
    Is it possible to use the cluster with custom ssh ports?

    @proxmox Team
    Basically 'pmgcm' could just read the configured ssh port from '/etc/services' and use the port that is set there.
    It is not advisable to run SSH on the default port when the server is publicly reachable, running as a VM on different PVE nodes in different DCs.

    EDIT:
    How long is the system supposed to show "syncing"? According to 'iftop' there is not much traffic going on.
     
    #5 DerDanilo, Feb 19, 2018
    Last edited: Feb 19, 2018
  6. tom

    tom Proxmox Staff Member
    Staff Member

    Joined:
    Aug 29, 2006
    Messages:
    12,865
    Likes Received:
    312
    Changing a port does not increase your security. Instead, you should protect your services with a firewall.

    Syncing takes a few minutes on the first run. Just check the logs via the GUI for errors (pmgmirror, pmgtunnel).
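On the command line, the same check can be approximated by filtering the journal. This is a rough sketch. It assumes the service logs under a systemd unit named pmgmirror and that the summary line keeps the wording seen in the logs in this thread (including the 'syncronization' spelling). The function name is hypothetical.

```shell
#!/bin/sh
# failed_syncs
# Read log lines on stdin and print only the pmgmirror summary lines
# that report at least one error.
failed_syncs() {
    grep -E 'pmgmirror\[[0-9]+\]: cluster syncronization finished +\([1-9][0-9]* errors'
}
```

For example: 'journalctl -u pmgmirror --since today | failed_syncs'. No output means every sync run in that window finished with 0 errors.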
     
  7. DerDanilo

    DerDanilo Member

    Joined:
    Jan 21, 2017
    Messages:
    189
    Likes Received:
    11
    That is true, but it saves some load on the server from "script kiddies".

    On the master node 'pmgtunnel' and 'pmgmirror' seem to be fine, if I understand the output correctly:


    Code:
    Feb 19 17:28:01 pmg1 pmgtunnel[889]: starting server
    Feb 19 17:28:01 pmg1 pmgtunnel[889]: starting tunnel 890 <ip of pmg2>
    Feb 19 19:21:15 pmg1 pmgtunnel[889]: tunnel finished 890 <ip of pmg2>
    Feb 19 19:21:42 pmg1 pmgtunnel[861]: starting server
    Feb 19 19:21:42 pmg1 pmgtunnel[861]: starting tunnel 862 <ip of pmg2>
    
    Code:
    Feb 19 20:11:46 pmg1 pmgmirror[919]: starting cluster syncronization
    Feb 19 20:11:46 pmg1 pmgmirror[919]: cluster syncronization finished  (0 errors, 0.21 seconds (files 0.13, database 0.08, config 0.00))
    On pmg2 I can see some rsync errors though.

    Code:
    Feb 19 20:15:41 pmg2 pmgmirror[819]: starting cluster syncronization
    Feb 19 20:15:41 pmg2 pmgmirror[819]: database sync 'pmg1' failed - command 'rsync '--rsh=ssh -l root -o BatchMode=yes -o HostKeyAlias=pmg1' -q --timeout 10 <ip of pmg1>:/var/spool/pmg /var/spool/pmg --files-from /tmp/quarantinefilelist.819' failed: exit code 23
    Feb 19 20:15:41 pmg2 pmgmirror[819]: cluster syncronization finished  (1 errors, 0.52 seconds (files 0.00, database 0.38, config 0.14))
    
    The file '/tmp/quarantinefilelist.819' doesn't seem to exist, though. But then again, it's in /tmp/.
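For context, the rsync manual page lists exit code 23 as "Partial transfer due to error", which generally means at least one of the requested files could not be transferred. Below is a small lookup sketch for a few codes that could plausibly show up in this setup. The texts are taken from the EXIT VALUES section of rsync(1) and the function name is made up.

```shell
#!/bin/sh
# rsync_exit_meaning CODE
# Print the rsync(1) description for a handful of exit codes.
rsync_exit_meaning() {
    case "$1" in
        0)  echo "Success" ;;
        12) echo "Error in rsync protocol data stream" ;;
        23) echo "Partial transfer due to error" ;;
        24) echo "Partial transfer due to vanished source files" ;;
        30) echo "Timeout in data send/receive" ;;
        *)  echo "See the EXIT VALUES section of rsync(1)" ;;
    esac
}
```

For example, 'rsync_exit_meaning 23' prints the description for the code in the log above.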

    When I change something on either the master or the slave node, the changes are visible on both nodes immediately.

    I disabled the firewall temporarily, but nothing changed.

    Did I miss something?
    Thanks for your help!
     
  8. tom

    tom Proxmox Staff Member
    Staff Member

    Joined:
    Aug 29, 2006
    Messages:
    12,865
    Likes Received:
    312
    Looks like you have not installed the latest updates.

    Update to the latest packages and test again; also post your version.

    > pmgversion -v
     
  9. DerDanilo

    DerDanilo Member

    Joined:
    Jan 21, 2017
    Messages:
    189
    Likes Received:
    11
    Both systems are up to date with the 'no-subscription' repository. I also rebooted both today to make sure that the latest kernel is running.

    The pmgversion output is the same on both hosts.

    Code:
    root@pmg1:~# pmgversion -v
    proxmox-mailgateway: 5.0-7 (API: 5.0-61/71d9a758, running kernel: 4.13.13-5-pve)
    pmg-api: 5.0-61
    pmg-gui: 1.0-31
    proxmox-spamassassin: 3.4.1-54
    proxmox-widget-toolkit: 1.0-10
    pve-kernel-4.13.13-5-pve: 4.13.13-38
    libpve-http-server-perl: 2.0-8
    lvm2: 2.02.168-2
    pve-firmware: 2.0-3
    libpve-common-perl: 5.0-27
    pmg-docs: 5.0-13
    pve-xtermjs: 1.0-2
    libarchive-perl: 3.2.1-1
    libxdgmime-perl: 0.01-3
    zfsutils-linux: 0.7.3-pve1~bpo9
    libpve-apiclient-perl: 2.0-2
    
    Code:
    root@pmg2:/var/log# pmgversion -v
    proxmox-mailgateway: 5.0-7 (API: 5.0-61/71d9a758, running kernel: 4.13.13-5-pve)
    pmg-api: 5.0-61
    pmg-gui: 1.0-31
    proxmox-spamassassin: 3.4.1-54
    proxmox-widget-toolkit: 1.0-10
    pve-kernel-4.13.13-5-pve: 4.13.13-38
    libpve-http-server-perl: 2.0-8
    lvm2: 2.02.168-2
    pve-firmware: 2.0-3
    libpve-common-perl: 5.0-27
    pmg-docs: 5.0-13
    pve-xtermjs: 1.0-2
    libarchive-perl: 3.2.1-1
    libxdgmime-perl: 0.01-3
    zfsutils-linux: 0.7.3-pve1~bpo9
    libpve-apiclient-perl: 2.0-2
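
Comparing two listings like these by eye is error-prone. Below is a sketch for diffing saved outputs. It assumes each node's 'pmgversion -v' output has been written to a file first. The function name and file names are illustrative.

```shell
#!/bin/sh
# diff_versions FILE_A FILE_B
# Compare two saved 'pmgversion -v' outputs regardless of line order.
# Prints the differing lines; exit status is 0 when the lists match.
diff_versions() {
    a=$(mktemp) b=$(mktemp)
    sort "$1" > "$a"
    sort "$2" > "$b"
    diff "$a" "$b"
    rc=$?
    rm -f "$a" "$b"
    return "$rc"
}
```

For example, after 'pmgversion -v > pmg1.versions' on one node and 'ssh root@pmg2 pmgversion -v > pmg2.versions' for the other, 'diff_versions pmg1.versions pmg2.versions' prints nothing when both nodes run the same packages.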
    
    EDIT: Does it maybe throw an error regarding the DB because there is no DB yet? Everything seems to work fine so far; only the status doesn't really change. I just switched non-critical domains to the PMG setup and will have a look tomorrow at how it's doing.
     
    #9 DerDanilo, Feb 19, 2018
    Last edited: Feb 19, 2018
  10. DerDanilo

    DerDanilo Member

    Joined:
    Jan 21, 2017
    Messages:
    189
    Likes Received:
    11
    @tom The quarantine and tracking database seems to be corrupted as it always shows "no data in database" but there surely is some (spam reports).

    How can one reset it so the cluster initializes it again?

    Thanks
     
  11. tom

    tom Proxmox Staff Member
    Staff Member

    Joined:
    Aug 29, 2006
    Messages:
    12,865
    Likes Received:
    312
    Based on your other threads, I assume one of your custom changes breaks something.

    You can delete a cluster node with:

    > pmgcm delete <cid>
     
  12. DerDanilo

    DerDanilo Member

    Joined:
    Jan 21, 2017
    Messages:
    189
    Likes Received:
    11
    When I had trouble with the cluster join, I removed cluster.conf manually and re-created the cluster. Most probably that messed something up.

    If I am going to "reset" the whole cluster including all nodes, what would be the cleanest way to do so without having to re-deploy the nodes?
    I can create and download a config backup before doing so, so I should be fine regarding my config changes.

    Thanks!

    EDIT: I did not find anything about resetting PMG clusters in the documentation. For PVE there was some documentation somewhere about cluster resets. It would also be nice if one could regenerate the self-signed certificates, just like with 'pvecm updatecerts --force'. :)
     
    #12 DerDanilo, Feb 20, 2018
    Last edited: Feb 20, 2018
  13. DerDanilo

    DerDanilo Member

    Joined:
    Jan 21, 2017
    Messages:
    189
    Likes Received:
    11
    I just reinstalled the whole cluster and will report whether this still works afterwards. The master node at least shows "no data in database" again after the import of the backup.

    I collected all commands into an "initsetup.sh" bash script. I'll strip it of personal information and upload it to GitHub for others to use if required.
    I'll also create a Heat template for the mail gateway, since it can simply be installed on running Debian systems. The launch should then be even faster in any cloud system that supports Heat.

    While joining pmg2 it shows the following output now:

    Code:
    root@pmg2:~# pmgcm join <ip of master> --fingerprint <fingerprint of master cert> 3:29
    Enter password: ************
    stop all services accessing the database
    save new cluster configuration
    cluster node successfully joined
    updated /etc/pmg/pmg-authkey.key
    updated /etc/pmg/pmg-authkey.pub
    updated /etc/pmg/pmg-csrf.key
    updated /etc/pmg/domains
    updated /etc/pmg/pmg.conf
    copying master database from '<ip of master>'
    copying master database finished (got 34453 bytes)
    delete local database
    could not change directory to "/root": Permission denied
    create new local database
    could not change directory to "/root": Permission denied
    insert received data into local database
    creating indexes
    run analyze to speed up database queries
    could not change directory to "/root": Permission denied
    ANALYZE
    syncing quarantine data
    syncing quarantine data finished
    Are these root permission errors normal?
    I am root and there is no sudo installed.

    Error message for pmgmirror:

    Code:
    Feb 20 17:39:16 pmg1 pmgmirror[923]: starting cluster syncronization
    Feb 20 17:39:22 pmg1 pmgmirror[923]: database sync 'pmg2' failed - DBD::Pg::st execute failed: ERROR:  duplicate key value violates unique constraint "cstatistic_pkey"#012DETAIL:  Key (cid, rid)=(2, 1) already exists. at /usr/share/perl5/PMG/DBTools.pm line 1032.
    Feb 20 17:39:22 pmg1 pmgmirror[923]: cluster syncronization finished  (1 errors, 6.11 seconds (files 6.03, database 0.08, config 0.00))
    Can I reset the database manually? The sync of statistics seems to work fine now.
     
    #13 DerDanilo, Feb 20, 2018
    Last edited: Feb 20, 2018
  14. DerDanilo

    DerDanilo Member

    Joined:
    Jan 21, 2017
    Messages:
    189
    Likes Received:
    11
    After rolling back to the pre-cluster snapshots, I restored the config without statistics and then created the cluster and joined the node.
    Now it seems to work fine.

    I don't get, though, why my self-signed certificates were not accepted by the Proxmox join command. It only worked with the "original" pmg-api.pem.
    @tom Any idea why?

    How do private certs need to be generated to be supported by the mailgateway?
     
  15. hts

    hts New Member

    Joined:
    Mar 16, 2018
    Messages:
    7
    Likes Received:
    0
    Can't you use Let's Encrypt?
     
  16. DerDanilo

    DerDanilo Member

    Joined:
    Jan 21, 2017
    Messages:
    189
    Likes Received:
    11
    Yes, it is possible, but for the API you will have to change the fingerprint every time you renew the certificate. This is too complicated and does not add much security. The fingerprints are enough in my opinion.

    Therefore I use HAProxy to provide the web interface with an LE cert via an alternative port.
    The LE cert works fine for TLS encryption with postfix.
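
The fingerprint in question can be read from a certificate with openssl, which makes it easy to see that it changes after a renewal. This is a minimal sketch. The function name is made up, and the path in the usage note is an assumption based on the pmg-api.pem file mentioned earlier in this thread.

```shell
#!/bin/sh
# cert_fingerprint FILE
# Print the SHA-256 fingerprint of the first certificate in a PEM file
# as colon-separated hex pairs.
cert_fingerprint() {
    openssl x509 -in "$1" -noout -fingerprint -sha256 | cut -d= -f2
}
```

For example, 'cert_fingerprint /etc/pmg/pmg-api.pem' (path assumed) before and after replacing the certificate shows the two values that would have to be swapped in the cluster configuration.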
     