pvesr started replication of a job every minute

mbaldini

Active Member
Nov 7, 2015
170
21
38
Hello

Today I noticed a strange behaviour in pvesr, it started to run a job id every minute even if the job is set to run every 30 minutes. I had to disable the replication to stop it.

pveversion is the latest
Code:
$ pveversion -v
proxmox-ve: 5.1-25 (running kernel: 4.13.4-1-pve)
pve-manager: 5.1-36 (running version: 5.1-36/131401db)
pve-kernel-4.13.4-1-pve: 4.13.4-25
pve-kernel-4.10.15-1-pve: 4.10.15-15
pve-kernel-4.10.17-3-pve: 4.10.17-23
libpve-http-server-perl: 2.0-6
lvm2: 2.02.168-pve6
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-15
qemu-server: 5.0-17
pve-firmware: 2.0-3
libpve-common-perl: 5.0-20
libpve-guest-common-perl: 2.0-13
libpve-access-control: 5.0-7
libpve-storage-perl: 5.0-16
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-2
pve-docs: 5.1-12
pve-qemu-kvm: 2.9.1-2
pve-container: 2.0-17
pve-firewall: 3.0-3
pve-ha-manager: 2.0-3
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.1.0-2
lxcfs: 2.0.7-pve4
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.7.2-pve1~bpo90
ceph: 12.2.1-pve3

My pvesr list (task is actually disabled or it will start replicaton every minute again):
Code:
$ pvesr list
JobID                Target                 Schedule  Rate  Enabled
105-0                local/pve-hs-3             */30     5       no

replication.cfg
Code:
$ cat /etc/pve/replication.cfg
local: 105-0
        comment WIN2012
        disable
        target pve-hs-3
        rate 5
        schedule */30


These are the latest pvesr status before I disabled the task
Code:
$ pvesr status
JobID      Enabled    Target                           LastSync             NextSync   Duration  FailCount State
105-0      Yes        local/pve-hs-3        2017-10-29_06:22:00              pending   8.277227          0 OK

$ pvesr status
JobID      Enabled    Target                           LastSync             NextSync   Duration  FailCount State
105-0      Yes        local/pve-hs-3        2017-10-29_06:23:00              pending    7.40767          0 OK

$ pvesr status
JobID      Enabled    Target                           LastSync             NextSync   Duration  FailCount State
105-0      Yes        local/pve-hs-3        2017-10-29_06:24:00              pending   7.896934          0 OK

The replication end with OK but NextSync is always pending

These are the latest logs of the job

Code:
$ tail -f /var/log/pve/replicate/105-0 -n +1
2017-10-29 06:22:00 105-0: start replication job
2017-10-29 06:22:00 105-0: guest => VM 105, running => 27107
2017-10-29 06:22:00 105-0: volumes => local-zfs:vm-105-disk-1,local-zfs:vm-105-disk-2
2017-10-29 06:22:01 105-0: freeze guest filesystem
2017-10-29 06:22:02 105-0: create snapshot '__replicate_105-0_1509254520__' on local-zfs:vm-105-disk-1
2017-10-29 06:22:02 105-0: create snapshot '__replicate_105-0_1509254520__' on local-zfs:vm-105-disk-2
2017-10-29 06:22:03 105-0: thaw guest filesystem
2017-10-29 06:22:04 105-0: incremental sync 'local-zfs:vm-105-disk-1' (__replicate_105-0_1509254340__ => __replicate_105-0_1509254520__)
2017-10-29 06:22:04 105-0: send from @__replicate_105-0_1509254340__ to rpool/data/vm-105-disk-1@__replicate_105-0_1509254520__ estimated size is 4.43M
2017-10-29 06:22:04 105-0: total estimated size is 4.43M
2017-10-29 06:22:05 105-0: TIME        SENT   SNAPSHOT
2017-10-29 06:22:05 105-0: rpool/data/vm-105-disk-1@__replicate_105-0_1509254340__      name    rpool/data/vm-105-disk-1@__replicate_105-0_1509254340__    -
2017-10-29 06:22:06 105-0: incremental sync 'local-zfs:vm-105-disk-2' (__replicate_105-0_1509254340__ => __replicate_105-0_1509254520__)
2017-10-29 06:22:06 105-0: send from @__replicate_105-0_1509254340__ to rpool/data/vm-105-disk-2@__replicate_105-0_1509254520__ estimated size is 174K
2017-10-29 06:22:06 105-0: total estimated size is 174K
2017-10-29 06:22:06 105-0: rpool/data/vm-105-disk-2@__replicate_105-0_1509254340__      name    rpool/data/vm-105-disk-2@__replicate_105-0_1509254340__    -
2017-10-29 06:22:06 105-0: TIME        SENT   SNAPSHOT
2017-10-29 06:22:07 105-0: delete previous replication snapshot '__replicate_105-0_1509254340__' on local-zfs:vm-105-disk-1
2017-10-29 06:22:07 105-0: delete previous replication snapshot '__replicate_105-0_1509254340__' on local-zfs:vm-105-disk-2
2017-10-29 06:22:08 105-0: (remote_finalize_local_job) delete stale replication snapshot '__replicate_105-0_1509254340__' on local-zfs:vm-105-disk-1
2017-10-29 06:22:08 105-0: (remote_finalize_local_job) delete stale replication snapshot '__replicate_105-0_1509254340__' on local-zfs:vm-105-disk-2
2017-10-29 06:22:08 105-0: end replication job

tail: /var/log/pve/replicate/105-0: file truncated
2017-10-29 06:23:00 105-0: start replication job
2017-10-29 06:23:00 105-0: guest => VM 105, running => 27107
2017-10-29 06:23:00 105-0: volumes => local-zfs:vm-105-disk-1,local-zfs:vm-105-disk-2
2017-10-29 06:23:01 105-0: freeze guest filesystem
2017-10-29 06:23:02 105-0: create snapshot '__replicate_105-0_1509254580__' on local-zfs:vm-105-disk-1
2017-10-29 06:23:02 105-0: create snapshot '__replicate_105-0_1509254580__' on local-zfs:vm-105-disk-2
2017-10-29 06:23:02 105-0: thaw guest filesystem
2017-10-29 06:23:04 105-0: incremental sync 'local-zfs:vm-105-disk-1' (__replicate_105-0_1509254520__ => __replicate_105-0_1509254580__)
2017-10-29 06:23:04 105-0: send from @__replicate_105-0_1509254520__ to rpool/data/vm-105-disk-1@__replicate_105-0_1509254580__ estimated size is 2.48M
2017-10-29 06:23:04 105-0: total estimated size is 2.48M
2017-10-29 06:23:04 105-0: rpool/data/vm-105-disk-1@__replicate_105-0_1509254520__      name    rpool/data/vm-105-disk-1@__replicate_105-0_1509254520__    -
2017-10-29 06:23:05 105-0: TIME        SENT   SNAPSHOT
2017-10-29 06:23:05 105-0: incremental sync 'local-zfs:vm-105-disk-2' (__replicate_105-0_1509254520__ => __replicate_105-0_1509254580__)
2017-10-29 06:23:06 105-0: send from @__replicate_105-0_1509254520__ to rpool/data/vm-105-disk-2@__replicate_105-0_1509254580__ estimated size is 133K
2017-10-29 06:23:06 105-0: total estimated size is 133K
2017-10-29 06:23:06 105-0: TIME        SENT   SNAPSHOT
2017-10-29 06:23:06 105-0: rpool/data/vm-105-disk-2@__replicate_105-0_1509254520__      name    rpool/data/vm-105-disk-2@__replicate_105-0_1509254520__    -
2017-10-29 06:23:06 105-0: delete previous replication snapshot '__replicate_105-0_1509254520__' on local-zfs:vm-105-disk-1
2017-10-29 06:23:07 105-0: delete previous replication snapshot '__replicate_105-0_1509254520__' on local-zfs:vm-105-disk-2
2017-10-29 06:23:07 105-0: (remote_finalize_local_job) delete stale replication snapshot '__replicate_105-0_1509254520__' on local-zfs:vm-105-disk-1
2017-10-29 06:23:07 105-0: (remote_finalize_local_job) delete stale replication snapshot '__replicate_105-0_1509254520__' on local-zfs:vm-105-disk-2
2017-10-29 06:23:08 105-0: end replication job

tail: /var/log/pve/replicate/105-0: file truncated
2017-10-29 06:24:00 105-0: start replication job
2017-10-29 06:24:00 105-0: guest => VM 105, running => 27107
2017-10-29 06:24:00 105-0: volumes => local-zfs:vm-105-disk-1,local-zfs:vm-105-disk-2
2017-10-29 06:24:01 105-0: freeze guest filesystem
2017-10-29 06:24:02 105-0: create snapshot '__replicate_105-0_1509254640__' on local-zfs:vm-105-disk-1
2017-10-29 06:24:02 105-0: create snapshot '__replicate_105-0_1509254640__' on local-zfs:vm-105-disk-2
2017-10-29 06:24:03 105-0: thaw guest filesystem
2017-10-29 06:24:04 105-0: incremental sync 'local-zfs:vm-105-disk-1' (__replicate_105-0_1509254580__ => __replicate_105-0_1509254640__)
2017-10-29 06:24:04 105-0: send from @__replicate_105-0_1509254580__ to rpool/data/vm-105-disk-1@__replicate_105-0_1509254640__ estimated size is 2.48M
2017-10-29 06:24:04 105-0: total estimated size is 2.48M
2017-10-29 06:24:05 105-0: rpool/data/vm-105-disk-1@__replicate_105-0_1509254580__      name    rpool/data/vm-105-disk-1@__replicate_105-0_1509254580__    -
2017-10-29 06:24:05 105-0: TIME        SENT   SNAPSHOT
2017-10-29 06:24:06 105-0: incremental sync 'local-zfs:vm-105-disk-2' (__replicate_105-0_1509254580__ => __replicate_105-0_1509254640__)
2017-10-29 06:24:06 105-0: send from @__replicate_105-0_1509254580__ to rpool/data/vm-105-disk-2@__replicate_105-0_1509254640__ estimated size is 133K
2017-10-29 06:24:06 105-0: total estimated size is 133K
2017-10-29 06:24:06 105-0: rpool/data/vm-105-disk-2@__replicate_105-0_1509254580__      name    rpool/data/vm-105-disk-2@__replicate_105-0_1509254580__    -
2017-10-29 06:24:06 105-0: TIME        SENT   SNAPSHOT
2017-10-29 06:24:07 105-0: delete previous replication snapshot '__replicate_105-0_1509254580__' on local-zfs:vm-105-disk-1
2017-10-29 06:24:07 105-0: delete previous replication snapshot '__replicate_105-0_1509254580__' on local-zfs:vm-105-disk-2
2017-10-29 06:24:08 105-0: (remote_finalize_local_job) delete stale replication snapshot '__replicate_105-0_1509254580__' on local-zfs:vm-105-disk-1
2017-10-29 06:24:08 105-0: (remote_finalize_local_job) delete stale replication snapshot '__replicate_105-0_1509254580__' on local-zfs:vm-105-disk-2
2017-10-29 06:24:08 105-0: end replication job

Is there something wrong in configuration or is pvesr bugging?
 

mbaldini

Active Member
Nov 7, 2015
170
21
38
There could be something related to daylight saving time? This night in Italy the clocks have been put back 1 hour
 

cbx

Active Member
Mar 2, 2012
45
1
28
There could be something related to daylight saving time? This night in Italy the clocks have been put back 1 hour
I don't think that is the problem, I have restarted the services and still have the same problem
 

wolfgang

Proxmox Retired Staff
Retired Staff
Oct 1, 2014
6,496
496
103
Hiimbaldini,

can you send me the stat file also

Code:
cat /var/lib/pve-manager/pve-replication-state.json | jq
 

mbaldini

Active Member
Nov 7, 2015
170
21
38
My last replication status:
Code:
$ pvesr status
JobID      Enabled    Target                           LastSync             NextSync   Duration  FailCount State
105-0      No         local/pve-hs-3        2017-10-29_13:42:00              pending 501.631685          0 OK
After that I disabled replication to avoid it starting every minute

Here the stat file:
Code:
$ cat /var/lib/pve-manager/pve-replication-state.json | jq
{
  "105": {
    "local/pve-hs-3": {
      "fail_count": 0,
      "last_sync": 1509280920,
      "last_iteration": 1509280920,
      "last_node": "pve-hs-main",
      "last_try": 1509280920,
      "duration": 501.631685,
      "storeid_list": [
        "local-zfs"
      ]
    }
  }
}
Should I delete the replication task and create it new?
 

wolfgang

Proxmox Retired Staff
Retired Staff
Oct 1, 2014
6,496
496
103
Are all nodes on PVE 5.1?
 

mbaldini

Active Member
Nov 7, 2015
170
21
38
All the nodes are always updated together, so yes, all on the same version.
Actually I remove the replication task, recreated and actually seems working correct
 

ELO

Member
Oct 28, 2018
2
0
6
41
Hi, I'm facing the same issue, but removing the task and adding it back is not helping. Replication still runs every minute.
My replication.cfg has:
Code:
local: 130-0
        target frpve
        schedule */30
        source smpve


and the output from cat /var/lib/pve-manager/pve-replication-state.json | jq:
Code:
{
  "130": {
    "local/frpve": {
      "fail_count": 0,
      "last_sync": 1540751100,
      "last_iteration": 1540751100,
      "duration": 3.564039,
      "last_node": "smpve",
      "storeid_list": [
        "nvme"
      ],
      "last_try": 1540751100
    }
  }
}

all nodes are up-to-date on 5.2.

Any ideas what I should check?

Thanks,

Emmanuel
 

ELO

Member
Oct 28, 2018
2
0
6
41
Checking again this morning, it seems that the scheduling works once again.

Code:
 pvesr status
JobID      Enabled    Target                           LastSync             NextSync   Duration  FailCount State
130-0      Yes        local/frpve           2018-10-29_07:30:00  2018-10-29_08:00:00    2.64201          0 OK

Code:
cat /var/lib/pve-manager/pve-replication-state.json | jq
{
  "130": {
    "local/frpve": {
      "last_iteration": 1540794600,
      "last_try": 1540794600,
      "last_node": "smpve",
      "storeid_list": [
        "nvme"
      ],
      "fail_count": 0,
      "duration": 2.64201,
      "last_sync": 1540794600
    }
  }
}
 

wolfgang

Proxmox Retired Staff
Retired Staff
Oct 1, 2014
6,496
496
103
you can lock in the error log to get more information

/var/log/pve/replicate/
 

About

The Proxmox community has been around for many years and offers help and support for Proxmox VE, Proxmox Backup Server, and Proxmox Mail Gateway.
We think our community is one of the best thanks to people like you!

Get your subscription!

The Proxmox team works very hard to make sure you are running the best software and getting stable updates and security enhancements, as well as quick enterprise support. Tens of thousands of happy customers have a Proxmox subscription. Get your own in 60 seconds.

Buy now!