I am running a 3-node cluster with local storage (ZFS) and the latest version of PVE:
proxmox-ve: 5.2-2 (running kernel: 4.15.18-2-pve)
pve-manager: 5.2-7 (running version: 5.2-7/8d88e66a)
pve-kernel-4.15: 5.2-5
pve-kernel-4.15.18-2-pve: 4.15.18-20
pve-kernel-4.15.18-1-pve: 4.15.18-19
corosync: 2.4.2-pve5
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-38
libpve-guest-common-perl: 2.0-17
libpve-http-server-perl: 2.0-10
libpve-storage-perl: 5.0-24
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.2+pve1-1
lxcfs: 3.0.0-1
novnc-pve: 1.0.0-2
proxmox-widget-toolkit: 1.0-19
pve-cluster: 5.0-29
pve-container: 2.0-25
pve-docs: 5.2-8
pve-firewall: 3.0-13
pve-firmware: 2.0-5
pve-ha-manager: 2.0-5
pve-i18n: 1.0-6
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.11.2-1
pve-xtermjs: 1.0-5
pve-zsync: 1.6-16
qemu-server: 5.0-32
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.9-pve1~bpo9
Despite a fix for storage replication [1], I still experience the same issue: replication jobs sporadically fail.
[1] https://forum.proxmox.com/threads/storage-replication-constantly-failing.43347/#post-221446
From /var/log/syslog:
Sep 6 11:11:00 pve1 systemd[1]: Starting Proxmox VE replication runner...
Sep 6 11:11:00 pve1 systemd[1]: Started Session 265785 of user root.
Sep 6 11:11:01 pve1 systemd[1]: Started Session 265786 of user root.
Sep 6 11:11:01 pve1 zed: eid=1000089 class=history_event pool_guid=0x47C9A96EEDAF31FB
Sep 6 11:11:02 pve1 systemd[1]: Started Session 265787 of user root.
Sep 6 11:11:02 pve1 pvesr[3987]: send from @__replicate_100-0_1536225000__ to rpool/data/vm-100-disk-1@__replicate_100-0_1536225060__ estimated size is 1.13M
Sep 6 11:11:02 pve1 pvesr[3987]: total estimated size is 1.13M
Sep 6 11:11:02 pve1 zed: eid=1000090 class=history_event pool_guid=0x47C9A96EEDAF31FB
Sep 6 11:11:02 pve1 pvesr[3987]: TIME SENT SNAPSHOT
Sep 6 11:11:02 pve1 zed: eid=1000091 class=history_event pool_guid=0x47C9A96EEDAF31FB
Sep 6 11:11:02 pve1 zed: eid=1000092 class=history_event pool_guid=0x47C9A96EEDAF31FB
Sep 6 11:11:02 pve1 zed: eid=1000093 class=history_event pool_guid=0x47C9A96EEDAF31FB
Sep 6 11:11:02 pve1 pvesr[3987]: cannot receive incremental stream: checksum mismatch or incomplete stream
Sep 6 11:11:02 pve1 pvesr[3987]: command 'zfs recv -F -- rpool/data/vm-100-disk-1' failed: exit code 1
Sep 6 11:11:02 pve1 pvesr[3987]: exit code 255
Sep 6 11:11:02 pve1 pvesr[3987]: send/receive failed, cleaning up snapshot(s)..
Sep 6 11:11:02 pve1 zed: eid=1000094 class=history_event pool_guid=0x47C9A96EEDAF31FB
Sep 6 11:11:02 pve1 pvesr[3987]: 100-0: got unexpected replication job error - import failed: exit code 29
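In case it helps with debugging, this is roughly the incremental send/receive I believe pvesr is running here, reconstructed by hand from the log above (the target node name pve2 is my placeholder, not from the log):

```shell
# Manual equivalent of the failing incremental transfer.
# Snapshot names are taken from the syslog excerpt above;
# "pve2" is a placeholder for the actual replication target node.
zfs send -i @__replicate_100-0_1536225000__ \
    rpool/data/vm-100-disk-1@__replicate_100-0_1536225060__ \
  | ssh root@pve2 zfs recv -F -- rpool/data/vm-100-disk-1
```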
Replication job log:
2018-09-06 13:50:00 100-0: start replication job
2018-09-06 13:50:00 100-0: guest => VM 100, running => 2320
2018-09-06 13:50:00 100-0: volumes => local-SSD:vm-100-disk-1
2018-09-06 13:50:01 100-0: freeze guest filesystem
2018-09-06 13:50:01 100-0: create snapshot '__replicate_100-0_1536234600__' on local-SSD:vm-100-disk-1
2018-09-06 13:50:01 100-0: thaw guest filesystem
2018-09-06 13:50:01 100-0: incremental sync 'local-SSD:vm-100-disk-1' (__replicate_100-0_1536234540__ => __replicate_100-0_1536234600__)
2018-09-06 13:50:02 100-0: delete previous replication snapshot '__replicate_100-0_1536234600__' on local-SSD:vm-100-disk-1
2018-09-06 13:50:02 100-0: end replication job with error: import failed: exit code 29
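When a run fails like this, comparing the replication snapshots on source and target might show whether the two sides got out of sync before the incremental send; checking the snapshot GUIDs is just my guess at a useful sanity check (again, pve2 is a placeholder for the target node):

```shell
# List replication snapshots and their GUIDs on the source node...
zfs list -t snapshot -o name,guid rpool/data/vm-100-disk-1
# ...and on the target node; matching base snapshots should have
# identical GUIDs on both sides.
ssh root@pve2 zfs list -t snapshot -o name,guid rpool/data/vm-100-disk-1
```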
The next run always succeeds.
Please let me know if I can provide any further information.