Hello,
I've recently set up two nodes (5.1), one being a master that uses PVESR to replicate to the other, slave node.
Everything worked for a week, but now one KVM out of the bunch won't replicate onto the slave node. I'm getting an error that the volume already exists, but to my understanding that should be fine since it should just be overwritten.
Here are the logs of my job:
Code:
2017-11-03 16:04:01 128-0: start replication job
2017-11-03 16:04:01 128-0: guest => VM 128, running => 6216
2017-11-03 16:04:01 128-0: volumes => local-zfs:vm-128-disk-1
2017-11-03 16:04:02 128-0: freeze guest filesystem
2017-11-03 16:04:10 128-0: create snapshot '__replicate_128-0_1509739441__' on local-zfs:vm-128-disk-1
2017-11-03 16:04:10 128-0: thaw guest filesystem
2017-11-03 16:04:14 128-0: full sync 'local-zfs:vm-128-disk-1' (__replicate_128-0_1509739441__)
2017-11-03 16:04:15 128-0: full send of rpool/data/vm-128-disk-1@__replicate_128-0_1509739441__ estimated size is 79.8G
2017-11-03 16:04:15 128-0: total estimated size is 79.8G
2017-11-03 16:04:15 128-0: TIME SENT SNAPSHOT
2017-11-03 16:04:15 128-0: rpool/data/vm-128-disk-1 name rpool/data/vm-128-disk-1 -
2017-11-03 16:04:15 128-0: volume 'rpool/data/vm-128-disk-1' already exists
2017-11-03 16:04:15 128-0: warning: cannot send 'rpool/data/vm-128-disk-1@__replicate_128-0_1509739441__': signal received
2017-11-03 16:04:15 128-0: cannot send 'rpool/data/vm-128-disk-1': I/O error
2017-11-03 16:04:15 128-0: command 'zfs send -Rpv -- rpool/data/vm-128-disk-1@__replicate_128-0_1509739441__' failed: exit code 1
2017-11-03 16:04:15 128-0: delete previous replication snapshot '__replicate_128-0_1509739441__' on local-zfs:vm-128-disk-1
2017-11-03 16:04:16 128-0: end replication job with error: command 'set -o pipefail && pvesm export local-zfs:vm-128-disk-1 zfs - -with-snapshots 1 -snapshot __replicate_128-0_1509739441__ | /usr/bin/ssh -o 'BatchMode=yes' -o 'HostKeyAlias=dev-proxmox-2' root@192.168.1.173 -- pvesm import local-zfs:vm-128-disk-1 zfs - -with-snapshots 1' failed: exit code 255
From what I can see, it seems like it's trying to send the whole disk again (which already exists on the target node) instead of just an incremental snapshot.
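In case it's useful, this is roughly how I've been comparing the replication snapshots on both sides (the target IP is the one from the log above; dataset and host would need adjusting for another setup):
Code:
# snapshots of the disk on the source node
zfs list -t snapshot -o name rpool/data/vm-128-disk-1

# snapshots of the same dataset on the target node
ssh root@192.168.1.173 zfs list -t snapshot -o name rpool/data/vm-128-disk-1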
And my pveversion -v:
Code:
proxmox-ve: 5.1-25 (running kernel: 4.13.4-1-pve)
pve-manager: 5.1-35 (running version: 5.1-35/722cc488)
pve-kernel-4.13.4-1-pve: 4.13.4-25
libpve-http-server-perl: 2.0-6
lvm2: 2.02.168-pve6
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-15
qemu-server: 5.0-17
pve-firmware: 2.0-3
libpve-common-perl: 5.0-20
libpve-guest-common-perl: 2.0-13
libpve-access-control: 5.0-7
libpve-storage-perl: 5.0-16
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-2
pve-docs: 5.1-12
pve-qemu-kvm: 2.9.1-2
pve-container: 2.0-17
pve-firewall: 3.0-3
pve-ha-manager: 2.0-3
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.1.0-2
lxcfs: 2.0.7-pve4
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.7.2-pve1~bpo90
Any idea what might cause this?
Also, is there a way to receive email notifications for failed sync jobs?
Thanks