Multiple Replications from Multiple Nodes to Single Storage Node

anson

Hi,

I am trying to configure Replication from multiple Proxmox nodes (v5.1) to a single storage node (v5.1) in a cluster, as per below.

Code:
Node A <> Replicate <> Node Z
Node B <> Replicate <> Node Z
Node C <> Replicate <> Node Z

However, if multiple Replications from multiple nodes run at the same time, the chance of failure is very high, with the following errors found in the Replication log.

The issue also cannot be resolved until 1) the Replication job is removed and 2) the VM's replicated disk image is removed from Node Z.

Code:
2017-10-29 00:32:01 103-0: start replication job
2017-10-29 00:32:01 103-0: guest => VM 103, running => 2616
2017-10-29 00:32:01 103-0: volumes => local-zfs:vm-103-disk-1
2017-10-29 00:32:05 103-0: create snapshot '__replicate_103-0_1509208321__' on local-zfs:vm-103-disk-1
2017-10-29 00:32:05 103-0: full sync 'local-zfs:vm-103-disk-1' (__replicate_103-0_1509208321__)
2017-10-29 00:32:05 103-0: full send of rpool/data/vm-103-disk-1@__replicate_103-0_1509208321__ estimated size is 2.70G
2017-10-29 00:32:05 103-0: total estimated size is 2.70G
2017-10-29 00:32:05 103-0: TIME        SENT   SNAPSHOT
2017-10-29 00:32:06 103-0: 00:32:06   2.11M   rpool/data/vm-103-disk-1@__replicate_103-0_1509208321__
2017-10-29 00:32:07 103-0: 00:32:07   2.11M   rpool/data/vm-103-disk-1@__replicate_103-0_1509208321__
2017-10-29 00:32:08 103-0: rpool/data/vm-103-disk-1    name    rpool/data/vm-103-disk-1    -
2017-10-29 00:32:08 103-0: volume 'rpool/data/vm-103-disk-1' already exists
2017-10-29 00:32:08 103-0: warning: cannot send 'rpool/data/vm-103-disk-1@__replicate_103-0_1509208321__': signal received
2017-10-29 00:32:08 103-0: cannot send 'rpool/data/vm-103-disk-1': I/O error
2017-10-29 00:32:08 103-0: command 'zfs send -Rpv -- rpool/data/vm-103-disk-1@__replicate_103-0_1509208321__' failed: exit code 1
2017-10-29 00:32:08 103-0: delete previous replication snapshot '__replicate_103-0_1509208321__' on local-zfs:vm-103-disk-1
2017-10-29 00:32:08 103-0: end replication job with error: command 'set -o pipefail && pvesm export local-zfs:vm-103-disk-1 zfs - -with-snapshots 1 -snapshot __replicate_103-0_1509208321__ | /usr/bin/ssh -o 'BatchMode=yes' -o 'HostKeyAlias=pve-repl-1' root@10.0.0.51 -- pvesm import local-zfs:vm-103-disk-1 zfs - -with-snapshots 1' failed: exit code 255

Any clue? I have been facing this issue since v5.0. I might get away with configuring time-specific (staggered) Replication schedules, but I would prefer each Replication task to run every 15-30 minutes.
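
For reference, staggering the jobs while keeping a 15-minute interval might look roughly like this; the minute offsets are only an example, and the schedule is set per job (in the GUI or, as far as I know, via pvesr):

Code:
# hypothetical staggered schedules, one per source node's jobs
# Node A jobs: keep the default
*/15
# Node B jobs (assuming the minute-list form is accepted)
*:5,20,35,50
# Node C jobs
*:10,25,40,55

# e.g. for job 103-0 from the log above (command is an assumption, check 'man pvesr')
pvesr update 103-0 --schedule '*:5,20,35,50'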

Thank you.
 
I use pve-zsync for exactly the same task as you. I guess you are using the Replication from the web GUI; I found it is not reliable in my case.
With pve-zsync I have no problems. I also limit the bandwidth for each replication task, so even with 2 concurrent replication tasks, they succeed. Another good thing is that you get multiple snapshots on both sides.
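
A job with a bandwidth cap can be set up roughly like this (VMID, IP and the 51200 KB/s limit are only examples; if I remember correctly, 'create' adds a cron entry under /etc/cron.d/pve-zsync that runs every 15 minutes by default):

Code:
pve-zsync create --source 103 --dest 10.0.0.51:rpool/data --name vm103 --maxsnap 12 --limit 51200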
 
Hi,

can you send /etc/pve/replication.cfg
and the pveversion -v output?
Do you use a dedicated replication network?
 
Code:
root@pveXXX:~# cat /etc/pve/replication.cfg
local: 100-0
        target pve-repl-1

Code:
root@pve110:~# pveversion -v
proxmox-ve: 5.1-25 (running kernel: 4.13.4-1-pve)
pve-manager: 5.1-36 (running version: 5.1-36/131401db)
pve-kernel-4.13.4-1-pve: 4.13.4-25
pve-kernel-4.4.19-1-pve: 4.4.19-66
pve-kernel-4.4.67-1-pve: 4.4.67-92
pve-kernel-4.10.17-1-pve: 4.10.17-18
libpve-http-server-perl: 2.0-6
lvm2: 2.02.168-pve6
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-15
qemu-server: 5.0-17
pve-firmware: 2.0-3
libpve-common-perl: 5.0-20
libpve-guest-common-perl: 2.0-13
libpve-access-control: 5.0-7
libpve-storage-perl: 5.0-16
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-2
pve-docs: 5.1-12
pve-qemu-kvm: 2.9.1-2
pve-container: 2.0-17
pve-firewall: 3.0-3
pve-ha-manager: 2.0-3
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.1.0-2
lxcfs: 2.0.7-pve4
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.7.2-pve1~bpo90

I have moved on to pve-zsync as recommended by @guletz. However, unlike the built-in 'Replication', data restoration (in terms of disaster recovery) has to be done manually using 'zfs send'.
 
Data restoration is very simple. Let's say that node A is broken. Then you go to node Z and simply move ALL (or maybe only some) of your VMs like this:

mv /etc/pve/nodes/A/qemu-server/*.conf /etc/pve/nodes/Z/qemu-server

Then, if you like, you can start all of these VMs using the web interface. Another important thing: I use monit on each of the nodes (A, B, C), and on each node $T I run a test like this (a concrete sketch follows below):

IF file /etc/pve/nodes/$T/qemu-server/*.conf does NOT EXIST then STOP cron
(with T = A, B, C)
- so the replication is stopped (via the cron service) if I move any of my VMs to node Z
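
A concrete version of that test could look roughly like the sketch below; the helper script name is made up, and the monit stanza would go into something like /etc/monit/conf.d/ on each source node:

Code:
#!/bin/sh
# /usr/local/bin/check-vm-confs.sh (example helper; pass the node name as $1)
# exits non-zero when no VM config is left on this node
ls /etc/pve/nodes/"$1"/qemu-server/*.conf >/dev/null 2>&1

Code:
# monit stanza, e.g. on node A
check program vm_confs_present with path "/usr/local/bin/check-vm-confs.sh A"
    if status != 0 then exec "/bin/systemctl stop cron"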
 
Hi @guletz

Thank you for sharing.

Could you advise how we can bring up the VM on Node Z using the latest snapshot, after moving the conf file?

Code:
Each VM has a standard conf like the following

bootdisk: ide0
cores: 1
ide0: local-zfs:vm-101-disk-1,size=32G
ide2: none,media=cdrom
memory: 512
name: XXX.server.com
net0: virtio=36:31:32:31:39:38,bridge=vmbr1
numa: 0
onboot: 1
ostype: l26
smbios1: uuid=fca218f8-bb71-4f13-b4eb-a18ba8d7e689
sockets: 1

I understand the path to the disk image will be different, and VM snapshots are stored in the format below.

Code:
VM Snapshots in rpool/snapshots

rpool/snapshots/vm-101-disk-1@rep_vm-101_2017-10-31_22:45:07     0B      -  13.6G  -
rpool/snapshots/vm-101-disk-1@rep_vm-101_2017-10-31_23:00:01     0B      -  13.6G  -

Earlier, I was using the zfs send/receive method to rebuild a proper disk image from a snapshot with the command below. This would be time-consuming for large VMs, with no verbose output to monitor progress.

Code:
zfs send rpool/snapshots/vm-101-disk-1@rep_vm-101_2017-10-31_22:45:07 | zfs receive rpool/data/vm-<new-vmid>-disk-1
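
(Side note: progress can be monitored, e.g. by adding -v to zfs send, or by piping through pv if it is installed; the target name is still a placeholder.)

Code:
zfs send -v rpool/snapshots/vm-101-disk-1@rep_vm-101_2017-10-31_22:45:07 \
  | pv \
  | zfs receive rpool/data/vm-<new-vmid>-disk-1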

Thank you!

 
Hi @anson,

You wrote: "I understand the path to the disk image will be different, and VM snapshots are stored in the format below."

No, the path is the same. I use something like this on the source nodes (A, B, C):

Code:
pve-zsync sync --source rpool/data/vm-xxx-disk-1 --dest 192.168.a1.b1:rpool/data --name my-VMxxx --maxsnap 18 --limit 51200 --method ssh

So you will have the same path for the replicated VMs on node Z (rpool/data/vm-xxx-disk-1).
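
Putting the two replies together, a failover onto node Z might then look roughly like this; VMID 103 and the node names are just the examples used earlier in the thread:

Code:
# on node Z: check which snapshots have been received for the VM
zfs list -t snapshot -o name,creation rpool/data/vm-103-disk-1

# move the VM configuration so node Z owns the VM
mv /etc/pve/nodes/A/qemu-server/103.conf /etc/pve/nodes/Z/qemu-server/

# start it; the dataset already reflects the last received snapshot
qm start 103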
 
Thank you @guletz! I have implemented and tested it fully using your method and it works perfectly well! :):):):D

I noticed you configured maxsnap as '18' in your example. Could you advise how you would select the snapshot that you wish to boot from?

 
Thanks for your appreciation!

Now, the second question is very hard. How many maxsnaps to use depends on your environment. I weigh many factors, and many of them are quite restrictive:
- how much storage space you can use
- some VMs are more important than others (so I use ZFS reservations or disk quotas; for my very important VMs I reserve xxx TB in ZFS)
- some VMs add a lot of data in a short time, so you need to run zsync very often; I have a Gbit replication network and I want to keep network usage as small as possible
- for any database VM I run many zsyncs, very frequently
- if your clients have working hours (like 08.00-16.00), it is very useful to schedule zsync in a smart way (like 07.00-18.00)

If you need to use a snapshot instead of the zvol disk, I would make a clone from the desired snapshot.
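
A minimal sketch of that, assuming the replica lives under rpool/data as in the pve-zsync example above and that 9101 is a made-up new VMID:

Code:
# create a writable clone of the desired snapshot as a new disk
zfs clone rpool/data/vm-101-disk-1@rep_vm-101_2017-10-31_22:45:07 rpool/data/vm-9101-disk-1
# optionally promote the clone so it no longer depends on the original dataset
zfs promote rpool/data/vm-9101-disk-1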

Remember what I said about monit. I also want to say that everybody should have 2 different backup systems at all times. I use zsync and rsync at the same time. I run rsync every night (outside working hours, at 2 different moments - better to be wary than sorry).
And in the end, I recommend everyone to test, at least once a month, how their data is doing:
- are the logs OK?
- are the VMs restorable?
- can you read/write some data?
- how is your disk space usage (can you do more backups)?
- how much time is spent on a backup (zsync, rsync)?

... and many other small things / details like this.
 
Hi @guletz, thanks again for your advice! I now use pve-zsync for disaster recovery, and other backup software for block-level backups. I would also like to share that pve-zsync has now been successfully implemented on 4 nodes.

Previously, I had no luck with the built-in Replication feature, although it works well if I only have 1 or 2 replications running from ~2 nodes. If I have multiple replications running concurrently from multiple nodes, they will likely fail one by one until I have to start all over again by removing and recreating the Replication tasks.

Now, the next step is to set up a cronjob to monitor "pve-zsync list/status" with email notifications.

:):):)

 

By default, when pve-zsync fails it sends you a mail. But if you want to check pve-zsync list and more, use monit. Monit is a great tool. For your case you can create multiple checks, like:

if the network is OK, then
    check that pve-zsync status is OK


OR

if my-network-bandwidth > X Mbit/s, then
    stop pve-zsync (or the equivalent)


... and more: check that the zpool is OK / that the zpool free space is OK / and so on.
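
For instance, the zpool check could be wired up with a tiny helper script; the names and paths here are only examples:

Code:
#!/bin/sh
# /usr/local/bin/check-zpool.sh (example helper)
# exits non-zero unless the pool reports ONLINE
[ "$(zpool list -H -o health rpool)" = "ONLINE" ]

Code:
# monit stanza
check program zpool_health with path "/usr/local/bin/check-zpool.sh"
    if status != 0 then alert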

If you do not like how monit works, then I promise you a beer from me ;)
 
