Help me to better understand replication

fed

New Member
Apr 20, 2018
Hi,

Sorry for my English.


I have a 3-node Proxmox cluster with a Mellanox interconnect in Ethernet mode, set up for replication testing.
I don't have a shared filesystem, because I want to test and better understand ZFS replication.

I have a VM (vm0) with 2 disks on node1 and a replication task to node2 every 15 minutes.
This vm0 has HA enabled in a group called group0, which contains node1 and node2.
The replication works correctly.
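For reference, this kind of job can also be created and checked from the CLI with pvesr (only a sketch; the job ID 100-0 and the 15-minute schedule are just the values from my setup):

# on node1: create replication job 100-0, replicating guest 100 to node2 every 15 minutes
pvesr create-local-job 100-0 node2 --schedule '*/15'

# show the state and last sync of the replication jobs on this node
pvesr status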

Then node1 went down for hardware reasons.
After a while, vm0 came up correctly on node2, at the state from 15 minutes earlier.
In the panel I disabled the replication task from node1 to node2.
Some time later node1 was repaired and came back online and into the cluster.
It's fine for vm0 to stay on node2 now, but I want it replicated to node1 so the machine is covered if node2 fails.
So I created another replication task for vm0, from node2 to node1.
The replication started and is now in progress.
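Roughly, what I did in the GUI corresponds to these pvesr calls (again only a sketch, with the job IDs from this setup):

# the old job 100-0 (node1 -> node2) was disabled after the failover
pvesr disable 100-0

# new job 100-1: replicate guest 100 from node2 back to node1
pvesr create-local-job 100-1 node1 --schedule '*/15'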

On the hosts I see this (the replication is still in progress):

zfs list -t snapshot

node1:
NAME                                                      USED  AVAIL  REFER  MOUNTPOINT
rpool/data/vm-100-disk-1@__replicate_100-0_1522856700__   405M      -  52.4G  -
rpool/data/vm-100-disk-1@__replicate_100-1_1523993700__   173M      -  52.5G  -
rpool/data/vm-100-disk-1@__replicate_100-0_1524217200__     0B      -  52.5G  -
rpool/data/vm-100-disk-3@__replicate_100-0_1522856700__   447G      -   903G  -
rpool/data/vm-100-disk-3@__replicate_100-1_1523993700__  5.19G      -   953G  -

node2:
NAME                                                      USED  AVAIL  REFER  MOUNTPOINT
rpool/data/vm-100-disk-1@__replicate_100-0_1522856700__   404M      -  52.4G  -
rpool/data/vm-100-disk-1@__replicate_100-1_1523993700__   172M      -  52.5G  -
rpool/data/vm-100-disk-1@__replicate_100-0_1524217200__  33.2M      -  52.5G  -
rpool/data/vm-100-disk-3@__replicate_100-0_1522856700__   447G      -   903G  -
rpool/data/vm-100-disk-3@__replicate_100-1_1523993700__   207G      -   953G  -
rpool/data/vm-100-disk-3@__replicate_100-0_1524217200__  1.47M      -   992G  -
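A side note on reading those snapshot names: the pattern seems to be __replicate_<vmid>-<jobid>_<timestamp>__, and the last field looks like a Unix timestamp (this is only my interpretation of the output above), so it can be decoded with date:

# decode the timestamp embedded in a replication snapshot name
date -d @1524217200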


I would have expected that, for the time vm0 has been running on node2, the difference from the original state on node1 would be around 300GB, but the replication seems to be transferring the whole disk (over 500GB so far).

Why are there 3 snapshots per disk instead of 1 on each node?
Can't the replication back from node2 to node1 reuse or rewrite/delete the old snapshots?

Is what I described above the correct use of replication?


Any help to better understand replication is very welcome.

Regards
 
This is normal with multiple jobs, as each job uses two snapshots. After the node failure the disk has to be synced completely to the new destination.
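If it helps to see this, the job ID is embedded in the snapshot names, so the snapshots belonging to each job can be listed separately (a minimal sketch based on the naming shown in the output above):

# snapshots created by replication job 100-0 (node1 -> node2)
zfs list -t snapshot -o name,used | grep '__replicate_100-0_'

# snapshots created by replication job 100-1 (node2 -> node1)
zfs list -t snapshot -o name,used | grep '__replicate_100-1_'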
 

OK, thanks.

But if node2 now goes down and vm0 comes up on node1, the disks it uses will be synchronized to the latest snapshot, correct? In this case that would be the snapshot from job 100-1 (the replication job from node2 to node1), or is it possible that the disks would come from the last snapshot of job 100-0, the original replication from node1 to node2?

And after vm0 comes up on node1 and node2 eventually comes back online, I would then re-enable replication job 0 (node1 to node2) and disable replication job 1 (node2 to node1). Is this the correct "use" of ZFS replication as Proxmox intended?

Regards
 
AFAIK, the storage replication doesn't know what the previous source state is and therefore has to sync again from the beginning.
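For background, this matches how plain zfs send/receive works: an incremental stream needs a base snapshot that exists on both sides, otherwise only a full send is possible. A rough sketch of the two cases (dataset and snapshot names are taken from the output above; Proxmox drives this through its replication jobs, not these raw commands):

# incremental send: only works if the base snapshot exists on both nodes
zfs send -i rpool/data/vm-100-disk-3@__replicate_100-0_1522856700__ \
         rpool/data/vm-100-disk-3@__replicate_100-1_1523993700__ \
  | ssh node1 zfs recv rpool/data/vm-100-disk-3

# full send: no common snapshot, so the whole dataset is transferred
zfs send rpool/data/vm-100-disk-3@__replicate_100-1_1523993700__ \
  | ssh node1 zfs recv -F rpool/data/vm-100-disk-3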
 
