Hi,
Sorry for my English.
I have a 3-node Proxmox cluster with a Mellanox interconnect in Ethernet mode, which I use for replication testing.
I don't have a shared filesystem because I want to test ZFS replication and understand it better.
I have a VM (vm0, VMID 100) with 2 disks on node1, and a replication task to node2 every 15 minutes.
vm0 has HA enabled in a group called group0, which contains node1 and node2.
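(For reference, I set all this up in the GUI; I believe the CLI equivalent would be roughly the following, please correct me if the syntax is wrong:)

```shell
# Replication job for VM 100 to node2, every 15 minutes
# (my best guess at the CLI equivalent of what I clicked in the panel):
pvesr create-local-job 100-0 node2 --schedule '*/15'

# HA group with node1 and node2, and vm0 (VMID 100) as an HA resource:
ha-manager groupadd group0 --nodes node1,node2
ha-manager add vm:100 --group group0
```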
The replication worked correctly.
Then node1 went down for hardware reasons.
After a while, vm0 came up correctly on node2, in the state from up to 15 minutes earlier.
In the panel I disabled the replication task from node1 to node2.
After a while node1 was repaired and came back online into the cluster.
It's fine for vm0 to stay on node2 now, but in case that node fails I want the machine replicated back to node1.
So I created another replication task for vm0, this time from node2 to node1.
That replication started and is currently in progress.
On the hosts I see this (while the replication is still running):
zfs list -t snapshot
node1:
NAME                                                      USED  AVAIL  REFER  MOUNTPOINT
rpool/data/vm-100-disk-1@__replicate_100-0_1522856700__   405M      -  52.4G  -
rpool/data/vm-100-disk-1@__replicate_100-1_1523993700__   173M      -  52.5G  -
rpool/data/vm-100-disk-1@__replicate_100-0_1524217200__     0B      -  52.5G  -
rpool/data/vm-100-disk-3@__replicate_100-0_1522856700__   447G      -   903G  -
rpool/data/vm-100-disk-3@__replicate_100-1_1523993700__  5.19G      -   953G  -

node2:
NAME                                                      USED  AVAIL  REFER  MOUNTPOINT
rpool/data/vm-100-disk-1@__replicate_100-0_1522856700__   404M      -  52.4G  -
rpool/data/vm-100-disk-1@__replicate_100-1_1523993700__   172M      -  52.5G  -
rpool/data/vm-100-disk-1@__replicate_100-0_1524217200__  33.2M      -  52.5G  -
rpool/data/vm-100-disk-3@__replicate_100-0_1522856700__   447G      -   903G  -
rpool/data/vm-100-disk-3@__replicate_100-1_1523993700__   207G      -   953G  -
rpool/data/vm-100-disk-3@__replicate_100-0_1524217200__  1.47M      -   992G  -
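If I understand correctly, the snapshot names encode the VMID, a job number, and a Unix timestamp (`__replicate_<vmid>-<job>_<timestamp>__`), so 100-0 is the old node1-to-node2 job and 100-1 is my new node2-to-node1 job. My assumption is that an incremental send only works from a base snapshot that exists on both sides, i.e. manually it would look roughly like this (guessing, not what pvesr literally runs):

```shell
# Assumption: incremental replication needs a common base snapshot on both nodes.
# Base snapshot present on node1 AND node2 (from the old job 100-0):
BASE=rpool/data/vm-100-disk-3@__replicate_100-0_1522856700__
# Newer state on node2 (from the new job 100-1):
NEW=rpool/data/vm-100-disk-3@__replicate_100-1_1523993700__

# Send only the delta between BASE and NEW from node2 to node1:
zfs send -i "$BASE" "$NEW" | ssh node1 zfs recv -F rpool/data/vm-100-disk-3
```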
I would have thought that, for the time vm0 has been used on node2, the difference from the original state on node1 would be around 300 GB, but the replication seems to be transferring nearly the whole disk (over 500 GB so far).
Why are there 3 snapshots per disk instead of 1 on each node?
Can't the replication back from node2 to node1 reuse, or overwrite/delete, the old snapshots?
Is what I described above the correct use of replication?
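If reusing the old snapshots isn't automatic, would a manual check and cleanup like this be the right approach? (This is a guess on my part; I don't know whether removing these snapshots by hand breaks the replication job's state.)

```shell
# Compare the replication snapshots present on each node (run on both hosts);
# assumption: only a name present on both sides can serve as an incremental base:
zfs list -r -t snapshot -o name rpool/data/vm-100-disk-3

# If an old job's snapshot is confirmed unneeded as a base, destroy it to
# reclaim space (guess on my part; this may confuse pvesr):
zfs destroy rpool/data/vm-100-disk-3@__replicate_100-0_1522856700__
```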
Any help to better understand replication is very welcome.
Regards