Hi,
We are testing pvesr to replicate several VEs so that they can be migrated easily between nodes.
Basically this works well, but if node A replicates a large chunk (100+ GB) to node B while B is trying to replicate another VE to A, that VE hangs while the ZFS replication keeps trying and failing. This has the potential to spiral into a bigger problem.
Also notable: one cannot stop or remove a replication job while it is trying to sync.
My suggestion would be a back-off strategy for a replication job that fails, e.g. doubling the waiting time before the next try after each failed attempt.
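To make the back-off idea concrete, here is a minimal sketch in Python. It is not how pvesr schedules retries today; the function name `next_retry_delay`, the base interval, and the upper cap are all assumptions made for illustration only.

```python
def next_retry_delay(base_seconds: int, failed_attempts: int,
                     max_seconds: int = 3600) -> int:
    """Return how long to wait before the next replication try.

    Hypothetical helper: the wait doubles after each failed attempt
    and is capped at max_seconds (the cap is an assumption, so that a
    long outage does not push the next try out indefinitely).
    """
    if failed_attempts <= 0:
        # No failures so far: keep the normal interval.
        return base_seconds
    return min(base_seconds * 2 ** failed_attempts, max_seconds)


if __name__ == "__main__":
    # Show the waiting time after 0..6 consecutive failures.
    for attempts in range(7):
        print(attempts, next_retry_delay(60, attempts))
```

With a base interval of 60 seconds this waits 120 seconds after the first failure, 240 after the second, and so on until the cap is reached.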
Being able to stop a sync job would also be helpful (especially while it is pending).
Finally, a kind of time-slicing might also help: for initial replications of large storages, transfer only about 10 GB, then allow other replication jobs to run, then transfer the next 10 GB, and so on.
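The time-slicing idea could look roughly like the following round-robin sketch in Python. It only illustrates the scheduling order; it says nothing about whether or how a ZFS send could actually be split at such boundaries. The function name `time_sliced_order`, the job names, and the sizes are hypothetical.

```python
from collections import deque

CHUNK_GB = 10  # slice size taken from the suggestion above


def time_sliced_order(jobs: dict[str, int],
                      chunk_gb: int = CHUNK_GB) -> list[tuple[str, int]]:
    """Return the order in which slices would be transferred.

    jobs maps a (hypothetical) job name to the amount of data in GB it
    still has to send. Each job sends at most chunk_gb per turn and is
    then moved to the back of the queue so other jobs get a turn.
    """
    queue = deque(jobs.items())
    schedule = []
    while queue:
        name, remaining = queue.popleft()
        step = min(chunk_gb, remaining)
        if step > 0:
            schedule.append((name, step))
        remaining -= step
        if remaining > 0:
            # Not finished yet: queue the rest behind the other jobs.
            queue.append((name, remaining))
    return schedule


if __name__ == "__main__":
    for name, size in time_sliced_order({"big-initial": 25, "small-delta": 5}):
        print(f"{name}: transfer {size} GB")
```

In this example the small job is finished after the first 10 GB slice of the big one, instead of waiting for the whole 25 GB.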
Any improvement on these issues would be greatly appreciated.
H.