Replication occasionally fails, persistently

thoralf

New Member
Nov 29, 2023
Seemingly at random, replication jobs for some containers or virtual machines fail, and once a job fails it usually keeps failing.
Even deleting and recreating the job, along with any existing volumes, does not reliably fix this. Sometimes it just doesn't work.

What's even weirder:
If I delete the failing replication job and the existing volumes, I can migrate the container or VM just fine, and the migration does a replication in the background.
(I can even create a new replication job back to the original node afterwards, and that job then works fine as well.)

This workaround is fine for small workloads.
But it's impractical for workloads with lots of data, where a replication from scratch will take hours.

Does anyone have an idea what might cause this or how to fix it?
 
Last edited:
Please share the log.
I have just "fixed" all the replication errors in the way described above.
I can certainly share the error message if it occurs again.

If memory serves me correctly, it's the common "conflicting snapshot" stuff (even though sometimes there is not even a volume on the other node).
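
For anyone who wants to check the snapshot state the next time this happens, here is a minimal sketch (not an official tool) that compares replication snapshots between two nodes. It assumes ZFS-backed storage, that the text comes from running `zfs list -H -t snapshot -o name` on each node, and that replication snapshots are named `__replicate_<jobid>_<timestamp>__`. The function names and the sample dataset names are made up for illustration.

```python
import re
from collections import defaultdict

# Assumed naming scheme: <dataset>@__replicate_<jobid>_<unix timestamp>__
REPL_SNAP = re.compile(
    r"^(?P<dataset>[^@]+)@__replicate_(?P<job>\d+-\d+)_(?P<ts>\d+)__$"
)


def replication_snapshots(zfs_list_output):
    """Group replication snapshots by dataset.

    Expects the text printed by `zfs list -H -t snapshot -o name`
    (one snapshot name per line). Other snapshots are ignored.
    """
    found = defaultdict(list)
    for line in zfs_list_output.splitlines():
        match = REPL_SNAP.match(line.strip())
        if match:
            found[match["dataset"]].append((match["job"], int(match["ts"])))
    return dict(found)


def without_common_snapshot(source, target):
    """Return datasets that share no replication snapshot between two nodes.

    An incremental send needs a snapshot that exists on both sides, so an
    empty intersection may hint at why a job refuses to continue.
    """
    datasets = set(source) | set(target)
    return sorted(
        ds for ds in datasets
        if not set(source.get(ds, [])) & set(target.get(ds, []))
    )


# Made-up sample output from two nodes (hypothetical dataset names).
node_a = replication_snapshots(
    "rpool/data/subvol-100-disk-0@__replicate_100-0_1700000000__\n"
    "rpool/data/subvol-100-disk-0@manual\n"
)
node_b = replication_snapshots(
    "rpool/data/subvol-100-disk-0@__replicate_100-0_1690000000__\n"
)
print(without_common_snapshot(node_a, node_b))
```

A dataset showing up in the result only says the two sides have diverged; it does not say which side is stale, so it is a starting point for reading the job log rather than a fix.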