Replication occasionally fails, persistently

thoralf · Oct 7, 2025

Seemingly at random, replication jobs for some containers or virtual machines fail, and once a job fails it usually keeps failing.
Even deleting and recreating the job, along with any volumes that already exist, does not fix this. Sometimes it just doesn't work.

What's even weirder:
If I delete the failing replication job and the existing volumes, I can migrate the container or VM just fine, and that migration does a replication in the background.
(I can even create a new replication job back to the original node afterwards, and that job then works fine as well.)

This workaround is fine for small workloads.
But it's impractical for workloads with lots of data, where a replication from scratch takes hours.

Does anyone have an idea what might cause this or how to fix it?

Impact · Oct 7, 2025

Please share the log.

thoralf · Oct 7, 2025

Impact said:
Please share the log.

I have just "fixed" all the replication errors in the way described above.
I can certainly share the error message if it occurs again.

If memory serves me correctly, it's the common "conflicting snapshot" error (even though sometimes there isn't even a volume on the other node).
