I have slowly been seeing more and more "replication job failed" errors. By the time I get to them, the problem has gone and I cannot find a sensible log to work out what went wrong. I would suggest the configuration isn't wrong per se, otherwise it wouldn't work most of the time. I get a bunch of emails for various guest containers or VMs around the same time.
My replication runs over a dedicated network, shown below on the 200 subnet. It's simply a switch with 6 machines (5 PVE and 1 PBS) and that is it.
I might have assumed that it was a dodgy switch, but it's not just that one site; it's starting to affect two other sites.
Code:
Replication job '110-0' with target 'proxmoxmon1' and schedule '09/15' failed!
Last successful sync: 2025-03-17 08:24:52
Next sync try: 2025-03-17 08:44:00
Failure count: 1
Error:
command '/usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=proxmoxmon1' -o 'UserKnownHostsFile=/etc/pve/nodes/proxmoxmon1/ssh_known_hosts' -o 'GlobalKnownHostsFile=none' root@192.168.200.6 -- pvesr prepare-local-job 110-0 --scan local-zfs local-zfs:subvol-110-disk-0 --last_sync 1742199892' failed: exit code 255
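As a sanity check on the error above, the `--last_sync` value looks like a Unix timestamp. A minimal sketch, assuming it is seconds since the epoch and that the nodes report times in UTC (both are assumptions, and `epoch_to_utc` is just an illustrative helper name), converts it so it can be compared with the "Last successful sync" line:

```python
from datetime import datetime, timezone


def epoch_to_utc(ts: int) -> str:
    """Format a Unix timestamp (seconds) as a UTC date and time string."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


# Value taken from the --last_sync argument in the failed command.
print(epoch_to_utc(1742199892))  # prints 2025-03-17 08:24:52
```

That matches the "Last successful sync" time in the email, so the job seems to have failed while preparing the next sync rather than with a stale sync state.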
I changed the replication schedules to spread them out, but that didn't make any difference.
I checked all the guests that had a problem, and by the time I get to them they are working and replication is taking just a second or two, so as far as I can see the jobs are not running into each other.
I'm running 8.3.2, and the nodes were updated not that long ago.
What I'm asking for is how do I find out what the problem is?
Possibly related to https://forum.proxmox.com/threads/random-zfs-replication-errors.82486/
Same symptom, but it doesn't seem applicable. It's not helpful that the logs are not persistent.
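Exit code 255 is what ssh itself returns when it hits an error of its own (for example a failed connection), as opposed to passing through the remote command's exit status. One way to catch a short-lived fault might be a probe that keeps opening an SSH connection over the replication network and writes every failure, with a timestamp and ssh's stderr, to a file that survives a reboot. The following is only a sketch under that assumption; the function names, interval, and log path are hypothetical choices, not anything Proxmox provides.

```python
import subprocess
import time
from datetime import datetime, timezone


def probe(cmd, timeout=15):
    """Run cmd once and return (exit_code, stderr_text); -1 means it timed out."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
        return result.returncode, result.stderr.strip()
    except subprocess.TimeoutExpired:
        return -1, "timed out"


def log_line(code, stderr):
    """Build one timestamped (UTC) log entry for a failed probe."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    return f"{stamp} exit={code} {stderr}"


def watch(cmd, log_path, interval=10, rounds=None):
    """Probe repeatedly; append only failures to log_path.

    rounds=None loops until interrupted; an integer limits the number of probes.
    """
    done = 0
    while rounds is None or done < rounds:
        code, err = probe(cmd)
        if code != 0:
            with open(log_path, "a") as fh:
                fh.write(log_line(code, err) + "\n")
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval)
```

It could be started on a source node with something like `watch(["/usr/bin/ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", "root@192.168.200.6", "--", "true"], "/root/ssh-probe.log")`, ideally with the same host key options as in the failed command so the probe follows the same path as the replication job. Lines in the file that cluster around the times of the failure emails would point at the network or sshd rather than at pvesr.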