ERROR ZFS REPLICATION

julicravo

Hello everyone. I have a cluster with four SATA 3 disks in RAIDZ2 (ZFS on Proxmox). At certain times of the day I get the error message below when replication runs. I wonder whether it is caused by the network or by the disks. What could I do to minimize it? Thank you.


Nov 27 09:06:32 prox01 pvesr[23438]: 100-0: got unexpected replication job error - command 'zfs snapshot rpool/data/vm-100-disk-0@__replicate_100-0_1606478759__' failed: got timeout
 
2020-11-27 09:28:15 100-0: create snapshot '__replicate_100-0_1606480080__' on local-zfs:vm-100-disk-0
2020-11-27 09:28:26 100-0: thaw guest filesystem
2020-11-27 09:28:26 100-0: end replication job with error: command 'zfs snapshot rpool/data/vm-100-disk-0@__replicate_100-0_1606480080__' failed: got timeout
 
This is very likely a performance problem. I see it occasionally myself on systems with HDDs. If you are replicating bidirectionally, it might help to prevent overlaps: if server A replicates to B while B replicates to A, schedule the two jobs at different times. After I did this, I get the timeout only rarely, about once every couple of days. If it is a major problem, you should think about adding an SSD for the SLOG (separate log device) to speed the system up.
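To check whether the timeouts really cluster at particular hours (for example when jobs overlap or the disks are busy with something else), it can help to count them per hour. The following is only a rough sketch: it assumes syslog-style lines in the same format as the pvesr message quoted above, and the function name `timeouts_per_hour` and the regular expression are made up for this illustration, not part of Proxmox.

```python
import re
import sys
from collections import Counter

# Assumed line format, taken from the log excerpt in this thread:
# "Nov 27 09:06:32 prox01 pvesr[23438]: 100-0: got unexpected replication
#  job error - command 'zfs snapshot ...' failed: got timeout"
LINE_RE = re.compile(
    r"^\w{3}\s+\d{1,2}\s+(?P<hour>\d{2}):\d{2}:\d{2}\s+"  # month, day, time
    r"\S+\s+pvesr\[\d+\]:\s+"                              # host, process[pid]
    r"(?P<job>\d+-\d+):.*got timeout"                      # job id, timeout text
)


def timeouts_per_hour(lines):
    """Count 'got timeout' replication errors per (job id, hour of day)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            counts[(match.group("job"), int(match.group("hour")))] += 1
    return counts


if __name__ == "__main__":
    # Reads log lines from standard input and prints one row per job and hour.
    for (job, hour), n in sorted(timeouts_per_hour(sys.stdin).items()):
        print(f"job {job}  {hour:02d}:00  {n} timeout(s)")
```

Assuming the script is saved as `timeouts.py` (a hypothetical name) and the messages are in the journal, something like `journalctl -t pvesr | python3 timeouts.py` should feed it the relevant lines. If most timeouts fall into the same hours, that points to load at those times rather than a failing disk or network link.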
 
Many thanks for the reply! Replication runs only from A to B. I like the idea of SSD caching. Do you have any material explaining how it works? Thanks.