ZFS Replication failed: got timeout

MAgno Santos

Member
Jun 20, 2019
Hello Everyone,

I have a cluster with two groups of servers: two are Dell R720s and the other two are Dell R710s. In both groups the VMs run on a ZFS pool made of SSDs.

On the R710 pair, the ZFS replication between the nodes sometimes fails, and I only notice it because I have email notifications enabled.

Below is a sample of the notification:

Code:
  command 'zfs snapshot R710SSD1/vm-107-disk-0@__replicate_107-0_1596076021__' failed: got timeout

Some of these replication jobs are set to run every 5 minutes, with the goal of implementing an HA scenario without needing to go for Ceph (which would require 4 node groups and a better network).

Any ideas about the possible causes (OK, in this case it is a timeout... but why?) and fixes?

Thanks in advance,
 
This can happen if there is too much load on the zpool. When IO gets high, taking the snapshot for the next replication run takes too long and the command times out.

Check the IO delay graph in the node summary to see whether you have peaks that correlate with the timeouts.
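To line the failures up with the IO delay graph, it helps to know when each failed snapshot was attempted. The sketch below assumes that the number at the end of the snapshot name (1596076021 in the notification above) is a Unix epoch timestamp; it looks like one, but that is an assumption. The helper name snap_time is made up for this example, and the date call uses GNU date syntax.

```shell
#!/bin/sh
# Hypothetical helper: pull the trailing number out of a replication
# snapshot name (or a whole error line) and print it as a UTC time.
# Assumption: the number before the final "__" is a Unix epoch timestamp.
snap_time() {
    # Keep only the digits between the last "_" and the closing "__".
    ts=$(printf '%s\n' "$1" | sed -n 's/.*_\([0-9][0-9]*\)__.*/\1/p')
    if [ -z "$ts" ]; then
        echo "no timestamp found in: $1" >&2
        return 1
    fi
    # GNU date: "@<seconds>" means seconds since the epoch.
    date -u -d "@$ts" '+%Y-%m-%d %H:%M:%S UTC'
}

# Example with the error line from the notification:
snap_time "command 'zfs snapshot R710SSD1/vm-107-disk-0@__replicate_107-0_1596076021__' failed: got timeout"
```

Under that assumption, the example prints 2020-07-30 02:27:01 UTC, which is the point to look at in the graph (after converting to the node's local time zone).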
 
In my opinion this is a bug, not IO overload. I have a system where:

1. IO is not overloaded, system is overall responsive,
2. issuing the 'zfs snapshot' command from command line takes just a few seconds to execute

By the way, the above-mentioned snapshots do exist after the failed replication.

Despite that, the errors above are still reported and the replication fails.
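For anyone who wants to repeat the manual check, here is a minimal sketch of timing a command from the shell. The helper time_cmd and the snapshot name manual_timing_test are made up for this example; only the dataset name comes from the notification in the first post.

```shell
#!/bin/sh
# Hypothetical helper: run any command and report how long it took
# (whole seconds) together with its exit status.
time_cmd() {
    start=$(date +%s)
    rc=0
    "$@" || rc=$?
    end=$(date +%s)
    echo "elapsed=$((end - start))s exit=$rc"
    return "$rc"
}

# Usage on a node (snapshot name is an example; remove it afterwards):
#   time_cmd zfs snapshot R710SSD1/vm-107-disk-0@manual_timing_test
#   zfs destroy R710SSD1/vm-107-disk-0@manual_timing_test
```

Running this a few times while replication is failing shows whether a manual snapshot is also slow at that moment or only the replication job hits the timeout.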
 
For me, there are constant ZFS timeouts when doing a scrub on the target. By definition scrubs should not be a "higher priority" IO task. I think the timeout should be longer than a few seconds, or tunable - I get hundreds of replication failed emails whenever I scrub a pool.
 
