Hello Everyone,
I have a Cluster with 2 groups of servers. 2 are Dell R720 and another 2 are Dell R710. In both i have the VMs running in ZFS Pool made by SSDs.
On the R710 pair some times the ZFS replication between nodes fails, and i only notice it because i have email notifications.
Below is a sample of the notification:
some of these replication are set to happen every 5mins, with the goal of implementing a HA scenario without the need to go for CEPH (would require 4 node groups and better network).
Any ideas of the possible causes (ok, in this case is a time out... but why?) and fixes.
Thanks in advance,
I have a Cluster with 2 groups of servers. 2 are Dell R720 and another 2 are Dell R710. In both i have the VMs running in ZFS Pool made by SSDs.
On the R710 pair some times the ZFS replication between nodes fails, and i only notice it because i have email notifications.
Below is a sample of the notification:
Code:
command 'zfs snapshot R710SSD1/vm-107-disk-0@__replicate_107-0_1596076021__' failed: got timeout
some of these replication are set to happen every 5mins, with the goal of implementing a HA scenario without the need to go for CEPH (would require 4 node groups and better network).
Any ideas of the possible causes (ok, in this case is a time out... but why?) and fixes.
Thanks in advance,