Replication: sometimes got snapshot timeout error

Alexey Pavlyuts

Hi All,

After migrating to 6.2 and reworking the whole disk layout, I regularly receive email messages with content:
Code:
command 'zfs snapshot data/vm-111-disk-0@__replicate_111-0_1593357001__' failed: got timeout

I use ZFS replication to keep VMs/containers safe from any single-server failure, and the most critical VMs/LXCs are replicated to 2 other servers.

I have Zabbix in place and I see that iowait on these servers averages about 3-5% and sometimes jumps up to 30% (!!!)

System layout:

"rpool" is a small ZFS mirror, mostly for the system root and a number of very light but important LXCs, built on two SAS 10K 300GB drives.

"data" is the pool where the relatively big VMs reside. It is a mirror of ST4000LM024 drives: 4TB, 5400RPM SATA, 130MB/s by spec. Yes, I know it is slow. But I have no others.

One SATA-attached 400GB Intel SSD is used, GPT-partitioned into 2 parts: Linux swap, with all the rest used as cache for pool "data", but it does not look very helpful.

All the pools have ashift=12, compression on, and sync disabled.
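For reference, these settings can be verified from the shell (read-only checks; "data" is the pool name from this thread, adjust to yours):

```shell
# ashift is a per-vdev pool property; compression and sync are dataset
# properties inherited down the tree.
zpool get ashift data            # expect 12 for 4K-sector drives
zfs get compression,sync data    # expect compression=on, sync=disabled
# Note: sync=disabled means recent writes can be lost on power failure.
```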

Code:
root@hp1:~# zpool status
  pool: data
state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        data        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sdc     ONLINE       0     0     0
            sdd     ONLINE       0     0     0
        cache
          sde2      ONLINE       0     0     0

errors: No known data errors

  pool: rpool
state: ONLINE
  scan: scrub repaired 0B in 0 days 00:37:35 with 0 errors on Sun Jun 14 01:01:37 2020
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sda3    ONLINE       0     0     0
            sdb3    ONLINE       0     0     0

errors: No known data errors

The issue is that I get email notifications about the error on one hand, but when I go to the web interface to check, all the replications are OK. I guess Proxmox retries successfully before I get to the web interface.

The appearance of the error is very random and I see no correlation. It looks like it may happen when another sync task is in progress (maybe an incoming sync from another host?).

Quite annoying behaviour.

Any ideas about how to avoid the snapshot timeout? Is the timeout configurable in Proxmox? Are there any other options besides a disk upgrade to 7200RPM or better?
 
Last edited:
Hi,

this described problem normally occurs if the pool is under load and the snapshot has a lower priority.
 
Hi,

this described problem normally occurs if the pool is under load and the snapshot has a lower priority.
I see that it reports a replication failure because of a snapshot creation timeout, but the snapshot is always created successfully. Is it possible to increase the timeout value somehow?
 
No, it is not possible to increase the timeout.
And this would also not resolve the problem at all.

Yes, the snapshot will eventually be created because it is in the ZFS low-priority queue.
Normally in the error case we clean up this snapshot, but at that moment the snapshot does not exist yet.
 
So, there is no chance of failure-free operation at all? The problem is that a sync may succeed or may fail. Is there any way to decrease the ZFS load?
 
Yes, you could use a special device.
This could offload the small writes and metadata that are currently blocking you.
But it is important that this device is mirrored and is a fast enterprise-grade SSD.
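A minimal sketch of adding such a special vdev (device names /dev/sdf and /dev/sdg are placeholders for two enterprise SSDs; a special vdev holds pool-critical metadata, so losing it loses the pool, hence the mirror):

```shell
# Add a mirrored special vdev to pool "data" for metadata and small writes.
zpool add data special mirror /dev/sdf /dev/sdg

# Optionally also route small data blocks (here <=64K) to the special vdev:
zfs set special_small_blocks=64K data
```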
 
No, it is not possible to increase the timeout.
And this would also not resolve the problem at all.

Why is it not possible to increase the timeout? What is the timeout? I think increasing the timeout would solve the problem.
I also get mail with "replication failed" errors because of timeouts, even though everything works fine and all replications (every 15 min) I see in the UI have status OK. Could it be that if there are several replication jobs and they try to snapshot at the same time, they run into a delay because the ZFS pool serializes the snapshot operations?

BTW: I have a ZIL and CACHE, and normal replications take just 5-10 seconds.
 
Why is it not possible to increase the timeout?
Because there are other tasks that will hang if the timeout is too long.
I think increasing the timeout would solve the problem.
There are large pools out there that can take from several minutes up to hours to return successfully.
If the pool is too slow, you have to increase its speed to use this feature.
Could it be that if there are several replication jobs and they try to snapshot at the same time,
There is no parallel snapshot process; it is all serialized.
This is why we freeze the VM with the qemu-guest-agent if consistency is required.

BTW: I have a ZIL and CACHE, and normal replications take just 5-10 seconds.
The data required for snapshots is not in the ZIL or cache.
 
Because there are other tasks that will hang if the timeout is too long.
Maybe it is a good idea to make it configurable? After changing the poor 5400RPM drives to SSDs, I still get about 2-5 warnings per day, while the amount of my data is really small: tens of GB.
 
There are large pools out there that can take several minutes to hours to return successfully.
No one asked to increase the timeout to hours, but currently it seems to be only a couple of seconds, which it should be possible to increase. What is the current timeout?
 
I'm not 100% sure; 3 or 5 sec.
 
I'm not 100% sure; 3 or 5 sec.
Over time, the count of errors constantly increases, even though I run the SSDs over 6Gbps SATA; it looks VERY strange. Could you please advise how to manage it? How can I distribute the replications over time? I can't find a way, because the scheduler does not allow me to set exact minutes for replication, for example. IO is very spiky then.

Please, help!
 
I can't find a way, because the scheduler does not allow me to set exact minutes for replication
You can configure any time pattern that systemd.timer allows.
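Assuming the `pvesr` CLI and Proxmox's systemd.timer-style calendar-event syntax, staggering jobs to exact minutes might look like this (the job IDs `100-0` and `111-0` are examples; verify the schedule syntax against your PVE version's documentation):

```shell
# Run one job on the quarter hours and the other offset by 5 minutes,
# so the two snapshots never start at the same moment.
pvesr update 100-0 --schedule '*/15'    # :00, :15, :30, :45
pvesr update 111-0 --schedule '5/15'    # :05, :20, :35, :50

# Verify the configured jobs and their schedules:
pvesr list
```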
 
Finally, I found the reason.

The problem was poor-performance SSD drives. I was sure that almost any SSD would, by nature, bring better IOPS than an HDD. But that was a mistake. We had put in a desktop-class SanDisk SSD. It performs well for reads and short writes with TRIM, but it loses its performance on long and random writes. Database-like usage kills it completely, dropping IOPS to terrible rates like 6K, completely saturating IO and boosting the server's iowait figures.

After changing to an Intel server-class SSD, everything started to work well, and iowait stays below 1% even in the worst case.

So, it is not a software problem; it is a problem of IOPS rate.
 
Another option would be to only alert if, say, three replications in a row fail. Normally replications "resolve" themselves on the second try. Perhaps have the script count consecutive failed replication attempts and only alert after, say, the third?
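There is no built-in setting for this, but the counting idea can be sketched as a small wrapper script. How you detect a failed run (e.g. by parsing `pvesr status` output) is up to you; `record_result` below is a hypothetical helper, not a Proxmox feature:

```shell
#!/bin/sh
# Only alert after THRESHOLD consecutive replication failures,
# resetting the counter on any success.
THRESHOLD=3
STATE=${STATE:-/var/tmp/repl_fail_count}

record_result() {   # usage: record_result ok|fail
    count=$(cat "$STATE" 2>/dev/null)
    count=${count:-0}
    if [ "$1" = fail ]; then
        count=$((count + 1))
        echo "$count" > "$STATE"
        if [ "$count" -ge "$THRESHOLD" ]; then
            echo "ALERT: $count consecutive replication failures"
        fi
    else
        echo 0 > "$STATE"
    fi
}
```

Called from cron after each replication cycle, this stays silent for one-off hiccups and only mails when failures persist.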
 
Another option would be to only alert if, say, three replications in a row fail. Normally replications "resolve" themselves on the second try. Perhaps have the script count consecutive failed replication attempts and only alert after, say, the third?
How can this be configured, so that the notification is sent only after the third unsuccessful replication? Is there any mechanism provided?
 
In my opinion this is a bug, not IO overload. I have a system where

1. IO is not overloaded and the system is overall responsive,
2. issuing the 'zfs snapshot' command from the command line takes just a few seconds (5-15) to execute,

and despite that, the above-mentioned errors are reported and replication fails.

...and after a failed replication, the snapshot is left existing on the ZFS filesystem.

Either there is a bug, or the timeout value is way too small.
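To measure the raw snapshot latency outside of replication, you can time the same command the replication job runs (dataset name taken from the error message earlier in this thread; adjust to your setup):

```shell
# Time a manual snapshot on the affected dataset, then remove it.
time zfs snapshot data/vm-111-disk-0@manual_timing_test
zfs destroy data/vm-111-disk-0@manual_timing_test
```

If this regularly exceeds a few seconds under normal load, the replication timeout will keep being hit regardless of retries.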
 
