ZFS snapshot failed: got timeout

Dubard · Oct 18, 2018

Hi everybody,

First, my configuration:
- Proxmox 5.2.9 with 3 nodes
- ZFS storage for all VMs and CTs
- 1 Gb/s Ethernet (RJ45) network
- VMs and CTs

I sometimes have a problem (4 to 5 times a day) with one VM (Debian Jessie, 10 GB RAM, 60 GB virtio HDD) whose replication ends with "ZFS snapshot failed: got timeout". This VM is replicated every minute.
I have other Debian VMs with a similar hardware configuration, as well as Windows Server VMs (2012 R2 and 2016 Standard) and CentOS VMs, and they rarely have this replication problem.

Does anyone have any idea why this error occurs mainly on this VM?

Many thanks

wolfgang · Oct 18, 2018

Hi,

The timeout happens if your pool is under heavy load and the snapshot creation takes too long.

Dubard said:
Does anyone have any idea why this error occurs mainly on this VM?

Does this VM have more I/O load than the others?
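
One way to check whether snapshot creation itself is slow is to time a manual snapshot on the node while the VM is busy. A minimal sketch, assuming Python 3 is available on the node; `time_command` is a hypothetical helper (not part of Proxmox), and the dataset name in the comment is a placeholder that has to be replaced with the real one:

```python
import subprocess
import time


def time_command(cmd):
    """Run cmd (a list of arguments) and return its wall-clock duration in seconds."""
    start = time.monotonic()
    subprocess.run(cmd, check=True)  # raises CalledProcessError if the command fails
    return time.monotonic() - start


# Example, run as root on the node (hypothetical dataset and snapshot name):
#   snap = "rpool/data/vm-100-disk-1@timing-test"
#   print(time_command(["zfs", "snapshot", snap]))
#   subprocess.run(["zfs", "destroy", snap], check=True)  # remove the test snapshot
```

Repeating the measurement at busy and quiet times of the day should show whether the duration varies with the load on the pool.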

Dubard · Oct 19, 2018

Hi @wolfgang,
Thank you for your reply.
This VM is a mail server. Here are some stats for it.
iostat:

Code:

root@myserver:~# iostat
Linux 3.16.0-6-amd64 (chloris)    19. 10. 18    _x86_64_   (4 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.57    0.00    0.42    0.11    0.12   98.77

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
vda               7.79        34.81        45.81   37426397   49243228
scd0              0.00         0.00         0.00         48          0

root@myserver:~#

ioping statistics

Code:

--- /dev/vda1 (block device 58.0 GiB) ioping statistics ---
21 requests completed in 21.1 s, 108 iops, 433.0 KiB/s
min/avg/max/mdev = 2.29 ms / 9.24 ms / 43.2 ms / 8.44 ms
root@myserver:~#

load average value:

Code:

0.00, 0.03, 0.00

The problem occurs randomly, 4 to 5 times a day, while the VM is replicated every minute.

Is there something I can do on the VM or on the Proxmox nodes?
Many thanks
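
Note that `iostat` run without an interval reports averages since boot, so short bursts of I/O do not show up in figures like the ones above. A minimal sketch of per-interval sampling from `/proc/diskstats`, assuming Python 3 and a Linux system; the function names are hypothetical, and the device name `vda` is taken from the output above (on a node the device will be named differently):

```python
import time

SECTOR_BYTES = 512  # /proc/diskstats always counts 512-byte sectors


def parse_diskstats(text, device):
    """Return (sectors_read, sectors_written, ms_busy) for device, or None."""
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 14 and parts[2] == device:
            return int(parts[5]), int(parts[9]), int(parts[12])
    return None


def rates(before, after, seconds):
    """Convert two samples into (read kB/s, write kB/s, % of time busy)."""
    read_kb = (after[0] - before[0]) * SECTOR_BYTES / 1024 / seconds
    write_kb = (after[1] - before[1]) * SECTOR_BYTES / 1024 / seconds
    util = (after[2] - before[2]) / (seconds * 1000) * 100
    return read_kb, write_kb, util


def watch(device="vda", interval=1.0, samples=60):
    """Print one line per interval so that short bursts become visible."""
    def read():
        with open("/proc/diskstats") as f:
            sample = parse_diskstats(f.read(), device)
        if sample is None:
            raise SystemExit(f"device {device} not found in /proc/diskstats")
        return sample

    prev = read()
    for _ in range(samples):
        time.sleep(interval)
        cur = read()
        r, w, u = rates(prev, cur, interval)
        print(f"{device}: read {r:.1f} kB/s, write {w:.1f} kB/s, busy {u:.0f}%")
        prev = cur
```

Calling `watch()` around the time a replication runs would show whether the disk is briefly saturated even though the long-term averages look idle.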

Dubard · Nov 2, 2018

Hi everybody,
Does anyone have any ideas about the problem I mentioned above?

Many thanks !

Romkus · Dec 7, 2018

Hello! I'm having this problem with replication too.
Maybe my VM has too much load for my hardware, as the hard disk very often shows 100% load, mostly during many non-sequential write operations...
It does not break my work, but I have to check these messages, because when everything is OK there are no messages, and when replication fails completely there is just one message about that failure and then no further messages either.
I tried to find out whether this timeout can be increased to fit what my hardware can do, but had no luck. If somebody could point me to where it is set, that would be helpful.
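
Checking such messages by hand can be reduced by counting them automatically. A minimal sketch, assuming Python 3 and assuming that the messages have been collected as text lines that each start with a timestamp like `2018-12-07 10:15:02`; that line format and the function names are assumptions, only the error text itself comes from this thread:

```python
from collections import Counter

MESSAGE = "ZFS snapshot failed: got timeout"  # error text quoted in this thread


def timeout_lines(lines):
    """Return the lines that contain the timeout message."""
    return [line.rstrip("\n") for line in lines if MESSAGE in line]


def timeouts_per_key(lines, key_len=13):
    """Count timeouts grouped by the first key_len characters of each line.

    Assumes each line starts with a timestamp such as '2018-12-07 10:15:02',
    so the default of 13 characters groups by date and hour.
    """
    return Counter(line[:key_len] for line in timeout_lines(lines))
```

Passing an open file to `timeouts_per_key` gives a per-hour count, which makes it easier to see whether the failures cluster around particular times of the day.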
