ZFS snapshot failed: got timeout

Dubard

Hi everybody,

First, my configuration:
- Proxmox 5.2.9 with 3 nodes.
- ZFS storage for ALL VMs and CTs
- 1 Gb RJ45 network
- VMs and CTs

I have a recurring problem (4 to 5 times a day) with one VM (Debian Jessie, 10 GB RAM, 60 GB virtio HDD) whose replication ends with "ZFS snapshot failed: got timeout". This replication runs every minute.
I have other Debian VMs with a similar hardware configuration, as well as Windows Server VMs (2012 R2 and 2016 Standard) and CentOS VMs, and they rarely have this replication problem.

Does anyone have any idea why this error occurs mainly on this VM?

Many thanks
 
Hi,

the timeout happens if your pool is under heavy load and the snapshot creation takes too long.
"Does anyone have any idea why this error is mainly made on this VMs ?"
Does this VM have more IO load than the other ones?
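
To check whether snapshot creation itself is slow on the node, one option is to time a throwaway snapshot by hand while the pool is busy. A minimal sketch in Python, assuming it runs as root on the Proxmox node; the helper names and the snapshot name `timing-test` are made up for illustration, and the timeout that the replication job applies is not stated in this thread, so the sketch only reports the duration:

```python
import subprocess
import time


def time_command(cmd):
    """Run cmd (a list of arguments) and return (exit_code, elapsed_seconds)."""
    start = time.monotonic()
    result = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return result.returncode, time.monotonic() - start


def time_snapshot(dataset, snapname="timing-test"):
    """Create a throwaway snapshot of dataset and return (exit_code, seconds).

    The snapshot is destroyed again if it was created successfully.
    """
    target = f"{dataset}@{snapname}"
    code, elapsed = time_command(["zfs", "snapshot", target])
    if code == 0:
        # Clean up so the test snapshot does not pile up on the pool.
        time_command(["zfs", "destroy", target])
    return code, elapsed
```

It could be called as `time_snapshot("rpool/data/vm-100-disk-1")`, where the dataset name is a placeholder for the real disk of the affected VM. Running it a few times during busy and quiet periods should show whether the duration varies with load.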
 
Hi @wolfgang,
Thank you for your reply.
This VM is a mail server. Here are some stats for it:
iostat
Code:
root@myserver:~# iostat
Linux 3.16.0-6-amd64 (chloris)    19. 10. 18    _x86_64_   (4 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.57    0.00    0.42    0.11    0.12   98.77

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
vda               7.79        34.81        45.81   37426397   49243228
scd0              0.00         0.00         0.00         48          0

root@myserver:~#
ioping statistics
Code:
--- /dev/vda1 (block device 58.0 GiB) ioping statistics ---
21 requests completed in 21.1 s, 108 iops, 433.0 KiB/s
min/avg/max/mdev = 2.29 ms / 9.24 ms / 43.2 ms / 8.44 ms
root@myserver:~#

load average value:

Code:
0.00, 0.03, 0.00

The problem occurs randomly, 4 to 5 times a day, while the VM is replicated every minute ;)
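
To put these numbers in perspective, here is a small worked calculation in Python of the failure rate, assuming the job really runs once per minute around the clock (the function name is only for illustration):

```python
def failure_rate(failures_per_day, interval_minutes=1):
    """Return (runs_per_day, fraction_of_runs_that_fail)."""
    runs_per_day = 24 * 60 // interval_minutes
    return runs_per_day, failures_per_day / runs_per_day


for failures in (4, 5):
    runs, rate = failure_rate(failures)
    print(f"{failures} failures out of {runs} runs = {rate:.2%}")
# 4 failures out of 1440 runs = 0.28%
# 5 failures out of 1440 runs = 0.35%
```

So roughly one run in 300 hits the timeout, which fits a short, occasional burst of load rather than a constantly overloaded pool.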

Is there anything I can do on the VM or the Proxmox nodes?
Many thanks
 
Hello! I'm having this problem with replication too.
Maybe my VM puts too much load on my hardware: the hard disk very often shows 100% utilization, mostly during heavy non-sequential writes.
It does not break anything, but I have to check these messages, because no message is sent when everything is OK, and when replication fails completely there is just one message about that failure and then nothing more.
I tried to find out whether this timeout can be increased to suit what my hardware can handle, but had no luck. If somebody could point me to the right place, that would be great.
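
Since these messages have to be checked by hand, a small script can at least count how often the timeout appears. A minimal sketch in Python, assuming the messages have been saved as text lines whose first field is a date; that line layout is an assumption made for this example, not the actual Proxmox log format, and only the message text itself is taken from this thread:

```python
from collections import Counter

# Message text as reported in this thread.
MESSAGE = "ZFS snapshot failed: got timeout"


def count_timeouts(lines):
    """Count lines containing the timeout message, grouped by first field.

    The first whitespace-separated field is assumed to be a date.
    """
    counts = Counter()
    for line in lines:
        if MESSAGE in line:
            day = line.split()[0]
            counts[day] += 1
    return counts


# Hypothetical sample lines, only to show the expected input shape.
sample = [
    "2018-10-19 03:12:01 job 100-0: ZFS snapshot failed: got timeout",
    "2018-10-19 03:13:01 job 100-0: replication finished",
    "2018-10-19 11:47:01 job 100-0: ZFS snapshot failed: got timeout",
    "2018-10-20 02:05:01 job 100-0: ZFS snapshot failed: got timeout",
]
for day, n in sorted(count_timeouts(sample).items()):
    print(day, n)
```

Passing an open file or `sys.stdin` instead of `sample` works the same way, because the function only iterates over lines.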
 