ZFS snapshot failed: got timeout

Discussion in 'Proxmox VE: Installation and configuration' started by Dubard, Oct 18, 2018.

  1. Dubard

    Dubard Member

    Joined:
    Oct 5, 2016
    Messages:
    50
    Likes Received:
    2
    Hi everybody,

    First, my configuration:
    - Proxmox 5.2.9 with 3 nodes
    - ZFS storage for ALL VMs and CTs
    - 1 Gb network (RJ45)
    - VMs and CTs

    I sometimes have a problem (4 to 5 times a day) with one VM (Debian Jessie, 10 GB RAM, 60 GB virtio HDD) whose replication ends with "ZFS snapshot failed: got timeout". This replication runs every minute.
    I have other Debian VMs with a similar hardware configuration, as well as Windows Server VMs (2012 R2 and 2016 Standard) and CentOS VMs, and they rarely show this replication problem.

    Does anyone have any idea why this error occurs mainly on this VM?

    Many thanks
     
  2. wolfgang

    wolfgang Proxmox Staff Member

    Joined:
    Oct 1, 2014
    Messages:
    4,072
    Likes Received:
    250
    Hi,

    the timeout happens if your pool is under heavy load and the snapshot creation takes too long.
    Does this VM have more I/O load than the other ones?
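
    One rough way to check whether snapshot creation itself is slow is to time a manual snapshot on the node while the VM is busy. This is only a minimal sketch: it assumes GNU `date` (for `%N`), `elapsed_ms` is a helper name made up for this example (not a Proxmox tool), and the dataset name in the comments is hypothetical and has to be replaced with the VM's real zvol.

```shell
#!/bin/sh
# Print how long a command takes, in milliseconds.
# Assumes GNU date, which supports %N (nanoseconds).
elapsed_ms() {
    start=$(date +%s%N)
    "$@" >/dev/null 2>&1          # run the command, discard its output
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))
}

# Example use on a Proxmox node (hypothetical dataset name, adjust first):
# DATASET=rpool/data/vm-100-disk-1
# elapsed_ms zfs snapshot "$DATASET@timing-test"
# zfs destroy "$DATASET@timing-test"   # remove the test snapshot again
```

    Running `zpool iostat -v 1` in a second terminal while the replication job fires shows how busy the pool is at that moment.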
     
  3. Dubard

    Dubard Member

    Hi @wolfgang,
    Thank you for your reply.
    This VM is a mail server. Here are some stats for it:
    iostat
    Code:
    root@myserver:~# iostat
    Linux 3.16.0-6-amd64 (chloris)    19. 10. 18    _x86_64_   (4 CPU)
    
    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
               0.57    0.00    0.42    0.11    0.12   98.77
    
    Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
    vda               7.79        34.81        45.81   37426397   49243228
    scd0              0.00         0.00         0.00         48          0
    
    root@myserver:~#
    
    ioping statistics
    Code:
    --- /dev/vda1 (block device 58.0 GiB) ioping statistics ---
    21 requests completed in 21.1 s, 108 iops, 433.0 KiB/s
    min/avg/max/mdev = 2.29 ms / 9.24 ms / 43.2 ms / 8.44 ms
    root@myserver:~#
    

    load average value:

    Code:
    0.00, 0.03, 0.00
    
    The problem occurs randomly, 4 to 5 times a day, while the VM is replicated every minute ;)

    Is there something I could change on the VM or on the Proxmox nodes?
    Many thanks
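
    For scale: one replication run per minute means 24 × 60 = 1440 runs a day, so 4 to 5 timeouts is a small fraction of all runs. A quick back-of-the-envelope calculation in plain shell and awk (nothing Proxmox-specific; `failure_rate` is just an illustrative helper name):

```shell
# Percentage of failed runs, given the number of failures and total runs.
failure_rate() {
    awk -v f="$1" -v r="$2" 'BEGIN { printf "%.2f\n", 100 * f / r }'
}

runs_per_day=$((24 * 60))   # one replication run per minute
for failures in 4 5; do
    echo "$failures timeouts / $runs_per_day runs = $(failure_rate "$failures" "$runs_per_day")%"
done
```

    That works out to roughly 0.3% of runs, which would be consistent with occasional short load spikes rather than a constantly overloaded pool.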
     
  4. Dubard

    Dubard Member

    Hi everybody,
    Does anyone have any ideas about the problem I mentioned above?

    Many thanks!
     
  5. Romkus

    Romkus New Member

    Joined:
    Nov 13, 2016
    Messages:
    9
    Likes Received:
    0
    Hello! I'm having this problem with replication too.
    Maybe my VM puts too much load on my hardware, as the hard disk very often shows 100% utilization, mostly during heavy non-sequential write operations...
    It does not break my work, but I have to check these messages, because when everything is OK there are no messages, and when replication fails completely there is just one message about that failure and then no further messages either.
    I tried to find out whether this "timeout" can be increased to suit what my hardware can do, but had no luck. If somebody could point me to where it is set, that would be helpful.
     