[SOLVED] LXC Backup randomly hangs at suspend

Discussion in 'Proxmox VE: Installation and configuration' started by pa657, Dec 28, 2015.

  1. eislon

    eislon Member

    Joined:
    May 17, 2009
    Messages:
    44
    Likes Received:
    0
    I also have this problem. I normally back up to NFS, but because of this issue I added a local disk and ran backups to it. The first backup seemed fine, but on local storage the backup also hangs at the suspend command.

    Is this solved yet?
     
  2. EricM

    EricM New Member

    Joined:
    Feb 13, 2016
    Messages:
    5
    Likes Received:
    0
  3. Jacob Tranholm

    Jacob Tranholm New Member
    Proxmox Subscriber

    Joined:
    Feb 10, 2016
    Messages:
    23
    Likes Received:
    0
    All of our servers are in production use, so these lockups during vzdump backups are disasters. For most of our LXC-based VMs I have been forced to switch backup methods. On all 3 of our Proxmox VE servers I have only one LXC VM left using vzdump for backups, and that's only because I haven't found any other method for creating a snapshot of the 400GB B+ data tree with billions of very small files (without taking the server offline).

    So you will have to look for test data elsewhere…
     
  4. iMer

    iMer New Member

    Joined:
    Feb 17, 2015
    Messages:
    16
    Likes Received:
    0
    @fabian: The fix I tried (just checking whether the /proc/$PID dir exists) has worked great for me so far.
    The other containers I set up haven't hung so far, but without production data it's hard to give the cronjob something realistic to work on.
    I'll try to sort something out next week when I'm less busy.
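
    The exact change is not shown in this thread. As a minimal sketch of the general idea only (checking whether a /proc/$PID directory still exists before waiting on a process), something like the following should work on Linux. The function name pid_is_alive and the proc_root parameter are hypothetical and exist only for illustration:

```python
import os


def pid_is_alive(pid: int, proc_root: str = "/proc") -> bool:
    """Return True if a <proc_root>/<pid> directory exists.

    On Linux, /proc/<pid> exists only while the process exists,
    so a missing directory means the process is already gone.
    """
    return os.path.isdir(os.path.join(proc_root, str(pid)))


if __name__ == "__main__":
    # The current process is always present in its own /proc on Linux.
    print(pid_is_alive(os.getpid()))
```

    Note that such a check is inherently racy (the process can exit right after the check), so it can only reduce the chance of waiting on a dead PID, not rule it out.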
     
  5. fabian

    fabian Proxmox Staff Member
    Staff Member

    Joined:
    Jan 7, 2016
    Messages:
    3,270
    Likes Received:
    505
    Could you try the updated lxcfs packages in pve-test ( http://download.proxmox.com/debian/dists/jessie/pvetest/binary-amd64/lxcfs_2.0.0-pve1_amd64.deb and http://download.proxmox.com/debian/dists/jessie/pvetest/binary-amd64/lxcfs-dbg_2.0.0-pve1_amd64.deb )? This is a new upstream version which does not fix the issue completely (yet), but should move it from "occurs rarely" to "occurs almost never" territory. A complete fix is in the works and will hopefully follow soon.

    Note that you need to stop your affected containers after updating before the new lxcfs binary is used, and should probably try this on non-production systems first (I did not encounter issues so far, but better be safe than sorry!)
     
    Jacob Tranholm likes this.
  6. Jacob Tranholm

    Jacob Tranholm New Member
    Proxmox Subscriber

    Joined:
    Feb 10, 2016
    Messages:
    23
    Likes Received:
    0
    I have tried your new lxcfs package on all 3 of our Proxmox VE servers. I followed your advice and first tested whether everything worked on one of the servers. When I didn't see any problems, I installed the package on the other two Proxmox VE servers.

    The servers have now been running with the new lxcfs package for a few days, and I have re-enabled scheduled vzdumps for all of the LXC-based VMs. So far I haven't seen any lockups… Before the lxcfs package update, the VMs locked up about one third of the time.

    So I would like to say thanks… From my experience, it looks like your changes are in the right area.
     
  7. fabian

    fabian Proxmox Staff Member
    Staff Member

    Joined:
    Jan 7, 2016
    Messages:
    3,270
    Likes Received:
    505
    Sounds good! Please keep us posted if you experience any issues with the updated packages.
     
  8. dzunk

    dzunk New Member

    Joined:
    Feb 24, 2016
    Messages:
    3
    Likes Received:
    0
    Hey, just wanted to check in and see if there is a timeframe for the lxcfs update/the next stable release. We're running PVE 3.4 in production, and when starting the upgrade process I set up a test server on 4.1 and ran into this dealbreaker issue immediately (within the first 24 hours).

    Since the server in question is just for testing the upgrade, I can use pvetest for now. I would love to see it in stable before the end of support for 3.4 in April, though!
     
  9. fabian

    fabian Proxmox Staff Member
    Staff Member

    Joined:
    Jan 7, 2016
    Messages:
    3,270
    Likes Received:
    505
    The more (positive) feedback we get, the sooner it will move to the non-test repositories. Did the package from pve-test fix the issue for you?
     
  10. peterx

    peterx Member

    Joined:
    May 5, 2008
    Messages:
    39
    Likes Received:
    1
    For a few days now, since the last updates, I have been running backups every night without any problem: LXC, on NFS, LZO, snapshot.
    Peter
     
  11. dzunk

    dzunk New Member

    Joined:
    Feb 24, 2016
    Messages:
    3
    Likes Received:
    0
    So far so good, backups ran last night without a problem on:
    Code:
    lxc-pve: 1.1.5-7
    lxcfs: 2.0.0-pve1
    Using an NFS target, GZIP, suspend mode. I'm going to migrate a few more containers over today and see what happens.
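
    For anyone who wants to check several hosts for the same package versions, here is a minimal sketch that parses a "name: version" listing like the one above. The function name parse_versions is hypothetical, and the sketch assumes every relevant line has the "package: version" shape shown in this post:

```python
from typing import Dict


def parse_versions(text: str) -> Dict[str, str]:
    """Parse lines of the form 'package: version' into a dict."""
    versions = {}
    for line in text.splitlines():
        # Split on the first colon only; the version part may contain more.
        name, sep, version = line.partition(":")
        if sep and name.strip() and version.strip():
            versions[name.strip()] = version.strip()
    return versions


if __name__ == "__main__":
    sample = "lxc-pve: 1.1.5-7\nlxcfs: 2.0.0-pve1"
    versions = parse_versions(sample)
    # Compare against the version reported as working in this thread.
    print(versions.get("lxcfs") == "2.0.0-pve1")
```

    The input text would have to be collected separately on each host; the sketch only covers the parsing step.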
     
  12. RobFantini

    RobFantini Active Member
    Proxmox Subscriber

    Joined:
    May 24, 2012
    Messages:
    1,516
    Likes Received:
    21
    We ran 12 LXC backups in snapshot mode after the upgrade and reboot.

    10 worked the first time [big improvement].

    2 backups failed, on different hosts.

    Here is the error output:
    Code:
    2228: Feb 25 22:26:30 INFO: Starting Backup of VM 2228 (lxc)
    2228: Feb 25 22:26:30 INFO: status = running
    2228: Feb 25 22:26:30 INFO: found old vzdump snapshot (force removal)
    2228: Feb 25 22:26:30 ERROR: Backup of VM 2228 failed - Can't delete snapshot: 2228 vzdump zfs error: could not find any snapshots to destroy; check snapshot names.
    
    Code:
    INFO: Starting Backup of VM 7597 (lxc)
    INFO: status = running
    INFO: found old vzdump snapshot (force removal)
    ERROR: Backup of VM 7597 failed - Can't delete snapshot: 7597 vzdump zfs error: could not find any snapshots to destroy; check snapshot names.
    
    In both cases, rerunning the backup worked.


    Today there were no backup errors:
    Code:
      VMID  STATUS  TIME  SIZE  FILENAME
      2100  ok  00:00:28  298MB  /bkup2/dump/vzdump-lxc-2100-2016_02_27-16_32_02.tar.lzo
      2214  ok  00:00:52  1.03GB  /bkup2/dump/vzdump-lxc-2214-2016_02_27-16_32_30.tar.lzo
      2217  ok  00:04:20  3.17GB  /bkup2/dump/vzdump-lxc-2217-2016_02_27-16_33_22.tar.lzo
      2219  ok  00:01:57  959MB  /bkup2/dump/vzdump-lxc-2219-2016_02_27-16_37_42.tar.lzo
      2227  ok  00:03:34  2.62GB  /bkup2/dump/vzdump-lxc-2227-2016_02_27-16_39_39.tar.lzo
      2228  ok  00:01:41  960MB  /bkup2/dump/vzdump-lxc-2228-2016_02_27-16_43_13.tar.lzo
      2249  ok  00:01:54  985MB  /bkup2/dump/vzdump-lxc-2249-2016_02_27-16_44_54.tar.lzo
    
     
  13. fabian

    fabian Proxmox Staff Member
    Staff Member

    Joined:
    Jan 7, 2016
    Messages:
    3,270
    Likes Received:
    505
    Those two errors are because the config referenced a snapshot which does not actually exist on the storage level (probably an old failed backup run where you killed lxc-freeze?). This is/was expected behaviour and fixes itself on subsequent backup runs, like you said. We improved the error handling in that area already, so this hopefully does not happen again.
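
    Since rerunning the backup clears this condition, one way to spot which containers need a rerun is to scan the vzdump log for this specific error. The following is only a sketch based on the two log excerpts quoted above; the function name vmids_with_stale_snapshot and the regular expression are assumptions for illustration, not part of vzdump:

```python
import re
from typing import List

# Matches the error text from the log excerpts quoted in this thread.
STALE_SNAPSHOT = re.compile(
    r"ERROR: Backup of VM (\d+) failed - Can't delete snapshot"
)


def vmids_with_stale_snapshot(log_text: str) -> List[str]:
    """Return VMIDs (in order of appearance) that hit the snapshot error."""
    found = []
    for line in log_text.splitlines():
        match = STALE_SNAPSHOT.search(line)
        if match and match.group(1) not in found:
            found.append(match.group(1))
    return found


if __name__ == "__main__":
    sample = (
        "INFO: Starting Backup of VM 7597 (lxc)\n"
        "INFO: found old vzdump snapshot (force removal)\n"
        "ERROR: Backup of VM 7597 failed - Can't delete snapshot: 7597 vzdump "
        "zfs error: could not find any snapshots to destroy; check snapshot names.\n"
    )
    print(vmids_with_stale_snapshot(sample))
```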

    Thanks for the feedback!
     
  14. fips

    fips Member

    Joined:
    May 5, 2014
    Messages:
    141
    Likes Received:
    5
    Well, I installed those 2 patches on my 3 hosts, and since that day backups to NFS storage run like a charm.
    BUT already a third time it happens that 1 host cant quit processes and start more and more until it stuck.
    I can't stop containers or migrate them; I just get a timeout. I can't even reboot via the shell; I have to do it via IPMI.

    As I said, this has been happening since that patch and only on 1 host. To investigate, I moved all containers except 2 to another host.

    Syslog says just before reboot:
    Feb 29 10:32:20 vmbase3 pvedaemon[6326]: Use of uninitialized value in concatenation (.) or string at /usr/share/perl5/PVE/Tools.pm line 827.
    Feb 29 10:32:20 vmbase3 pvedaemon[6326]: Use of uninitialized value in concatenation (.) or string at /usr/share/perl5/PVE/Tools.pm line 827.
    Feb 29 10:32:20 vmbase3 pvedaemon[6326]: Use of uninitialized value in concatenation (.) or string at /usr/share/perl5/PVE/Tools.pm line 827.
    Feb 29 10:32:20 vmbase3 pvedaemon[3226]: Argument "\n" isn't numeric in int at /usr/share/perl5/PVE/Tools.pm line 840, <GEN1142> line 1.
    Feb 29 10:32:20 vmbase3 pvedaemon[3226]: Argument "\n" isn't numeric in int at /usr/share/perl5/PVE/Tools.pm line 841, <GEN1142> line 2.
    Feb 29 10:32:20 vmbase3 pvedaemon[3226]: Argument "\n" isn't numeric in int at /usr/share/perl5/PVE/Tools.pm line 842, <GEN1142> line 3.
     
  15. fabian

    fabian Proxmox Staff Member
    Staff Member

    Joined:
    Jan 7, 2016
    Messages:
    3,270
    Likes Received:
    505
    The patch did not change anything in our own codebase, only in lxcfs. Are the other packages up to date? Could you post the output of "pveversion -v"?

    What exactly do you mean by "1 host cant quit processes and start more and more until it stuck"? vzdump processes? lxc processes?
     
  16. camaran

    camaran Member
    Proxmox Subscriber

    Joined:
    Jul 14, 2013
    Messages:
    52
    Likes Received:
    0
    Hi, is this problem fixed in the latest Proxmox version?
     
  17. pa657

    pa657 New Member

    Joined:
    Dec 7, 2015
    Messages:
    7
    Likes Received:
    0
    Since the latest version, no more issues for me, at least for the time being...
     
  18. fabian

    fabian Proxmox Staff Member
    Staff Member

    Joined:
    Jan 7, 2016
    Messages:
    3,270
    Likes Received:
    505
    Should be fixed with lxcfs-2.0.0-pve1
     
  19. camaran

    camaran Member
    Proxmox Subscriber

    Joined:
    Jul 14, 2013
    Messages:
    52
    Likes Received:
    0
    And has it been released?
     
  20. fabian

    fabian Proxmox Staff Member
    Staff Member

    Joined:
    Jan 7, 2016
    Messages:
    3,270
    Likes Received:
    505
    Yes, it is available in both pve-no-subscription and pve-enterprise.
     