Proxmox hypervisor crash during backup

Discussion in 'Proxmox VE: Installation and configuration' started by deludi, Oct 22, 2013.

  1. deludi

    deludi New Member

    Joined:
    Oct 14, 2013
    Messages:
    26
    Likes Received:
    0
    Hello all,

    I have a serious problem with Proxmox 3.x.
    When I back up a larger VM (> 50GB), the Proxmox hypervisor crashes about half of the time.
    I have tested this on Proxmox 3.0 and 3.1.
    The VM is on a napp-it NFS store with a gigabit interconnect.
    As a target I have tested both an NFS store and a CIFS store on the same storage box.
    The NFS target causes relatively more crashes than the CIFS target.
    The hardware is premium hardware: Supermicro cases, Intel Xeons, ECC memory, etc.
    I have 3 Proxmox hypervisor boxes and can reproduce the error on all 3 boxes.
    I have reinstalled Proxmox multiple times without result.
    When the hypervisor crashes, the screen in IPMI is completely black.
    The only possible action is to reboot from IPMI.
    I can provide logs, etc. if needed.
    Thank you in advance.

    Dirk Adamsky
     
  2. tom

    tom Proxmox Staff Member
    Staff Member

    Joined:
    Aug 29, 2006
    Messages:
    13,567
    Likes Received:
    412
    Read the logs and post the relevant parts here. Also include a sample VM config, all info about your NFS server, and the output of pveversion -v.

    Just to note, this is NOT expected.
     
  3. deludi

    deludi New Member

    Hi Tom,

    I will post them asap.
    The behaviour is indeed strange.
    I have 5 or 6 HP MicroServer boxes with Proxmox 2.3 and Proxmox 3. Each has one 100GB Windows 7 VM
    and makes backups to Netgear NAS boxes (NFS async) without problems.
     
  4. deludi

    deludi New Member

    Hi Tom,

    Here is the messages log and a screenshot of the napp-it NFS settings.
    All other napp-it settings are default: no dedup, no encryption, etc.
    There is a dedicated ZIL SSD on the VM storage.
    Please let me know if you need more logs and/or screenshots.
     

  6. deludi

    deludi New Member

    I have upgraded one of the three Proxmox nodes to the latest kernel: 2.6.32-23-pve.
    I then started a backup at 17:00.
    Between 17:04 and 17:08 the node restarted.
    Here is the daemon log:

    Oct 23 17:00:01 proxmox3 vzdump[9943]: <root@pam> starting task UPID:proxmox3:000026D9:00051E5D:5267E471:vzdump::root@pam:
    Oct 23 17:00:01 proxmox3 vzdump[9945]: INFO: starting new backup job: vzdump 101 --quiet 1 --mode snapshot --compress lzo --storage storage02-backupstorage01
    Oct 23 17:00:01 proxmox3 vzdump[9945]: INFO: Starting Backup of VM 101 (qemu)
    Oct 23 17:00:02 proxmox3 qm[9950]: <root@pam> update VM 101: -lock backup
    Oct 23 17:00:06 proxmox3 ntpd[2353]: Listen normally on 12 tap101i0 fe80::547f:98ff:fede:ebcb UDP 123
    Oct 23 17:00:06 proxmox3 ntpd[2353]: peers refreshed
    Oct 23 17:04:16 proxmox3 rrdcached[2397]: flushing old values
    Oct 23 17:04:16 proxmox3 rrdcached[2397]: rotating journals
    Oct 23 17:04:16 proxmox3 rrdcached[2397]: started new journal /var/lib/rrdcached/journal/rrd.journal.1382540656.349880
    Oct 23 17:04:17 proxmox3 pveproxy[3305]: worker 7454 finished
    Oct 23 17:04:17 proxmox3 pveproxy[3305]: starting 1 worker(s)
    Oct 23 17:04:17 proxmox3 pveproxy[3305]: worker 10397 started
    Oct 23 17:08:39 proxmox3 ntpd[2338]: ntpd 4.2.6p5@1.2349-o Sat May 12 09:54:55 UTC 2012 (1)
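
    The log goes quiet after 17:04:17 and resumes at 17:08:39 with a freshly started ntpd (new PID), which brackets the crash. A minimal Python sketch of how that window could be located automatically by looking for the longest silence between consecutive syslog lines. The sample lines are copied from the log above, the year is an assumption (syslog stamps omit it), and the helper names are made up for illustration:

```python
from datetime import datetime

# Sample lines copied from the daemon log above.
LOG_LINES = [
    "Oct 23 17:00:06 proxmox3 ntpd[2353]: peers refreshed",
    "Oct 23 17:04:16 proxmox3 rrdcached[2397]: rotating journals",
    "Oct 23 17:04:17 proxmox3 pveproxy[3305]: worker 10397 started",
    "Oct 23 17:08:39 proxmox3 ntpd[2338]: ntpd 4.2.6p5@1.2349-o Sat May 12 09:54:55 UTC 2012 (1)",
]


def parse_timestamp(line, year=2013):
    """Parse the leading 'Mon DD HH:MM:SS' syslog stamp; the year is assumed."""
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def largest_gap(lines):
    """Return (gap_seconds, line_before, line_after) for the longest silence."""
    stamps = [(parse_timestamp(line), line) for line in lines]
    best = None
    for (t0, l0), (t1, l1) in zip(stamps, stamps[1:]):
        gap = (t1 - t0).total_seconds()
        if best is None or gap > best[0]:
            best = (gap, l0, l1)
    return best


gap, before, after = largest_gap(LOG_LINES)
print(f"longest silence: {gap:.0f} s")
print("last line before:", before)
print("first line after:", after)
```

    A long silence is only a hint, since quiet periods also occur normally (the gap before 17:04:16 is almost as long). The restart of ntpd under a new PID is what confirms the reboot.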


    Like the poster in the other thread, I also suspect that this is a kernel problem.
    I am working around it by not using Proxmox backup and instead making a copy of the VM at the storage level.
    Kinda sucks though...

    Regards,

    Dirk Adamsky
     
  7. symmcom

    symmcom Active Member

    Joined:
    Oct 28, 2012
    Messages:
    1,071
    Likes Received:
    24
    Did you try a backup to local storage to see whether the error also occurs there?

     
  8. deludi

    deludi New Member

    Thank you for the tip.
    Unfortunately the 3 Proxmox nodes each have only 1 SSD boot disk (60GB, 60GB and 40GB).
    The backup problem is with large VMs (~100GB).
    I will try to connect an extra HDD to one node, but that will be next week when I am in the datacenter.
     
  9. deludi

    deludi New Member

    I did another test this morning:
    I ran a backup from the command prompt with the bwlimit argument added:

    vzdump 110 --remove 0 --mode snapshot --compress lzo --storage storage02-backupstorage01 --node proxmox1 --bwlimit 30000

    Unfortunately the proxmox1 hypervisor crashed again (the backup of the VM was at 41%).
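
    For context on the --bwlimit value: vzdump documents this limit in KB/s, so 30000 is roughly 30 MB/s, well below gigabit line rate. A rough Python sketch of what that implies for the backup duration, assuming a VM of about 100 GB (the size mentioned earlier in the thread, not confirmed for VM 110), decimal units, and a backup whose speed is bounded only by the limit (compression and sparse blocks are ignored):

```python
def backup_seconds(vm_size_gb, bwlimit_kb_per_s):
    """Estimated read time in seconds when the bandwidth limit is the bottleneck."""
    size_kb = vm_size_gb * 1000 * 1000  # assumption: decimal units (1 GB = 10^6 KB)
    return size_kb / bwlimit_kb_per_s


BWLIMIT = 30000   # value passed to --bwlimit in the command above
VM_SIZE_GB = 100  # assumption: approximate VM size mentioned in the thread

total = backup_seconds(VM_SIZE_GB, BWLIMIT)
print(f"full read at the limit: about {total / 60:.0f} minutes")
print(f"time to reach 41%: about {0.41 * total / 60:.0f} minutes")
```

    Under these assumptions the crash at 41% would have come well into the run rather than at the start, which suggests that throttling the transfer alone does not avoid the problem.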
     
  10. symmcom

    symmcom Active Member

    Was this backup done locally?
     
  11. deludi

    deludi New Member

    Hi symmcom,

    Thank you for your input.
    I ran this job through PuTTY on the first of our 3 Proxmox nodes. The VM is on an NFS share (OmniOS + napp-it) and the backup target is another NFS share on the same storage box.
    My colleague and I have decided to use napp-it (ZFS) for backups. We have ~500GB of VMs and set up a napp-it local replication task from the VM volume to a local backup volume (on another pool).
    The task ran in about 22 minutes (360MB/s) and will be scheduled weekly.
    Once we have rebuilt our second storage box from FreeNAS to napp-it, the replication task will target that machine instead of making a local copy.
    We will not use Proxmox backup because of the above problems.
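
    As a sanity check on the replication figures above, a small Python sketch relating the quoted rate, duration and data size. Decimal units are an assumption, and the function names are only illustrative:

```python
GIGABIT_MB_PER_S = 125  # 1 Gbit/s line rate expressed in MB/s


def data_moved_gb(rate_mb_per_s, minutes):
    """Data transferred in GB at a sustained rate (decimal units assumed)."""
    return rate_mb_per_s * minutes * 60 / 1000


def average_rate_mb_per_s(size_gb, minutes):
    """Average rate in MB/s needed to move size_gb in the given time."""
    return size_gb * 1000 / (minutes * 60)


moved = data_moved_gb(360, 22)
rate = average_rate_mb_per_s(500, 22)
print(f"360 MB/s for 22 min moves about {moved:.0f} GB")
print(f"500 GB in 22 min needs about {rate:.0f} MB/s")
print(f"quoted rate is {360 / GIGABIT_MB_PER_S:.1f}x gigabit line rate")
```

    The numbers agree with "~500GB" to within rounding, and the rate is well above what a gigabit link can carry, which fits a pool-to-pool copy inside the storage box rather than a transfer over the network.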

    Regards,

    Dirk Adamsky
     
  12. m.ardito

    m.ardito Active Member

    Joined:
    Feb 17, 2010
    Messages:
    1,473
    Likes Received:
    12
    @deludi, I think what everybody else is trying to tell you is: if you could try a local backup, you could work out whether the NFS backup is involved in the crash or not.
    Basically, you would take the network (and perhaps remote host issues) out of the backup job in order to find out what is causing your issue...

    Marco
     
  13. deludi

    deludi New Member

    Hi Marco,

    The 3 Proxmox hypervisors only have small SSD boot disks (60GB, 60GB and 40GB).
    They are in a datacenter 60km away...
    I do have several other single-node Proxmox setups with local storage that back up large VMs without problems.
    The issue is that I need my VMs on shared storage (in this setup an NFS share) because it's a 3-node Proxmox cluster.
    The current Proxmox kernel has a problem with vzdumps of large VMs when both the VM and the backup target are on an NFS share.
    I have also tested VMs on an NFS share with the backup target on a CIFS share: same result.
    I have tested this multiple times: the backup halts halfway, the Proxmox node crashes, and the other VMs are failed over to the other nodes.
    The cluster works OK as long as I do not start vzdump on a large VM.
    As stated above, I will now make my backups at the storage level with ZFS replication jobs.
    I will try the Proxmox backup again with newer kernels (Proxmox 3.5), but for now I have given up and chosen ZFS/napp-it backups.

    Regards,

    Dirk
     