Proxmox hypervisor crash during backup

deludi

New Member
Oct 14, 2013
Hello all,

I have a serious problem with proxmox 3.x.
When I back up a larger vm (> 50GB), the proxmox hypervisor crashes about half the time.
I have tested this on proxmox 3.0 and 3.1.
The vm is on a napp-it nfs store with gigabit interconnect.
As a backup target I have tested both an nfs store and a cifs store on the same storage box.
The nfs target causes relatively more crashes than the cifs target.
The hardware is premium hardware: supermicro cases, intel xeons, ecc memory, etc.
I have 3 proxmox hypervisor boxes and can reproduce the crash on all 3 boxes.
I have reinstalled proxmox multiple times without result.
When the hypervisor crashes, the screen is completely black in ipmi.
The only possible action is to reboot from ipmi.
I can provide logs, etc. if needed.
Thank you in advance.

Dirk Adamsky
 
Read the logs and post the relevant parts here. Also include a sample VM config, all info about your NFS server, and the output of pveversion -v.

Just to note, this is NOT expected.
 
Hi Tom,

I will post them asap.
The behaviour is indeed strange.
I have 5 or 6 HP microserver boxes with proxmox 2.3 and proxmox 3, they have one 100GB windows 7 vm
and make backups to netgear nasboxes (nfs async) without problems....
 
Hi Tom,

here is the messages log and a screenshot of the nappit nfs settings.
All further nappit settings are default, no dedup, encryption, etc.
There is a dedicated zil ssd on the vm storage.
Please let me know if you need more logs and/or screenshots.
 

Attachments

  • nfs-settings-nappit.jpg (39.3 KB)
  • messages.zip (19.1 KB)

I have upgraded one of the three proxmox nodes to the latest kernel: 2.6.32-23-pve
Then I started a backup at 17:00.
Between 17:04 and 17:08 the node restarted.
Here is the daemon log:

Oct 23 17:00:01 proxmox3 vzdump[9943]: <root@pam> starting task UPID:proxmox3:000026D9:00051E5D:5267E471:vzdump::root@pam:
Oct 23 17:00:01 proxmox3 vzdump[9945]: INFO: starting new backup job: vzdump 101 --quiet 1 --mode snapshot --compress lzo --storage storage02-backupstorage01
Oct 23 17:00:01 proxmox3 vzdump[9945]: INFO: Starting Backup of VM 101 (qemu)
Oct 23 17:00:02 proxmox3 qm[9950]: <root@pam> update VM 101: -lock backup
Oct 23 17:00:06 proxmox3 ntpd[2353]: Listen normally on 12 tap101i0 fe80::547f:98ff:fede:ebcb UDP 123
Oct 23 17:00:06 proxmox3 ntpd[2353]: peers refreshed
Oct 23 17:04:16 proxmox3 rrdcached[2397]: flushing old values
Oct 23 17:04:16 proxmox3 rrdcached[2397]: rotating journals
Oct 23 17:04:16 proxmox3 rrdcached[2397]: started new journal /var/lib/rrdcached/journal/rrd.journal.1382540656.349880
Oct 23 17:04:17 proxmox3 pveproxy[3305]: worker 7454 finished
Oct 23 17:04:17 proxmox3 pveproxy[3305]: starting 1 worker(s)
Oct 23 17:04:17 proxmox3 pveproxy[3305]: worker 10397 started
Oct 23 17:08:39 proxmox3 ntpd[2338]: ntpd 4.2.6p5@1.2349-o Sat May 12 09:54:55 UTC 2012 (1)
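
A minimal sketch of how the crash window can be narrowed down from a log like the one above: parse the syslog timestamps and look for the largest gap between consecutive lines. The excerpt below is copied from the log; the year (2013) and the helper names parse_ts and largest_gap are assumptions made for illustration, since syslog lines carry no year.

```python
from datetime import datetime

# Excerpt of the daemon log quoted above (not the full log).
LOG = """\
Oct 23 17:00:06 proxmox3 ntpd[2353]: peers refreshed
Oct 23 17:04:16 proxmox3 rrdcached[2397]: flushing old values
Oct 23 17:04:17 proxmox3 pveproxy[3305]: worker 10397 started
Oct 23 17:08:39 proxmox3 ntpd[2338]: ntpd 4.2.6p5@1.2349-o Sat May 12 09:54:55 UTC 2012 (1)
"""

YEAR = 2013  # assumption: classic syslog timestamps do not include the year


def parse_ts(line):
    """Parse the leading 'Mon DD HH:MM:SS' syslog timestamp of a line."""
    return datetime.strptime(f"{YEAR} {line[:15]}", "%Y %b %d %H:%M:%S")


def largest_gap(text):
    """Return (gap, line_before, line_after) for the longest silence in the log."""
    lines = [line for line in text.splitlines() if line.strip()]
    stamps = [parse_ts(line) for line in lines]
    gaps = [
        (later - earlier, before, after)
        for earlier, later, before, after in zip(stamps, stamps[1:], lines, lines[1:])
    ]
    return max(gaps, key=lambda item: item[0])


gap, before, after = largest_gap(LOG)
print("longest silence:", gap)
print("last line before:", before)
print("first line after:", after)
```

On this excerpt the longest silence ends at the ntpd startup line, which is the first message written after the node came back up. The last line before the gap only gives an upper bound on how long the node kept running; the crash itself can have happened at any point inside the gap.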


I also (like in the other thread) suspect that this is a kernel problem.
I am working around the problem by not using proxmox backup and instead making a copy of the vm at the storage level.
kinda sucks though...

Regards,

Dirk Adamsky
 
Did you try doing a backup to local storage to see whether the error still occurs during the backup?

 
Thank you for the tip.
Unfortunately the 3 proxmox nodes each have only one SSD boot disk (60GB, 60GB and 40GB).
The backup problem is with large vms (~100GB).
I will try to connect an extra hdd to one node, but that will be next week when I am in the datacenter.
 
I did another test this morning:
I ran a backup from the command line with the --bwlimit option added:

vzdump 110 --remove 0 --mode snapshot --compress lzo --storage storage02-backupstorage01 --node proxmox1 --bwlimit 30000

Unfortunately the proxmox1 hypervisor crashed again (the backup of the vm was at 41%).
 
Hi symmcom,

Thank you for your input.
I ran this job through putty on the first of our 3 proxmox nodes. The vm is on an nfs share (omnios+napp-it) and the backup target is another nfs share on the same storage box.
My colleague and I have decided to use napp-it (zfs) for backups. We have ~500GB of vms and made a napp-it local replication task from the vm volume to a local backup volume (on another pool).
The task ran in about 22 minutes (360MB/s). This task will be scheduled weekly.
Once we have rebuilt our second storage box from freenas to napp-it, the replication task will target that machine instead of making a local copy.
We will not use proxmox backup because of the above problems.
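
As a rough sanity check on the replication figures above, here is a small sketch that converts between data size, duration and average throughput. It assumes decimal units (1 GB = 1000 MB); the function names are hypothetical and only serve the illustration.

```python
def throughput_mb_s(size_gb, minutes):
    """Average throughput in MB/s for size_gb transferred in the given minutes."""
    return size_gb * 1000 / (minutes * 60)


def size_gb_at(rate_mb_s, minutes):
    """Amount of data in GB moved at rate_mb_s over the given minutes."""
    return rate_mb_s * minutes * 60 / 1000


# ~500GB in about 22 minutes
print(round(throughput_mb_s(500, 22)))  # 379

# 360MB/s sustained for 22 minutes
print(size_gb_at(360, 22))
```

Under these assumptions, 360MB/s over 22 minutes corresponds to roughly 475GB, which is consistent with the "~500GB" and "about 22 minutes" given above.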

Regards,

Dirk Adamsky
 
@deludi, I think what everybody else is trying to tell you is: "if you could try a local backup, you could work out whether the nfs backup is involved in the crash or not".
Basically, you could take the network (and perhaps remote host issues) out of the backup job in order to find out what is causing your issue...

Marco
 
Hi Marco,

the 3 proxmox hypervisors only have small ssd bootdisks (60GB, 60GB and 40GB).
They are in a datacenter 60km away....
I do have several other single node proxmox setups with local storage that back up large vms without problems.
The issue is that I need my vms on shared storage (in this setup an NFS share) because it's a 3 node proxmox cluster.
The current proxmox kernel has a problem with vzdumps of large vms when both the vm and the backup target are on an NFS share.
I have also tested with the vms on an NFS share and the backup target on a CIFS share: same result.
I have tested this multiple times: the backup halts halfway, the proxmox node crashes, and the other vms are failed over to the other nodes.
The cluster works ok as long as I do not start vzdump on a large vm.
As stated above, I will now make my backups at the storage level with zfs replication jobs.
I will try the proxmox backup again with newer kernels (proxmox 3.5) but for now I gave up and chose zfs/napp-it backups.

Regards,

Dirk
 
