Proxmox hypervisor crash during backup

deludi · Oct 22, 2013

Hello all,

I have a serious problem with proxmox 3.x.
When I back up a larger vm (> 50GB), the proxmox hypervisor crashes 50% of the time.
I have tested this on proxmox 3.0 and 3.1.
The vm is on a napp-it nfs store with gigabit interconnect.
As a backup target I have tested both an nfs store and a cifs store on the same storage box.
The nfs target causes relatively more crashes than the cifs target.
The hardware is premium hardware: supermicro cases, intel xeons, ecc memory, etc.
I have 3 proxmox hypervisor boxes and can reproduce the error on all 3 boxes.
I have reinstalled proxmox multiple times without result.
When the hypervisor crashes, the screen is completely black in ipmi.
The only possible action is to reboot from ipmi.
I can provide logs, etc. if needed.
Thank you in advance.

Dirk Adamsky

tom · Oct 22, 2013

read the logs and post relevant parts here, also include a sample VM config and all info about your NFS server, also your pveversion -v.

just to note, this is NOT expected.

deludi · Oct 22, 2013

Hi Tom,

I will post them asap.
The behaviour is indeed strange.
I have 5 or 6 HP microserver boxes with proxmox 2.3 and proxmox 3; they have a 100GB windows 7 vm
and make backups to netgear NAS boxes (nfs async) without problems.

deludi · Oct 22, 2013

Hi Tom,

Here are the messages log and a screenshot of the napp-it nfs settings.
All further nappit settings are default, no dedup, encryption, etc.
There is a dedicated zil ssd on the vm storage.
Please let me know if you need more logs and/or screenshots.

deludi · Oct 23, 2013

Here is a similar problem:

http://188.165.151.221/threads/14309-CRITICAL-Huge-IO-load-causes-freezing-during-backups

deludi · Oct 23, 2013

I have upgraded one of the three proxmox nodes to the latest kernel: 2.6.32-23-pve
Then I started a backup at 17:00.
Between 17:04 and 17:08 the node restarted.
Here is the daemon log:

Oct 23 17:00:01 proxmox3 vzdump[9943]: <root@pam> starting task UPID:proxmox3:000026D9:00051E5D:5267E471:vzdump::root@pam:
Oct 23 17:00:01 proxmox3 vzdump[9945]: INFO: starting new backup job: vzdump 101 --quiet 1 --mode snapshot --compress lzo --storage storage02-backupstorage01
Oct 23 17:00:01 proxmox3 vzdump[9945]: INFO: Starting Backup of VM 101 (qemu)
Oct 23 17:00:02 proxmox3 qm[9950]: <root@pam> update VM 101: -lock backup
Oct 23 17:00:06 proxmox3 ntpd[2353]: Listen normally on 12 tap101i0 fe80::547f:98ff:fede:ebcb UDP 123
Oct 23 17:00:06 proxmox3 ntpd[2353]: peers refreshed
Oct 23 17:04:16 proxmox3 rrdcached[2397]: flushing old values
Oct 23 17:04:16 proxmox3 rrdcached[2397]: rotating journals
Oct 23 17:04:16 proxmox3 rrdcached[2397]: started new journal /var/lib/rrdcached/journal/rrd.journal.1382540656.349880
Oct 23 17:04:17 proxmox3 pveproxy[3305]: worker 7454 finished
Oct 23 17:04:17 proxmox3 pveproxy[3305]: starting 1 worker(s)
Oct 23 17:04:17 proxmox3 pveproxy[3305]: worker 10397 started
Oct 23 17:08:39 proxmox3 ntpd[2338]: ntpd 4.2.6p5@1.2349-o Sat May 12 09:54:55 UTC 2012 (1)

I also (like in the other thread) suspect that this is a kernel problem.
I am working around the problem by not using proxmox backup and instead making a copy of the vm at the storage level.
kinda sucks though...

Regards,

Dirk Adamsky
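
The restart window in the daemon log above can be narrowed down by looking for the largest gap between consecutive log timestamps. The following is only a rough sketch: the function name `largest_gap` and the log path in the usage comment are assumptions, and it expects the classic "Mon DD HH:MM:SS host ..." syslog format with all entries on the same day.

```shell
#!/bin/sh
# Print the largest gap between consecutive syslog entries in a file.
# Assumptions: classic "Mon DD HH:MM:SS host ..." lines, and no midnight
# rollover inside the file (timestamps are compared as seconds of the day).
largest_gap() {
    awk '
    {
        split($3, t, ":")                       # $3 is HH:MM:SS
        now = t[1] * 3600 + t[2] * 60 + t[3]    # seconds since midnight
        if (NR > 1 && now - prev > gap) {
            gap = now - prev
            from = prev_stamp
            to = $3
        }
        prev = now
        prev_stamp = $3
    }
    END {
        if (NR > 1) printf "%d seconds between %s and %s\n", gap, from, to
    }' "$1"
}

# Hypothetical usage (the log path is an assumption for this setup):
# largest_gap /var/log/daemon.log
```

Run against the excerpt above, this reports a gap of 262 seconds between 17:04:17 and 17:08:39, which matches the window in which the node went down and rebooted.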

wahmed · Oct 23, 2013

Did you try a backup to local storage to see whether the error also occurs there?


deludi · Oct 23, 2013

Thank you for the tip.
Unfortunately the 3 proxmox nodes all have only 1 SSD bootdisk (60GB, 60GB and 40GB).
The backup problem is with large vm's (~100GB).
I will try to connect an extra hdd to one node, but that will be next week when I am in the datacenter.

deludi · Oct 28, 2013

I did another test this morning:
I ran a backup from the command line with the bwlimit argument added:

vzdump 110 --remove 0 --mode snapshot --compress lzo --storage storage02-backupstorage01 --node proxmox1 --bwlimit 30000

The proxmox1 hypervisor unfortunately did crash again (backup of the vm was at 41%).

wahmed · Oct 28, 2013

Was this backup done locally?

deludi · Oct 28, 2013

Hi symmcom,

Thank you for your input.
I ran this job through putty on the first of our 3 proxmox nodes. The vm is on an nfs share (omnios+napp-it) and the backup target is another nfs share on the same storage box.
My colleague and I have decided to use napp-it (zfs) for backups. We have ~500GB of vm's and made a napp-it local replication task from the vm volume to a local backup volume (on another pool).
The task ran in about 22 minutes (360MB/s). This task will be scheduled weekly.
Once we have rebuilt our second storage box from freenas to napp-it, the replication task will target the other machine instead of making a local copy.
We will not use proxmox backup because of the above problems.

Regards,

Dirk Adamsky

m.ardito · Oct 29, 2013

@deludi, I think everybody else is trying to say to you: <<if you could try to backup locally, you could sort out if nfs backup is involved in crashing or not>>,
basically you could take the network (and perhaps remote hosts issues) out of the backup job, in order to find out what is causing your issue...

Marco
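
The isolation test suggested here could be sketched roughly as follows. This is only an illustration: the helper name `local_backup_cmd` and the mount point `/mnt/localbackup` are hypothetical, and it assumes a locally attached disk with enough free space is mounted there. It reuses the options from the vzdump call earlier in the thread and swaps `--storage` for `--dumpdir`, so the dump is written to a local directory. Note that with the vm disk still on nfs, this only takes the nfs/cifs write path out of the test, not the reads.

```shell
#!/bin/sh
# Build a vzdump command line that writes the dump to a local directory
# instead of an nfs/cifs storage. The default directory is a hypothetical
# mount point for a locally attached disk.
local_backup_cmd() {
    vmid="$1"
    dumpdir="${2:-/mnt/localbackup}"
    echo "vzdump $vmid --dumpdir $dumpdir --mode snapshot --compress lzo --remove 0"
}

# Print the command for review first; nothing is executed here.
local_backup_cmd 110
```

If the node survives a local dump of the same large vm, the network target becomes the prime suspect; if it still crashes, the target storage is probably not the cause.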

deludi · Oct 29, 2013

Hi Marco,

the 3 proxmox hypervisors only have small ssd bootdisks (60GB, 60GB and 40GB).
They are in a datacenter 60km away....
I do have several other single node proxmox setups with local storage that back up large vm's without problems.
The issue is that I need my vm's on shared storage (in this setup an NFS share) because it's a 3 node proxmox cluster.
The current proxmox kernel has a problem with vzdumps of large vm's when both vm and storage target are on an NFS share.
I have also tested vm's on NFS share and backup target on a CIFS share: same result.
I have tested this multiple times: the backup halts halfway, the proxmox node crashes, the other vm's are failed over to the other nodes.
The cluster works ok as long as I do not start vzdump on a large vm.
As stated above, I will now make my backups at the storage level with zfs replication jobs.
I will try the proxmox backup again with newer kernels (proxmox 3.5), but for now I have given up and opted for zfs/napp-it backups.

Regards,

Dirk
