Issue with backups taking a very long time in V2.1

Long story short:

We had 3 Proxmox servers in a customer environment, all running v1.9.

We had an issue with backups to an NFS filer taking a long time, which we both posted about here and purchased a support ticket for. The 'fix' was to move to the following kernel:
atom1:/# pveversion -v
pve-manager: 1.9-26 (pve-manager/1.9/6567)
running kernel: 2.6.35-2-pve

This did fix the problem. However, we upgraded one host to Proxmox 2.1 and now we have the issue again. What makes it worse is that once a backup from the v2.1 server starts to take a long time, it seems to affect the OpenFiler server itself: the load average goes high, and we can only bring it back down by removing all NFS connections to it, regardless of whether there is any traffic. I.e. we remove the storage from the config on all the Proxmox servers and then put it back, and all seems well.
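(For reference, and not part of the original post: assuming the standard pvesm CLI on PVE 2.x, the remove/re-add workaround described above can be done from the shell rather than the GUI. This is a rough sketch; the storage id 'graphite' matches the mount path that appears in the backup log later in the thread, and the server address and export path are placeholders for this environment.)

```
# Hedged sketch: drop and re-create the NFS storage definition so the
# stale NFS connections are torn down, then re-established on re-add.
# Storage id, server address and export path are placeholders.
pvesm remove graphite
pvesm add nfs graphite --server 192.168.0.50 --export /mnt/backups --content backup
```

Note this detaches the storage from every node in the cluster, so make sure no backup or restore is mid-flight when you run it.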

So whatever the issue is in kernel 2.6.32 vs 2.6.35, we have it again.
I'm hoping there is a fix for this; as it stands right now we need to downgrade the server back to v1.9, but ultimately we are after some of the new functionality in v2.1.

As a side note, when we originally went to upgrade the box from 1.9 to v2.1, the in-place upgrade failed miserably and I had to do a fresh install of 2.1 on the server and restore the VMs from backup.

Also, when I say a long time... I mean a backup that normally takes 1.5 hours on v1.9 took over 24 hours to complete.
 
Post the backup log and your 'pveversion -v' output.

What kind of NFS server do you run? A lot of users have reported problems with some NFS servers (e.g. a badly configured FreeNAS).
 
root@atom2:/var/log/vzdump# pveversion -v
pve-manager: 2.1-1 (pve-manager/2.1/f9b0f63a)
running kernel: 2.6.32-11-pve
proxmox-ve-2.6.32: 2.0-66
pve-kernel-2.6.32-11-pve: 2.6.32-66
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.3-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.8-3
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.7-2
pve-cluster: 1.0-26
qemu-server: 2.0-39
pve-firmware: 1.0-15
libpve-common-perl: 1.0-27
libpve-access-control: 1.0-21
libpve-storage-perl: 2.0-18
vncterm: 1.0-2
vzctl: 3.0.30-2pve5
vzprocps: 2.0.11-2
vzquota: 3.0.12-3
pve-qemu-kvm: 1.0-9
ksm-control-daemon: 1.1-1

The backup log from the run that took so long is gone now. I've got another backup in flight right now that is taking a long time. Yesterday I backed up a VM to a 38 GB file in 90 minutes, and now a similar VM that should be about 38 GB after compression is stuck at 35 GB. The backup file hasn't been touched since 8:30 this morning, even though the backup is still running.

The load average on the filer is zero, and the load average on the hypervisor doing the backup is:
atom3:/var/log/vzdump# w
10:38:27 up 2 days, 2:27, 2 users, load average: 1.07, 1.14, 1.19

The filer is OpenFiler 2.99 on a reasonable machine with a quad-core Xeon:
processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 30
model name : Intel(R) Xeon(R) CPU X3430 @ 2.40GHz
stepping : 5
cpu MHz : 2400.081
cache size : 8192 KB

and 16 GB of RAM.
Intel motherboard, etc...
We were using FreeNAS originally but had nothing but problems with it, so we switched to OpenFiler 2.99.

Thanks for the reply.
 
No backup logs? See /var/log/vzdump/..
 
I didn't mean there weren't any, just that the relevant one is gone. I currently have another stalled backup running right now; it started at 6:46 am but hasn't written to the backup file since 8:53 am:
atom3:/var/log/vzdump# cat qemu-201.log
Sep 21 06:45:01 INFO: Starting Backup of VM 201 (qemu)
Sep 21 06:45:01 INFO: running
Sep 21 06:45:01 INFO: status = running
Sep 21 06:45:01 INFO: backup mode: snapshot
Sep 21 06:45:01 INFO: ionice priority: 7
Sep 21 06:45:01 INFO: Logical volume "vzsnap-atom3-0" created
Sep 21 06:45:02 INFO: creating archive '/mnt/pve/graphite/vzdump-qemu-201-2012_09_21-06_45_01.tgz'

I see that the snapshot for it has gone inactive:
--- Logical volume ---
LV Name /dev/atom3-disk2/vzsnap-atom3-0
VG Name atom3-disk2
LV UUID iUGC2k-lai3-JbQc-AWTX-mA5d-brCE-FdK1AM
LV Write Access read/write
LV snapshot status INACTIVE destination for /dev/atom3-disk2/vm-201-disk-1
LV Status available
# open 1
LV Size 150.00 GB
Current LE 38401
COW-table size 1.00 GB
COW-table LE 256
Snapshot chunk size 4.00 KB
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 251:4
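(A note on what the INACTIVE status above likely means; this is my reading, not confirmed in the thread. The snapshot's 1.00 GB COW table filled up while the guest kept writing, at which point LVM invalidates the snapshot, and vzdump then stalls reading from a dead device. Assuming the 'size' option supported by vzdump in this era, one thing worth trying is allocating more COW space in /etc/vzdump.conf:)

```
# /etc/vzdump.conf -- hedged sketch; 'size' sets the LVM snapshot
# size in MB (the 1024 MB default matches the 1.00 GB COW table
# shown above). 4 GB gives a write-heavy guest more headroom for
# the duration of the backup.
size: 4096
```

You can watch COW usage during a backup with 'lvs' (the Data% column for the snapshot volume); once it reaches 100% the snapshot goes INACTIVE, which would match the stall you are seeing.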
 