Super slow backup with PVE 2.x

One last suggestion: is the FreeNAS server set to asynchronous?
No, asynchronous is not activated.

I mounted it with this:
Code:
mount 192.168.1.1:/mnt/DATA/pve /mnt/pve/pve2 -o rsize=8192,wsize=8192,nolock,timeo=14,intr,proto=tcp,port=2049
Now I did a dd test with "time dd if=/dev/zero of=/mnt/pve/pve2/bigfile03.txt bs=32k count=640000" and I'm getting the following entries in the syslog :(
Code:
Jun 25 13:27:31 pve01 kernel: ct0 nfs: server 192.168.1.1 not responding, still trying
Jun 25 13:27:31 pve01 kernel: ct0 nfs: server 192.168.1.1 not responding, still trying
Jun 25 13:27:31 pve01 kernel: ct0 nfs: server 192.168.1.1 not responding, still trying
Jun 25 13:27:31 pve01 kernel: ct0 nfs: server 192.168.1.1 not responding, still trying
Jun 25 13:27:31 pve01 kernel: ct0 nfs: server 192.168.1.1 not responding, still trying
Jun 25 13:27:31 pve01 kernel: ct0 nfs: server 192.168.1.1 not responding, still trying
Jun 25 13:27:31 pve01 kernel: ct0 nfs: server 192.168.1.1 not responding, still trying
Jun 25 13:27:31 pve01 kernel: ct0 nfs: server 192.168.1.1 not responding, still trying
Jun 25 13:27:31 pve01 kernel: ct0 nfs: server 192.168.1.1 not responding, still trying
Jun 25 13:27:31 pve01 kernel: ct0 nfs: server 192.168.1.1 not responding, still trying
Jun 25 13:27:31 pve01 kernel: ct0 nfs: server 192.168.1.1 not responding, still trying
Jun 25 13:27:31 pve01 kernel: ct0 nfs: server 192.168.1.1 not responding, still trying
Jun 25 13:27:31 pve01 kernel: ct0 nfs: server 192.168.1.1 not responding, still trying
Jun 25 13:27:31 pve01 kernel: ct0 nfs: server 192.168.1.1 not responding, still trying
Jun 25 13:27:31 pve01 kernel: ct0 nfs: server 192.168.1.1 not responding, still trying
Jun 25 13:27:31 pve01 kernel: ct0 nfs: server 192.168.1.1 not responding, still trying
Jun 25 13:27:32 pve01 kernel: ct0 nfs: server 192.168.1.1 OK
Jun 25 13:27:32 pve01 kernel: ct0 nfs: server 192.168.1.1 OK
Jun 25 13:27:32 pve01 kernel: ct0 nfs: server 192.168.1.1 OK
Jun 25 13:27:32 pve01 kernel: ct0 nfs: server 192.168.1.1 OK
Jun 25 13:27:32 pve01 kernel: ct0 nfs: server 192.168.1.1 OK
Jun 25 13:27:33 pve01 kernel: ct0 nfs: server 192.168.1.1 OK
Jun 25 13:27:33 pve01 kernel: ct0 nfs: server 192.168.1.1 OK
Jun 25 13:27:33 pve01 kernel: ct0 nfs: server 192.168.1.1 OK
Jun 25 13:27:33 pve01 kernel: ct0 nfs: server 192.168.1.1 OK
Jun 25 13:27:33 pve01 kernel: ct0 nfs: server 192.168.1.1 OK
Jun 25 13:27:33 pve01 kernel: ct0 nfs: server 192.168.1.1 OK
Jun 25 13:27:33 pve01 kernel: ct0 nfs: server 192.168.1.1 OK
Jun 25 13:27:33 pve01 kernel: ct0 nfs: server 192.168.1.1 OK
Jun 25 13:27:33 pve01 kernel: ct0 nfs: server 192.168.1.1 OK
Jun 25 13:27:33 pve01 kernel: ct0 nfs: server 192.168.1.1 OK
Jun 25 13:27:33 pve01 kernel: ct0 nfs: server 192.168.1.1 OK
Jun 25 13:27:50 pve01 kernel: ct0 nfs: server 192.168.1.1 not responding, still trying
Jun 25 13:27:50 pve01 kernel: ct0 nfs: server 192.168.1.1 not responding, still trying
Jun 25 13:27:50 pve01 kernel: ct0 nfs: server 192.168.1.1 not responding, still trying
Jun 25 13:27:50 pve01 kernel: ct0 nfs: server 192.168.1.1 not responding, still trying
Jun 25 13:27:50 pve01 kernel: ct0 nfs: server 192.168.1.1 not responding, still trying
Jun 25 13:27:50 pve01 kernel: ct0 nfs: server 192.168.1.1 not responding, still trying
Jun 25 13:27:50 pve01 kernel: ct0 nfs: server 192.168.1.1 not responding, still trying
Jun 25 13:27:50 pve01 kernel: ct0 nfs: server 192.168.1.1 not responding, still trying
Jun 25 13:27:50 pve01 kernel: ct0 nfs: server 192.168.1.1 not responding, still trying
Jun 25 13:27:50 pve01 kernel: ct0 nfs: server 192.168.1.1 not responding, still trying
Jun 25 13:27:50 pve01 kernel: ct0 nfs: server 192.168.1.1 not responding, still trying
Jun 25 13:27:50 pve01 kernel: ct0 nfs: server 192.168.1.1 not responding, still trying
Jun 25 13:27:50 pve01 kernel: ct0 nfs: server 192.168.1.1 not responding, still trying
Jun 25 13:27:50 pve01 kernel: ct0 nfs: server 192.168.1.1 not responding, still trying
Jun 25 13:27:50 pve01 kernel: ct0 nfs: server 192.168.1.1 not responding, still trying
Jun 25 13:27:50 pve01 kernel: ct0 nfs: server 192.168.1.1 OK
Jun 25 13:27:51 pve01 kernel: ct0 nfs: server 192.168.1.1 OK
Jun 25 13:27:51 pve01 kernel: ct0 nfs: server 192.168.1.1 OK
Jun 25 13:27:51 pve01 kernel: ct0 nfs: server 192.168.1.1 OK
Jun 25 13:27:51 pve01 kernel: ct0 nfs: server 192.168.1.1 OK
Jun 25 13:27:51 pve01 kernel: ct0 nfs: server 192.168.1.1 OK
Jun 25 13:27:51 pve01 kernel: ct0 nfs: server 192.168.1.1 OK
Jun 25 13:27:51 pve01 kernel: ct0 nfs: server 192.168.1.1 OK
Jun 25 13:27:51 pve01 kernel: ct0 nfs: server 192.168.1.1 not responding, still trying
Jun 25 13:27:51 pve01 kernel: ct0 nfs: server 192.168.1.1 OK
Jun 25 13:27:51 pve01 kernel: ct0 nfs: server 192.168.1.1 OK
Jun 25 13:27:51 pve01 kernel: ct0 nfs: server 192.168.1.1 OK
Jun 25 13:27:51 pve01 kernel: ct0 nfs: server 192.168.1.1 OK
Jun 25 13:27:51 pve01 kernel: ct0 nfs: server 192.168.1.1 OK
Jun 25 13:27:51 pve01 kernel: ct0 nfs: server 192.168.1.1 OK
Jun 25 13:27:51 pve01 kernel: ct0 nfs: server 192.168.1.1 OK
Jun 25 13:27:51 pve01 kernel: ct0 nfs: server 192.168.1.1 OK

The dd test stalled at 12GB and I had to abort it.
Now I'm going to run the same test against the two alternative NAS boxes to see whether the errors appear there too.

How can I mount the NFS share manually so that the directory appears in the PVE web GUI for backup tests?
Thank you very much!
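(For reference, one way to make a manually mounted path usable for backups is a 'dir' storage entry in /etc/pve/storage.cfg - a minimal sketch, the storage name is a placeholder:)
Code:
# directory storage pointing at the manually mounted NFS share; shows up in the web GUI as 'pve2-nfs'
dir: pve2-nfs
        path /mnt/pve/pve2
        content backup
        maxfiles 3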
 
Unfortunately, the backup process hangs at ~9GB again (I haven't rebooted FreeNAS, but I don't think that is necessary).
The next thing I'll try: start a backup on pve01, wait until it hangs, then immediately start another backup on pve02 and see how the speed from pve02 is.

Edit:
@dietmar: thanks, that's something I completely forgot; I will test this too!
 
Hello Dietmar,
Is it fast when you backup to local storage?

I tried it with one of my biggest VMs (2 disk images of 100GB each), and even the local backup is very slow. It took nearly 6 hours. My other Proxmox box runs this backup in approx. 45 minutes including the NFS transfer.

Code:
INFO: starting new backup job: vzdump 111 --remove 0 --mode snapshot --compress lzo --storage local --node proxmox3
INFO: Starting Backup of VM 111 (qemu)
INFO: status = running
INFO: mode failure - unable to dump into snapshot (use option --dumpdir)
INFO: trying 'suspend' mode instead
INFO: backup mode: suspend
INFO: bandwidth limit: 10000 KB/s
INFO: ionice priority: 7
INFO: suspend vm
INFO: creating archive '/var/lib/vz/dump/vzdump-qemu-111-2012_06_25-12_45_33.tar.lzo'
INFO: adding '/var/lib/vz/dump/vzdump-qemu-111-2012_06_25-12_45_33.tmp/qemu-server.conf' to archive ('qemu-server.conf')
INFO: adding '/var/lib/vz/images/111/vm-111-disk-1.raw' to archive ('vm-disk-ide0.raw')
INFO: adding '/var/lib/vz/images/111/vm-111-disk-2.raw' to archive ('vm-disk-virtio0.raw')
INFO: Total bytes written: 214756756480 (9.77 MiB/s)
INFO: archive file size: 69.00GB
INFO: resume vm
INFO: vm is online again after 20975 seconds
INFO: Finished Backup of VM 111 (05:49:35)
INFO: Backup job finished successfully
TASK OK

I can't find anything in the logs on that box. The CPU load is moderate, iowaits are not unusual, all running VMs respond well, and even copying a file locally on that box works at a reasonable speed.
 
In the meantime I have tried a lot of things; even a directory storage on top of a TCP-mounted NFS share hasn't changed anything.

But I upgraded the TEST server pve04 from PVE 1.9 to PVE 2.1, and what can I say, there it is again:
Code:
Jun 26 10:29:14 pve04 pvestatd[1549]: WARNING: command 'df -P -B 1 /mnt/pve/nas01' failed: got timeout
Jun 26 10:29:14 pve04 pvestatd[1549]: status update time (8.484 seconds)

With 1.9 it worked, and immediately after the upgrade it shows the same problem :(

At the moment it looks like I'll have to downgrade to 1.9, but I don't want to;
2.x is so much better at everything else.
 
I got exactly the same error over and over when I tried backing up to an NFS share; I don't know why. I eventually gave up, backed up to a second hard drive instead, and then used wget to pull the backup directory contents across to the backup server.
 
Congrats ewuweu, glad your problem could be solved :)
Maybe some of you have another idea for a solution for Erk and me.

@tom or dietmar: do you have any clue why there is this difference between 1.9 and 2.x?
 
I'm having trouble with my backups too, and it seems very similar to the previous posters. I have a single node (soon to be more) that does daily backups of all VMs to an NFS share. My NFS server has a couple of TB of free space. It had been working perfectly for a number of days with several Windows VMs running (sizes from a few GB up to 80GB). I then added another VM (just a basic Linux installation, little data) and it worked correctly the next time. However, the second time the backup hung without completing at this point:
Code:
INFO: starting new backup job: vzdump --quiet 1 --mailto me@domain.org --mode snapshot --compress lzo --storage file-server --all 1
INFO: Starting Backup of VM 100 (qemu)
INFO: status = running
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO:   Logical volume "vzsnap-proliant-0" created
INFO: creating archive '/mnt/pve/file-server/dump/vzdump-qemu-100-2012_06_23-01_00_01.tar.lzo'
INFO: adding '/mnt/pve/file-server/dump/vzdump-qemu-100-2012_06_23-01_00_01.tmp/qemu-server.conf' to archive ('qemu-server.conf')
INFO: adding '/mnt/vzsnap0/images/100/vm-100-disk-1.vmdk' to archive ('vm-disk-ide0.vmdk')
INFO: Total bytes written: 3559983616 (47.15 MiB/s)
INFO: archive file size: 1.30GB


The real problem is that when it hangs, the CPU usage on all VMs drops to 0, they become unreachable on the network, and I can't get them back without rebooting the node. This is a 24-hour operation, so I need to get that sorted.

I found a post on the forums that mentioned increasing the size in /etc/vzdump.conf, so I increased it to 2048 after rebooting, and the backup worked the next time. Then, thinking about future growth, I increased it to 10GB, and it failed the next time. So now that the node has been rebooted it may work again, but I'd like some advice first: when I increase the size in vzdump.conf, what does that change, and which drive does it take space from? Is there any reason to increase it beyond a couple of GB? Here's my output from lvdisplay:
Code:
  --- Logical volume ---
  LV Name                /dev/pve/swap
  VG Name                pve
  LV UUID                iysj5K-81dO-6U8T-Jvyy-Cxwh-sfFk-GxnobB
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                31.00 GiB
  Current LE             7936
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:1
   
  --- Logical volume ---
  LV Name                /dev/pve/root
  VG Name                pve
  LV UUID                3Z3LzD-v3Gr-A7j0-BlKu-E4fV-5ZIS-138O6e
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                96.00 GiB
  Current LE             24576
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:0
   
  --- Logical volume ---
  LV Name                /dev/pve/data
  VG Name                pve
  LV UUID                d65nmH-B8Je-Um01-5ubc-U0Qc-b1h5-24tHGK
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                694.60 GiB
  Current LE             177817
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:2
   
  --- Logical volume ---
  LV Name                /dev/pve/vzsnap-proliant-0
  VG Name                pve
  LV UUID                a83YLX-VtJB-jExN-GDMK-75wU-NvXz-r2ueL3
  LV Write Access        read/write
  LV Status              available
  # open                 0
  LV Size                10.00 GiB
  Current LE             2560
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:3

Any thoughts on the matter? If you need more logs I'll be happy to provide them.
 
The size in vzdump.conf is the amount of data that can change on the VM's disks while the snapshot exists.
If more data changes than that size allows, the snapshot becomes invalid and vzdump hangs.
This bug is already reported but so far no perfect solution has been found: https://bugzilla.proxmox.com/show_bug.cgi?id=183

The space for the snapshot is taken from the free space in the volume group.
Your volume group needs free space >= (number of disks in the VM being backed up) x (size in vzdump.conf); for example, a VM with two disks and a size of 10GB needs at least 20GB free.
If you do not have enough free space, creating the snapshots will fail.

pvscan will show how much free space you have in each volume group.

Many of my servers have a snapshot size of at least 30GB just to be sure there are no problems.

You can read more about how LVM snapshots actually work here: http://tldp.org/HOWTO/LVM-HOWTO/snapshotintro.html
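As a minimal sketch (assuming the usual vzdump.conf syntax, where the value is given in MB): check the free VG space with pvscan first, then set, for example:
Code:
# /etc/vzdump.conf -- LVM snapshot size in MB (30720 MB = 30GB, example value only)
size: 30720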
 
Thanks for the detailed response e100. My pvscan shows this:
Code:
  PV /dev/sda2   VG pve   lvm2 [837.59 GiB / 6.00 GiB free]
  Total: 1 [837.59 GiB] / in use: 1 [837.59 GiB] / in no VG: 0 [0   ]
So I'm guessing from that output that 6GB is the maximum I can currently set in vzdump.conf? Should I use lvreduce to free up more space? There is only 140GB of data, so I'm a little puzzled as to why I'm running out of space... the backup runs at night when very little data is changing.
 
Yesterday afternoon I turned the test server pve04 into another FreeNAS server with 8.0.4-p3.
Since disk space there is limited, I tried it with LZO compression, and the first backups went quite well with very little error output.
Over time, as more VMs were backed up and space on the NAS decreased, the errors appeared more and more often, and once the backup hung as well.
So the problem shows up on another server too, but OK, this one is already really old and its performance could be the bottleneck.

I'm thinking about trying another ZFS solution like Illumian + napp-it on the test server, to see whether the errors occur there again or not.
What do you think about this?
 
How is your ZFS set up?

What % of disk space is in use?

Have you been following http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide, especially the section "Storage Pool Performance Considerations"? Note this: "Keep pool space under 80% utilization to maintain pool performance. Currently, pool performance can degrade when a pool is very full and file systems are updated frequently, such as on a busy mail server. Full pools might cause a performance penalty, but no other issues. If the primary workload is immutable files (write once, never remove), then you can keep a pool in the 95-96% utilization range. Keep in mind that even with mostly static content in the 95-96% range, write, read, and resilvering performance might suffer."
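(As a quick sketch for checking that - the pool name 'tank' is a placeholder:)
Code:
# the CAP column shows pool utilization; per the guide, keep it under ~80%
zpool list tank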


Also, there are NFS differences between Debian Lenny and Squeeze. You could try using iSCSI on ZFS to see whether something is wrong with the NFS settings.

I use FreeNAS and have tested both iSCSI and NFS. Our KVM tests run at the same speed on both, so we will use NFS as it is a lot easier to set up.
 
The productive nas01 server has about 8.2 TB of storage (5x3TB RAIDZ2), and currently between 3 and 6 TB are free (over the last few days I deleted a lot of old backups from it).
There is no compression active, and FreeNAS is on completely default settings.

On the test nas02/pve04 server I've set up Illumian 1.0 with napp-it and created a RAIDZ2 with 4x 146GB for testing (ZFS v15); the second backup has just finished.
Speed was between 25 and 35 MB/s, and I think that's OK for the old server it is (specs are at the bottom of the first post, the Dell PowerEdge 2950).
In these 2 backups I got only one "failed" error, but the first two were very small VMs of about ~15GB; the next one will be a bigger one of ~80GB.

If this keeps up, I'm thinking about changing my nas01 server from FreeNAS to Illumian and napp-it, but time will tell.
I still can't understand why these errors appear :confused:

I'm going to have a look at iSCSI, but at the moment I'm completely new to this topic. Do you maybe have some sort of guide for it at hand (FreeNAS)?
Thanks!
 
Regarding iSCSI: I set it up once and am still using it for a KVM. How to set it up is on the FreeNAS wiki, I think.

For PVE, you need to add the iSCSI storage,
then create an LVM storage on top of the iSCSI LUN.

The LVM on iSCSI is where the VM images etc. go.
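A rough sketch of what the resulting entries in /etc/pve/storage.cfg might look like (portal, target, and volume group names below are placeholders; normally both entries are created through the web GUI under Datacenter -> Storage):
Code:
# iSCSI target exported by the NAS (placeholder portal/target)
iscsi: nas-iscsi
        portal 192.168.1.1
        target iqn.2012-06.org.example:target0
        content none

# LVM volume group created on top of that LUN; VM images go here
lvm: nas-lvm
        vgname vg-iscsi
        content images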

One more question:
I've used the same ZFS pool with Debian kFreeBSD, FreeBSD, and ZFS on Linux.

Have your reinstalls used the same pool?
 
Have your reinstalls used the same pool?

Not yet,
but that's also something I'm thinking about doing;
in particular, it could be done very quickly and without much work (just some downtime of the NAS).

The big backup just finished:
Code:
Jun 27 15:45:52 INFO: Starting Backup of VM 202 (qemu)
Jun 27 15:45:52 INFO: status = running
Jun 27 15:45:53 INFO: backup mode: snapshot
Jun 27 15:45:53 INFO: ionice priority: 7
Jun 27 15:45:53 INFO:   Logical volume "vzsnap-pve02-0" created
Jun 27 15:45:53 INFO: creating archive '/mnt/pve/pve-test/dump/vzdump-qemu-202-2012_06_27-15_45_52.tar'
Jun 27 15:45:53 INFO: adding '/mnt/pve/pve-test/dump/vzdump-qemu-202-2012_06_27-15_45_52.tmp/qemu-server.conf' to archive ('qemu-server.conf')
Jun 27 15:45:53 INFO: adding '/mnt/vzsnap0/images/202/vm-202-disk-1.raw' to archive ('vm-disk-ide0.raw')
Jun 27 15:56:29 INFO: adding '/mnt/vzsnap0/images/202/vm-202-disk-2.raw' to archive ('vm-disk-ide1.raw')
Jun 27 16:21:37 INFO: Total bytes written: 79910823424 (35.54 MiB/s)
Jun 27 16:21:37 INFO: archive file size: 74.42GB
Jun 27 16:21:40 INFO: Finished Backup of VM 202 (00:35:48)

That's something I could live with, and I got no errors in the syslog.
 
Using the same pool is quite easy.

The pool is just supposed to be 'exported' first, but I did not know that when I simply did an import.

I have not tried that from the FreeNAS menu, but it is worth a try.

Or search for "zfs pool import"; it is easy from the CLI.
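From the CLI that boils down to something like this (the pool name 'tank' is a placeholder):
Code:
# on the old installation: release the pool cleanly before reinstalling
zpool export tank
# on the new installation: import it again (-f forces it if the pool was never exported)
zpool import -f tank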
 
I will do that export - reinstall - import procedure later.
I'm on holiday for the next two days, so I will report back as soon as possible.

Maybe the Proxmox team has more hints on why 2 of 3 PVE servers don't like this FreeNAS server.
 
