High server load during backup creation

Nico Haase

Member
Feb 27, 2013
Darmstadt, Germany
Hi there,
our Proxmox server creates heavy load inside a virtual machine while that VM is being backed up. We use LVM as storage, and the backup is written to an NFS share. Once the backup starts, the load within the VM rises from under 0.5 to over 6 within thirty minutes, and it stays above 2 for the duration of the backup. Is anybody else experiencing the same problem? What other information can I provide to help solve this?
Regards
Nico
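One way to see whether that load is IO wait rather than CPU work is to watch iostat inside the VM during the backup window; a minimal check, assuming the sysstat package is installed:
Code:
# Extended device stats every 5 seconds; high %iowait and await values
# during the backup window point at storage contention, not CPU load.
iostat -x 5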
 

Nico Haase

Member
Feb 27, 2013
Darmstadt, Germany
Here's the output of pveperf:
Code:
root@proxmox:~# pveperf
CPU BOGOMIPS:      31997.12
REGEX/SECOND:      817650
HD SIZE:           94.49 GB (/dev/mapper/pve-root)
BUFFERED READS:    112.32 MB/sec
AVERAGE SEEK TIME: 9.16 ms
FSYNCS/SECOND:     2411.21
DNS EXT:           33.75 ms
DNS INT:           1.01 ms (mydomain)
Our server is an HP ProLiant DL180 G6 with three drives for the Proxmox installation and nine drives for the VMs, each 500 GB, both sets in RAID 5. CPU: Intel E5504, quad-core at 2 GHz, 48 GB RAM. It is connected via Gigabit Ethernet to our backup server with these mount options:
Code:
10.162.32.7:/backup/hd2/proxmox on /mnt/pve/ninja type nfs (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.162.32.7,mountvers=3,mountport=39276,mountproto=udp,local_lock=none,addr=10.162.32.7)
 

jinjer

Member
Oct 4, 2010
It's very easy to saturate NFS with the amount of writes that a backup creates. To simulate the same load, try dd if=/dev/zero of=/mnt/nfs/out bs=1M
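A bounded variant of that test, so it doesn't fill the share (the target path is whatever your NFS mount is; conv=fdatasync makes dd flush before reporting the rate):
Code:
# Write 10 GB of zeros to the NFS mount, flushing before the rate is printed
# (the path is an example; remove the test file afterwards).
dd if=/dev/zero of=/mnt/pve/ninja/ddtest bs=1M count=10240 conv=fdatasync
rm /mnt/pve/ninja/ddtest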
 

Nico Haase

Member
Feb 27, 2013
Darmstadt, Germany
Do you think there is a problem with NFS? To clarify once more: the high load occurs within the VM while the backup is running, not on the Proxmox host itself.
 

e100

Active Member
Nov 6, 2010
Columbus, Ohio
ulbuilder.wordpress.com
It could be NFS causing the problem.
I was researching this the other day and found an article about a new way of handling the IO queue for block devices: http://lwn.net/Articles/552904/
One of the things it mentions, which is exactly what I was looking for, is:
This request queue turns out to be one of the biggest bottlenecks in the entire system. It is protected by a single lock which, on a large system, will bounce frequently between the processors.
That is what I see: when IO stalls on one device, it can and often does stall IO to other devices that might be working perfectly fine.
I've seen this with NFS: the NFS server dies while there are pending IO requests, load climbs, and other IO becomes slow or stalled too.

The backup also introduces a lot of reading while your VM might want to read or write somewhere else. The effect is that during the backup you are doing random IO, which is not good for performance.
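If the backup's reads are starving the guests, one mitigation is to throttle vzdump itself; a sketch, where 50000 KB/s is an arbitrary example value, not a recommendation:
Code:
# Per-job: cap backup bandwidth and keep a low IO priority.
vzdump 111 --bwlimit 50000 --ionice 7 --mode snapshot --storage nfs

# Or set node-wide defaults in /etc/vzdump.conf:
#   bwlimit: 50000
#   ionice: 7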
 

Nico Haase

Member
Feb 27, 2013
Darmstadt, Germany
Ah, okay, I think I see where you are going with this. My VM is also writing to the backup server via NFS, and those writes could stall because the host is already writing tons of data there? Let's see; I have changed the execution times of the backups run from within my VM...
 

mir

Well-Known Member
Apr 14, 2012
Copenhagen, Denmark
Just a suggestion: could you temporarily configure an NFS server that is used for backup purposes only, and point your backup job at it? I myself have a QNAP which I use for backup storage via NFS; it also hosts one redundant, low-load DNS/DHCP server and acts as a quorum disk via iSCSI. When my backup jobs run once a week (full backups of 12 VMs and 3 CTs), I see backup speeds between 30-60 MB/s and no measurable IO rise on the servers. Maybe the problem is high load on the NFS server apart from the backup job.
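For reference, a second, backup-only NFS storage can be registered from the CLI; a sketch with placeholder server address and export path:
Code:
# Add a dedicated NFS storage that only holds backups
# (address and export path are placeholders, adjust to your setup).
pvesm add nfs backup-only --server 10.0.0.99 \
    --export /backup/dedicated --content backup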
 

Nico Haase

Member
Feb 27, 2013
Darmstadt, Germany
The backup server only stores backups and does nothing else. But it is used in two ways: as backup storage for full VM backups made through Proxmox, and additionally as storage for backups made from within one VM. The latter are done using backupninja and rsync, but I already configured them not to run in parallel with the VM backups, without success :(
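For illustration, staggering the two jobs could look like this in the VM's crontab (the time and the backupninja invocation are assumptions):
Code:
# /etc/crontab inside the VM: run backupninja hours after the host's
# nightly vzdump window (22:50 in the log below), e.g. at 04:00.
0 4 * * * root /usr/sbin/backupninja --now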
 
esco

Dec 22, 2010
Not sure what you're talking about here.
I thought that during the backup, instead of a direct write to the virtual disk, the following happens:
the block is first read and written to the backup, and only then is the guest's write performed and reported as successful?

But I may have misinterpreted the discussion on the devel list (and I read it last year...), and we stopped using the integrated backup because we had high system load with it, too, especially when the backup storage slowed down. With the old one, the two weren't dependent on each other...

The devel list claims the new backup avoids any unneeded IO, so it should perform much better than LVM snapshots.
We do incremental, block-based backups with LVM snapshots, where only changes since the last backup are written to the backup storage. So there is much less IO, and even when the connectivity to the storage slows down, there is no impact on the running VMs ;)

esco
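For illustration, a file-level variant of the approach esco describes (his is block-based); the volume group, LV names, and paths below are assumptions:
Code:
#!/bin/sh
# Snapshot the VM's LV, mount it read-only, and rsync only changed files
# to the NFS share; far less IO than copying the full image every time.
lvcreate --snapshot --size 4G --name vm-111-snap /dev/pve/vm-111-disk-1
mkdir -p /mnt/snap
mount -o ro /dev/pve/vm-111-snap /mnt/snap
rsync -a --delete /mnt/snap/ /mnt/pve/ninja/vm-111/
umount /mnt/snap
lvremove -f /dev/pve/vm-111-snap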
 
esco

Dec 22, 2010
You asked: "As we still use the backup with LVM snapshots (we set up the server more than two years ago and did not change the core setup), is there any way to make use of what you call the 'new backup'?"
But you updated your system? You said that the version is "3.1-21", so you are already using the "new" (since 2.3) implementation.

esco
 

Nico Haase

Member
Feb 27, 2013
Darmstadt, Germany
Yes, we are running a current system, but the backup log starts with the following lines:
Code:
vzdump 111 --quiet 1 --mailto @adress --mode snapshot --compress lzo --storage nfs --node proxmox

111: Nov 06 22:50:02 INFO: Starting Backup of VM 111 (qemu)
111: Nov 06 22:50:02 INFO: status = running
111: Nov 06 22:50:02 INFO: update VM 111: -lock backup
111: Nov 06 22:50:02 INFO: backup mode: snapshot
111: Nov 06 22:50:02 INFO: bandwidth limit: 150000 KB/s
111: Nov 06 22:50:02 INFO: ionice priority: 7
111: Nov 06 22:50:02 INFO: creating archive '/mnt/pve/ninja/dump/vzdump-qemu-111-2013_11_06-22_50_02.vma.lzo'
111: Nov 06 22:50:02 INFO: started backup task 'aea89fac-12e6-4e6a-9adb-310d7ec188a3'
111: Nov 06 22:50:05 INFO: status: 0% (163774464/805306368000), sparse 0% (8937472), duration 3, 54/51 MB/s
111: Nov 06 22:53:16 INFO: status: 1% (8101953536/805306368000), sparse 0% (159510528), duration 194, 41/40 MB/s
111: Nov 06 22:57:20 INFO: status: 2% (16145907712/805306368000), sparse 0% (287911936), duration 438, 32/32 MB/s
So, I think I'm still using an LVM snapshot during the backup execution?
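For reference, the archive suffix in the log already hints at the answer: .vma is the format written by the new (since 2.3) implementation esco mentions. Checking the installed component versions is straightforward (output will vary):
Code:
# Show versions of all Proxmox components, including qemu and vzdump tooling.
pveversion -v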
 
